├── 01.Python常用語法入門.ipynb
├── 02.Python資料分析入門-打底篇.ipynb
├── 03.Python資料分析入門-資料擷取篇crawler.ipynb
├── 04.Python資料分析應用-語意分析篇NLP.ipynb
├── 05.Python深度學習入門-標準神經網路DNN做手寫辨識(MNIST).ipynb
├── 06.Python資料分析應用-股票分析入門.ipynb
├── 07.抓取網路圖片以CNN進行真實人臉辨識.ipynb
├── 11.標準神經網路NN做手寫辨識(MNIST).ipynb
├── 12.以深度學習VGG19進行風格轉換.ipynb
├── 13.Python資料分析應用-語意分析篇NLP.ipynb
├── LICENSE
├── MQTT_Demo_Publisher.ipynb
├── MQTT_Demo_Subscriber.ipynb
├── data
│   ├── dict.txt.big
│   ├── gossiping.json
│   ├── handwriting_model_architecture.json
│   ├── handwriting_model_weights.h5
│   ├── id_to_body.json
│   ├── model01.png
│   ├── mov_neg.csv
│   ├── mov_pos.csv
│   └── my-plot.pdf
└── readme.md


/03.Python資料分析入門-資料擷取篇crawler.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"03.Python資料分析入門-資料擷取篇crawler.ipynb","provenance":[],"collapsed_sections":["Dx_lBqeojcmd","2KGJGR9OBAYy","AiAe7hnEjftP","yGUfAF1GjsBF","uZP7CHTATafy","IWzSAtfvWyYq","WOtmHmThBJcB","sGcVUywikBdi","Bgtv3fFn6AD2","fw1fg9t0ChkF","PbkO2Q-K9QCi","_InTBJdlqQNE","sdsBCb4BkOkK","vnv7fzVMaPCz","Ta5rh-Up1vQe","wyXAXDIN1MBI","rgZmPsh9zZlN"]},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"id":"0bxcSKyjgZtB","colab_type":"text"},"source":["# Web Crawling: Extracting Data from Web Pages"]},{"cell_type":"markdown","metadata":{"id":"Dx_lBqeojcmd","colab_type":"text"},"source":["## Concepts"]},{"cell_type":"markdown","metadata":{"id":"2KGJGR9OBAYy","colab_type":"text"},"source":["### What makes up a web page\n"]},{"cell_type":"markdown","metadata":{"id":"CTj87dMTfWtr","colab_type":"text"},"source":["\n","- HTML + CSS + JavaScript\n","  - HTML: defines the content and structure of the page\n","  - CSS: defines the presentation style\n","  - JS: defines behavior\n","\n","- HTML is a hierarchical document made up of elements\n","- An element consists of a start tag, an end tag, attributes, and content:\n","\n","`<tag attribute=\"value\">content</tag>`"]},{"cell_type":"markdown","metadata":{"id":"ZE-8bpbtg02X","colab_type":"text"},"source":["#### Common tags\n","\n","|Tag|Purpose|\n","-|-\n","`<h1>`~`<h6>`|Headings\n","`<p>`|Paragraph\n","`<a>`|Hyperlink\n","`<table>`|Table\n","`<tr>`|Table row\n","`<td>`|Table cell\n","`<br>`|Line break (no end tag)"]},{"cell_type":"markdown","metadata":{"id":"1MDh03CGhhdg","colab_type":"text"},"source":["#### Common attributes\n","\n","|Attribute|Purpose|\n","-|-\n","class|Class of the element (can be shared by several elements)\n","id|Identifier of the element (must be unique within the page)\n","title|Extra information shown for the element, e.g. as a tooltip\n","style|Inline style of the element\n","data-*|Custom, user-defined attributes"]},{"cell_type":"markdown","metadata":{"id":"AiAe7hnEjftP","colab_type":"text"},"source":["### Essentials for fetching web pages"]},{"cell_type":"markdown","metadata":{"id":"G9fT8Fykjhi0","colab_type":"text"},"source":["- HTTP defines several request methods for asking a server for a service. With the spread of mobile devices, more and more products and websites offer Web API services, so before fetching web content we need to understand how HTTP requests are made.\n","- HTTP/1.1 defines eight methods:\n","  - OPTIONS\n","  - **GET**\n","  - HEAD\n","  - **POST**\n","  - PUT\n","  - DELETE\n","  - TRACE\n","  - CONNECT"]},{"cell_type":"markdown","metadata":{"id":"ckztGOzfjlSG","colab_type":"text"},"source":["- The five most common methods are:\n","\n","|Method|Meaning|\n","|-|-|\n","|GET|Retrieve data or the status of a resource.|\n","|POST|Submit data, like filling in a form, typically to create a new item.|\n","|PUT|Create or replace the item at a specified location.|\n","|PATCH|Partially update an existing item.|\n","|DELETE|Delete the specified item.|"]},{"cell_type":"markdown","metadata":{"id":"6S1aqUi_jquX","colab_type":"text"},"source":["- For details, see the HTTP/1.1 method definitions in [RFC 2616, Section 9](https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.5); PATCH is defined separately, in RFC 5789\n","- [淺談 HTTP Method:表單中的 GET 與 POST 有什麼差別?](https://blog.toright.com/posts/1203/%E6%B7%BA%E8%AB%87-http-method%EF%BC%9A%E8%A1%A8%E5%96%AE%E4%B8%AD%E7%9A%84-get-%E8%88%87-post-%E6%9C%89%E4%BB%80%E9%BA%BC%E5%B7%AE%E5%88%A5%EF%BC%9F.html)"]},{"cell_type":"markdown","metadata":{"id":"mxDpTI-8RfSb","colab_type":"text"},"source":["- When working with web pages and databases you will also often hear the term CRUD. CRUD stands for Create, Read, Update, and Delete, the four basic operations on a database (such as MySQL).\n","\n","- See [Day 15 - 實作第一個 CRUD 之 Create、Update、Delete](https://ithelp.ithome.com.tw/articles/10206716)"]},{"cell_type":"markdown","metadata":{"id":"yGUfAF1GjsBF","colab_type":"text"},"source":["### A brief look at RESTful APIs\n"]},{"cell_type":"markdown","metadata":{"id":"btWqwWF_juHm","colab_type":"text"},"source":["- REST stands for Representational State Transfer. Its core idea is to build on the HTTP protocol so that API specifications stay simple and consistent. The name is often unpacked as:\n","  - Resource: the resource being accessed.\n","  - Representational: the format the resource is represented in, such as JSON or XML.\n","  - State Transfer: changes of state, made by calling the HTTP verbs described above."]},{"cell_type":"markdown","metadata":{"id":"PigVvHPgjwfe","colab_type":"text"},"source":["- REST is a style of calling services between a client and a server over a network: applications follow agreed rules, constraints, and principles. Operations on resources (retrieving, creating, modifying, and deleting) map onto the basic database operations Create, Read, Update, and Delete (**CRUD**)."]},{"cell_type":"markdown","metadata":{"id":"z9v2bBUVjzsZ","colab_type":"text"},"source":["- Example interface of a product Web API:\n","  - Get a product: `GET /getItem/9527`\n","  - Create a product: `POST /createItem`\n","  - Update a product: `POST /updateItem/`\n","  - Delete a product: `POST /deleteItem/`\n","\n","- The same interface designed as a RESTful API:\n","  - Get a product: `GET /items/9527`\n","  - Create a product: `POST /items`\n","  - Update a product: `PATCH /items/9527`\n","  - Delete a product: `DELETE /items/9527`\n","\n","- This is a bit of a digression, but general web knowledge always helps when collecting real-world data."]},{"cell_type":"markdown","metadata":{"id":"TBsilSkyj7t3","colab_type":"text"},"source":["- Further reading\n","  - [[不是工程師] 休息(REST)式架構- 寧靜式(RESTful)的Web API是現在的潮流?](https://progressbar.tw/posts/53)\n","    > When a web service connects to others through a Web API, every URL we design becomes a dedicated service \"window\".\n","  - [RESTful API 設計準則與實務經驗](https://blog.toright.com/posts/5523/restful-api-%E8%A8%AD%E8%A8%88%E6%BA%96%E5%89%87%E8%88%87%E5%AF%A6%E5%8B%99%E7%B6%93%E9%A9%97.html)"]},{"cell_type":"markdown","metadata":{"id":"uZP7CHTATafy","colab_type":"text"},"source":["### What is a web crawler?"]},{"cell_type":"markdown","metadata":{"id":"sOh4tWNniCUa","colab_type":"text"},"source":["![](https://miro.medium.com/max/1132/1*YfeP5WFbn0MwI76kuTM38w.png)\n","https://blog.apify.com/what-is-web-scraping-1b548f8d6ac1\n"]},{"cell_type":"markdown","metadata":{"id":"TTuweXJRT23m","colab_type":"text"},"source":["- A web crawler is like a robot that automatically fetches the information you are after\n","- Crawlers are everywhere: search engines such as Google (and Baidu) are built on them"]},{"cell_type":"markdown","metadata":{"id":"gRUO9OXPVT0E","colab_type":"text"},"source":["- What are crawlers used for?\n","  - Popular game reviews, public-opinion analysis, sales analysis, travel booking...\n"]},{"cell_type":"markdown","metadata":{"id":"K0NoiLeiiANu","colab_type":"text"},"source":["- Look at the structure of a page element again\n","  - `<tag attribute=\"value\">target</tag>`\n","  - `<target tag + helper attributes>target information</target tag>`"]},{"cell_type":"markdown","metadata":{"id":"pwGvs5JYicez","colab_type":"text"},"source":["- Before writing a crawler, check:\n","  - Has someone already written one?\n","  - Does the site already provide an API?\n","  - Be polite: large numbers of frequent requests put load on the server"]},{"cell_type":"markdown","metadata":{"id":"IWzSAtfvWyYq","colab_type":"text"},"source":["### Main steps of a crawler"]},{"cell_type":"markdown","metadata":{"id":"_YEX9RhlW9iI","colab_type":"text"},"source":["- Fetch the target HTML\n","  - Python's `requests` module can fetch the HTML\n","- Parse the data to extract the target information\n","  - Python's `BeautifulSoup` module can parse the HTML\n","- Automate the whole flow (Robotic Process Automation, RPA) to connect it to your services"]},{"cell_type":"markdown","metadata":{"id":"WOtmHmThBJcB","colab_type":"text"},"source":["### A polite crawler"]},{"cell_type":"markdown","metadata":{"id":"q3i-MAVdBOuY","colab_type":"text"},"source":["- Don't request data too frequently when crawling a site; use `time.sleep()` to pause between requests\n"]},{"cell_type":"code","metadata":{"id":"N4_7pmC1Bdpx","colab_type":"code","outputId":"7e833873-ac70-4277-8e9e-d1e334fdb7f8","executionInfo":{"status":"ok","timestamp":1572072862856,"user_tz":-480,"elapsed":3666,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":50}},"source":["import time\n","\n","print('----start----')\n","time.sleep(3)  # pause for 3 seconds\n","print('----done----')\n"],"execution_count":0,"outputs":[{"output_type":"stream","text":["----start----\n","----done----\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"mx54vzuoCS1d","colab_type":"code","outputId":"08078393-ec9f-4016-c5c8-bd6095ddca2e","executionInfo":{"status":"ok","timestamp":1572072893537,"user_tz":-480,"elapsed":2569,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":50}},"source":["import time\n","import random\n","\n","random_s = 1 + random.randint(0, 2)  # random delay: randint(0, 2) gives 0 to 2, so 1 to 3 seconds\n","print('----start----')\n","time.sleep(
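random_s)\n","print('----done----')\n"],"execution_count":0,"outputs":[{"output_type":"stream","text":["----start----\n","----done----\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"0QQXheu3Dnj8","colab_type":"text"},"source":["- Websites, especially SEO-managed ones, may state which pages crawlers are allowed or not allowed to visit in a `robots.txt` file at the site root, `https://*.*.*/robots.txt`, for example `https://www.facebook.com/robots.txt` and `https://twitter.com/robots.txt`\n","- `robots.txt` only asks crawlers to stay out of those areas; it does not enforce anything. Many web scraping tools honor it by default (though that default can be turned off)\n","\n","- Also mind intellectual property (IP) rights such as trademarks, copyright, and patents: using content without permission, causing actual harm, or acting willfully may expose you to legal liability.\n"]},{"cell_type":"markdown","metadata":{"id":"iWkiq1Q_GxCK","colab_type":"text"},"source":["- To avoid getting your own network blocked by the target server because of frequent requests, you can test your crawler over your phone's 4G connection. If you get banned, put the phone in airplane mode for a while and then turn 4G back on; the carrier will usually assign you a new IP address"]},{"cell_type":"markdown","metadata":{"id":"sGcVUywikBdi","colab_type":"text"},"source":["## Hands-on: GET a web page\n"]},{"cell_type":"markdown","metadata":{"id":"iodaXDy83xtV","colab_type":"text"},"source":["### Example: example.com"]},{"cell_type":"markdown","metadata":{"id":"RKprYoPf31yH","colab_type":"text"},"source":["\n","- First look at the target page: http://www.example.com/\n","- Fetch the page source with `requests.get`, print the beginning of it, and parse it with BeautifulSoup for the cells that follow"]},{"cell_type":"code","metadata":{"id":"3GFnb4ACf3ad","colab_type":"code","outputId":"ae8cb3fe-41f6-4ba0-c227-7ac0a69b11cb","executionInfo":{"status":"ok","timestamp":1572073234919,"user_tz":-480,"elapsed":608,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":289}},"source":["import requests\n","from bs4 import BeautifulSoup\n","\n","res = requests.get('http://www.example.com/')\n","print(res.text[:500])  # first 500 characters of the HTML source\n","\n","# Parse the HTML so later cells can search it with soup.find() and soup.find_all()\n","soup = BeautifulSoup(res.text, 'html.parser')"],"execution_count":0,"outputs":[{"output_type":"stream","text":["\n","\n","\n"," Example Domain\n","\n"," \n"," \n"," \n"," \n"," \n"," \n","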
\n","

\n"," Example Domain\n","

\n","

\n"," This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.\n","

\n","

\n"," \n"," More information...\n"," \n","

\n","
\n"," \n","\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"Bgtv3fFn6AD2","colab_type":"text"},"source":["#### Common BeautifulSoup functions\n","- `soup.find()` finds one tag\n","- It returns the first element that matches the tag\n","- The first argument is usually the tag name. A second argument given as a plain string, without naming an attribute, is treated as a class name; you can also locate an element directly by attributes such as `id`. Once you have the element, you can read its attributes and the text it contains"]},{"cell_type":"code","metadata":{"id":"-F3nFNZF6XUA","colab_type":"code","colab":{}},"source":["?soup.find\n","# soup.find(name=None, attrs={}, recursive=True, text=None, **kwargs)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"9oNQZZcyknE0","colab_type":"code","outputId":"6dc630c9-5624-4433-daec-a082b7644c62","executionInfo":{"status":"ok","timestamp":1572073713068,"user_tz":-480,"elapsed":605,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":50}},"source":["soup.find('p')"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["<p>This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.</p>"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"markdown","metadata":{"id":"5u8ObFd_a3JZ","colab_type":"text"},"source":["- `soup.find_all()` returns every element that matches the tag, as a list"]},{"cell_type":"code","metadata":{"id":"YwlPhDkkk-kl","colab_type":"code","outputId":"f8a1771a-0032-4757-eccc-8cbf424e9382","executionInfo":{"status":"ok","timestamp":1572073727959,"user_tz":-480,"elapsed":599,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":87}},"source":["soup.find_all('p')"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[<p>This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.</p>,\n"," <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>]"]},"metadata":{"tags":[]},"execution_count":14}]},{"cell_type":"code","metadata":{"id":"K20pi5WQbI4Q","colab_type":"code","outputId":"adcd0b60-4af4-42f6-dc0f-aa669165e6e0","executionInfo":{"status":"ok","timestamp":1572073755664,"user_tz":-480,"elapsed":597,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["a = soup.find(\"a\")\n","a"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["<a href=\"https://www.iana.org/domains/example\">More information...</a>"]},"metadata":{"tags":[]},"execution_count":15}]},{"cell_type":"code","metadata":{"id":"pD0ShQhibZYe","colab_type":"code","outputId":"3a4e3737-29a3-4ea3-851a-17eb20a69ff3","executionInfo":{"status":"ok","timestamp":1572073806255,"user_tz":-480,"elapsed":576,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["a[\"href\"]"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'https://www.iana.org/domains/example'"]},"metadata":{"tags":[]},"execution_count":16}]},{"cell_type":"code","metadata":{"id":"rFyV0EaQ50KG","colab_type":"code","outputId":"2c77866d-6e84-4210-a58f-34755ef0f3dc","executionInfo":{"status":"ok","timestamp":1572073860003,"user_tz":-480,"elapsed":582,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["a.text"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'More information...'"]},"metadata":{"tags":[]},"execution_count":17}]},{"cell_type":"markdown","metadata":{"colab_type":"text","id":"SVhs9e9sbya3"},"source":["- 
Get the text content of a node"]},{"cell_type":"code","metadata":{"id":"FMLtwCWo3cHc","colab_type":"code","outputId":"c50b6044-ea0a-450f-e53e-0857946d3c9e","executionInfo":{"status":"ok","timestamp":1572073974054,"user_tz":-480,"elapsed":571,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["title_tag = soup.title\n","title_tag"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["<title>Example Domain</title>"]},"metadata":{"tags":[]},"execution_count":19}]},{"cell_type":"code","metadata":{"id":"TBmNmXIo_2lK","colab_type":"code","outputId":"9105eef7-3ccc-4968-eea2-2933e61da2cd","executionInfo":{"status":"ok","timestamp":1572074006801,"user_tz":-480,"elapsed":608,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["title_tag.string"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'Example Domain'"]},"metadata":{"tags":[]},"execution_count":21}]},{"cell_type":"markdown","metadata":{"id":"UGKFDMlZDCU1","colab_type":"text"},"source":["- Read a node's attributes\n","  - To read any attribute of an HTML node, use `get`.\n","  - You can also read an attribute with `tag['name']`, but if the attribute does not exist this raises a `KeyError`, which stops the rest of the crawler.\n","  - With `get`, a missing attribute simply returns `None`.\n","  - For more details, see the [official BeautifulSoup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)"]},{"cell_type":"code","metadata":{"id":"ck6CqC5upT4t","colab_type":"code","colab":{}},"source":["# Example that raises an error: the first <p> has no style attribute, so this raises KeyError\n","soup.find(\"p\")['style']"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"XkEDEZTEpiOF","colab_type":"code","outputId":"c7e05687-5e31-4279-8d24-d91569735cef","executionInfo":{"status":"ok","timestamp":1572074108573,"user_tz":-480,"elapsed":589,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["# With get(), a missing attribute returns None instead of raising an error\n","s = 
soup.find('p').get('style')\n","print(s)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["None\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"V00MFV48cKif","colab_type":"code","outputId":"6c27f784-a8da-4f5b-da6c-653363c076a4","executionInfo":{"status":"ok","timestamp":1572074136268,"user_tz":-480,"elapsed":634,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["s2 = soup.find('a').get('href')\n","print(s2)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["https://www.iana.org/domains/example\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"OWBddsSqp9_p","colab_type":"text"},"source":["- Further reading:\n","  - [給初學者的 Python 網頁爬蟲與資料分析 (3) 解構並擷取網頁資料](http://blog.castman.net/%E6%95%99%E5%AD%B8/2016/12/22/python-data-science-tutorial-3.html)\n","  - [Python 使用 Beautiful Soup 抓取與解析網頁資料,開發網路爬蟲教學](https://blog.gtwang.org/programming/python-beautiful-soup-module-scrape-web-pages-tutorial/)"]},{"cell_type":"code","metadata":{"id":"yYarGEZvnoow","colab_type":"code","outputId":"16f6bc71-f5d3-485a-ad8f-69f2e47d2728","executionInfo":{"status":"ok","timestamp":1572074235536,"user_tz":-480,"elapsed":563,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":94}},"source":["# Search for nodes\n","p_tags = soup.find_all(\"p\")\n","p_tags"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[<p>This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.</p>,\n"," <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>]"]},"metadata":{"tags":[]},"execution_count":30}]},{"cell_type":"code","metadata":{"id":"c2WOX3w8dKAn","colab_type":"code","outputId":"82ca34ac-2f7e-42d4-cc55-52ee093ea06b","executionInfo":{"status":"ok","timestamp":1572074268530,"user_tz":-480,"elapsed":629,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":67}},"source":["# Search for nodes, then read each node's text from the list\n","p_tags = soup.find_all(\"p\")\n","\n","for tag in p_tags:\n","    print(tag.string)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.\n","More information...\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"jNMNUGqIBNRV","colab_type":"code","outputId":"4e546887-2626-4230-d6ed-dc8cd2e41546","executionInfo":{"status":"ok","timestamp":1572074321794,"user_tz":-480,"elapsed":585,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["# Read a node attribute\n","a_tags = soup.find_all(\"a\")\n","\n","for tag in a_tags:\n","    print(tag['href'])"],"execution_count":0,"outputs":[{"output_type":"stream","text":["https://www.iana.org/domains/example\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"p7J5TuVsBiKs","colab_type":"text"},"source":["#### Search for several tags at once with a list"]},{"cell_type":"code","metadata":{"id":"HFKxz_XoBe3P","colab_type":"code","outputId":"70ebb509-0758-44bb-d211-ca07bdfc4502","executionInfo":{"status":"ok","timestamp":1572074361715,"user_tz":-480,"elapsed":376,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":161}},"source":["# Search for all links, bold text, paragraphs, and divs\n","tags = soup.find_all([\"a\", \"b\", \"p\", \"div\"])\n","print(tags)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["[<div>\n","<h1>Example Domain</h1>\n","<p>This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.</p>\n","<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n","</div>, <p>This domain is for use in illustrative examples in documents. You may use this\n"," domain in literature without prior coordination or asking for permission.</p>, <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>, <a href=\"https://www.iana.org/domains/example\">More information...</a>]\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"8iCzPmRpB-Em","colab_type":"text"},"source":["#### Limit the number of `find_all()` results with `limit`\n","- If you only need one result, use `find()` instead"]},{"cell_type":"code","metadata":{"id":"QdWbrjToCZJe","colab_type":"code","outputId":"19cfb3ce-02cd-4fe0-eb02-9a2b6ebd1f1e","executionInfo":{"status":"ok","timestamp":1572074391893,"user_tz":-480,"elapsed":596,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["# Return at most 2 results (example.com has one <a> and no <b>, so only one comes back)\n","tags = soup.find_all([\"a\", \"b\"], limit=2)\n","print(tags)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["[<a href=\"https://www.iana.org/domains/example\">More information...</a>]\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"fw1fg9t0ChkF","colab_type":"text"},"source":["#### Turn off recursive search in `find_all()`\n","- By default, `find_all()` searches all descendants recursively\n","- Pass `recursive=False` to search only the direct children"]},{"cell_type":"code","metadata":{"id":"f77l2SitC2Oq","colab_type":"code","outputId":"565918ce-c684-4939-afef-97890256127d","executionInfo":{"status":"ok","timestamp":1572074409091,"user_tz":-480,"elapsed":597,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["# Non-recursive search: look only at the direct children of <html>.\n","# <title> sits inside <head>, so the result is an empty list\n","soup.html.find_all(\"title\", recursive=False)"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[]"]},"metadata":{"tags":[]},"execution_count":38}]},{"cell_type":"code","metadata":{"id":"hzxqRSC77tc_","colab_type":"code","colab":{}},"source":["# Any tag name: find every tag whose class attribute is \"zzz\"\n","print(soup.find_all(attrs={\"class\": \"zzz\"}))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"eKV3k44i7j-y","colab_type":"code","colab":{}},"source":["# Take the third <td> tag and find the <a> inside it.\n","# example.com has no tables, so check the list length first\n","tds = soup.find_all(\"td\")\n","print(tds[2].find(\"a\") if len(tds) > 2 else 'fewer than three <td> tags on this page')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"SqziFClS70k9","colab_type":"code","colab":{}},"source":["# Find every text node equal to \"Example Domain\"\n","print(soup.find_all(string=\"Example Domain\"))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"eShAJ7Ff73le","colab_type":"code","colab":{}},"source":["# Find the first <a> tag and print its attributes\n","print(soup.find(\"a\").attrs)\n","print(soup.find(\"a\")[\"href\"])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"uUNaqQ-w881S","colab_type":"code","colab":{}},"source":["# Find all <a> tags and count them with len\n","print(len(soup.find_all(\"a\")))\n","\n","# Find the <div> whose id is \"id1\" and print its text\n","# (example.com has no such div, so check for None first)\n","div = soup.find(\"div\", id=\"id1\")\n","print(div.text if div else 'no <div id=\"id1\"> on this page')\n","\n","# On a page where the cell in row 3, column 3 has id = \"hyperlink\",\n","# locate that <a> by its id and print its href\n","link = soup.find(\"a\", {\"id\": \"hyperlink\"})\n","print(link[\"href\"] if link else 'no <a id=\"hyperlink\"> on this page')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"PbkO2Q-K9QCi","colab_type":"text"},"source":["### Searching with regular expressions\n"]},{"cell_type":"markdown","metadata":{"id":"eNSsgL9jx250","colab_type":"text"},"source":["- Regular expressions are very helpful for precisely extracting tags and text from a page, and they often handle cases that XPath or CSS selectors alone cannot pin down, so they are worth learning well.\n","- You can test a pattern against the text you want to extract at [regex101.com](https://regex101.com/)."]},{"cell_type":"markdown","metadata":{"id":"ZdytI0fA9tDP","colab_type":"text"},"source":["|Meaning|Syntax|Example (on `123ABC`)|\n","|-|-|-|\n","|Start|`^`|`/^1/`|\n","|End|`$`|`/C$/`|\n","|Range|`[<Start>-<End>]`|`/^[0-2]/`|\n","|Digit|`\\d`|`/^\\d/`|\n","|Word character|`\\w`|`/\\w$/`|\n","|Whitespace|`\\s`|Tab, space, newline, …|\n","|Zero or one|`?`|`/A?B/`|\n","|Zero or more|`*`|`/\\d*/`|\n","|One or more|`+`|`/\\w+$/`|\n","|Named group (Python)|`(?P<name>expression)`||\n","|Named group (JavaScript, PCRE)|`(?<name>expression)`||\n"]},{"cell_type":"markdown","metadata":{"id":"bhyq1jQH_Hzm","colab_type":"text"},"source":["#### Python's re module\n","- `re.findall()` is a good default\n","- 
Try your patterns at [regex101](https://regex101.com/)"]},{"cell_type":"markdown","metadata":{"id":"tMcBImeQIiDh","colab_type":"text"},"source":["##### Template\n","```python\n","import re\n","\n","# Find every match of the pattern in the string\n","pattern = \"my regular expression\"\n","string = \"the string I want to search\"\n","re.findall(pattern, string)\n","```"]},{"cell_type":"code","metadata":{"id":"usSdU3aN_QLS","colab_type":"code","outputId":"47519db1-ddd9-4f07-b21c-8d6bfa6e5232","executionInfo":{"status":"ok","timestamp":1572074773862,"user_tz":-480,"elapsed":568,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["import re\n","\n","# Find every occurrence of the character 的 (the string means 'the string I want to find')\n","pattern = \"的\"\n","string = \"我想要找的字串\"  # res.text from requests is also a plain string\n","re.findall(pattern, string)"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["['的']"]},"metadata":{"tags":[]},"execution_count":40}]},{"cell_type":"code","metadata":{"id":"6IuiFf38JGly","colab_type":"code","outputId":"3d6bf2c0-f8c4-4f50-fc8e-35463e3120d8","executionInfo":{"status":"ok","timestamp":1572075179869,"user_tz":-480,"elapsed":570,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["import re\n","\n","# Find the hyperlinks in the HTML\n","pattern = r'href=\\\"(.*)\\\"|href=\\'(.*)\\''  # see https://regex101.com/r/uw6MLH/1\n","string = res.text\n","re.findall(pattern, string)[0][0]  # first match, first group (the double-quoted href)"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'https://www.iana.org/domains/example'"]},"metadata":{"tags":[]},"execution_count":45}]},{"cell_type":"markdown","metadata":{"id":"p2XMf03gyj0v","colab_type":"text"},"source":["##### More re"]},{"cell_type":"code","metadata":{"id":"UbgkXvVtyio0","colab_type":"code","colab":{}},"source":["import re\n","\n","pattern = r'(\\d+)([A-Z]+)'  # example pattern: digits followed by capital letters\n","string = '123ABC 456DEF'\n","\n","# Find first match (a Match object, or None if nothing matches)\n","match = re.search(pattern, string)\n","\n","# Find all matches (a list of tuples, because the pattern has groups)\n","matches = re.findall(pattern, string)  # [('123', 'ABC'), ('456', 'DEF')]\n","\n","# Get matched groups by index\n","match.group(0)  # whole match: '123ABC'\n","match.group(1)  # first group: '123'\n","\n","# Get matched groups by indexing, or all of them as a tuple\n","match[2]        # second group: 'ABC'\n","match.groups()  # ('123', 'ABC')\n","\n","# Split on whitespace\n","re.split(r'\\s+', string)      # ['123ABC', '456DEF']\n","# Replace every match\n","re.sub(pattern, 'X', string)  # 'X X'"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_InTBJdlqQNE","colab_type":"text"},"source":["## Real-world scraping examples\n"]},{"cell_type":"markdown","metadata":{"id":"RrUdffk-pK7M","colab_type":"text"},"source":["### Example: PTT\n"]},{"cell_type":"markdown","metadata":{"id":"qF4cq73HpMg1","colab_type":"text"},"source":["- From here on we use Chrome DevTools to inspect the page\n","- First look at the target page: https://www.ptt.cc/bbs/StupidClown/index.html\n","- In Chrome, right-click and choose \"Inspect\"; the Windows shortcut is Ctrl+Shift+I or F12\n","\n","- If you would rather use an existing crawler, see https://dotblogs.com.tw/codinghouse/2018/10/22/pttcrawler"]},{"cell_type":"markdown","metadata":{"id":"XMdTi4BBfd-C","colab_type":"text"},"source":["![](https://i.imgur.com/K55v4SH.png)\n"]},{"cell_type":"markdown","metadata":{"id":"JiNSwFzUhMb9","colab_type":"text"},"source":["- The article list shows each article's push count, title, author, date, and link\n","- First inspect its tree structure and the corresponding tags and attributes\n","- Record the paths with Copy > Copy XPath\n","\n","|Name|XPath|\n","-|-\n","Title|`//*[@id=\"main-container\"]/div[2]/div[4]/div[2]/a` (the link text)\n","Link|`//*[@id=\"main-container\"]/div[2]/div[4]/div[2]/a/@href`"]},{"cell_type":"code","metadata":{"id":"nFIcyv2v6hRg","colab_type":"code","colab":{}},"source":["# Target URL: https://www.ptt.cc/bbs/StupidClown/index.html\n","import requests\n","from bs4 import BeautifulSoup\n","\n","res = requests.get('https://www.ptt.cc/bbs/StupidClown/index.html')\n","soup = BeautifulSoup(res.text, \"lxml\")\n","print(res.text[:500])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"BSjTlja9pqx_","colab_type":"code","colab":{}},"source":["print(soup.prettify())"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"oW-IKihpv0HR","colab_type":"text"},"source":["- 
The page was fetched. To extract just the links and titles, notice that they all sit inside `div` elements with `class=\"title\"`"]},{"cell_type":"code","metadata":{"id":"2NorSJDRXBok","colab_type":"code","outputId":"9acd6a36-3a8d-4d69-bf61-00d453b34dbd","executionInfo":{"status":"ok","timestamp":1572075562153,"user_tz":-480,"elapsed":592,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":262}},"source":["results = soup.select(\"div.title\")\n","print(results)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["[, , , , , ]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"vRrphsEiq8YP","colab_type":"code","outputId":"2f7228db-538b-4511-e508-fc75d4d5ec4f","executionInfo":{"status":"ok","timestamp":1572075593395,"user_tz":-480,"elapsed":757,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":145}},"source":["article_href = soup.select(\"div.title a\")\n","article_href"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[<a href=\"/bbs/StupidClown/M.1572014689.A.F9B.html\">[無言] 誰會委託同事買餐巾紙?</a>,\n"," <a href=\"/bbs/StupidClown/M.1572064384.A.96C.html\">Fw: [問卦] 鮭魚握壽司被蒸熟了該怎麼辦?</a>,\n"," <a href=\"/bbs/StupidClown/M.1572068103.A.43A.html\">[無言] 牙線棒卡在牙套上</a>,\n"," <a href=\"/bbs/StupidClown/M.1158735717.A.828.html\">[公告] 笨板板規</a>,\n"," <a href=\"/bbs/StupidClown/M.1435710970.A.31E.html\">[公告]本板即日起不可PO問卷文</a>,\n"," <a href=\"/bbs/StupidClown/M.1569938128.A.51D.html\">[公告] 10月份置底閒聊文</a>]"]},"metadata":{"tags":[]},"execution_count":51}]},{"cell_type":"code","metadata":{"id":"zKnxpxaorHjr","colab_type":"code","outputId":"34acdd84-d21c-4a2e-cfe7-c8b4321d50d7","executionInfo":{"status":"ok","timestamp":1572075835147,"user_tz":-480,"elapsed":6039,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":319}},"source":["import time\n","\n","# Print each title and its full URL, then save each article page locally\n","for a in article_href:\n","    print('title:', a.text)\n","    print('href:', 'https://www.ptt.cc' + a['href'])\n","\n","    # Open the article page and save a copy\n","    content_url = 'https://www.ptt.cc' + a['href']\n","    r = requests.get(content_url)\n","    # '/' is not allowed in file names; write UTF-8 so Chinese titles are saved correctly\n","    filename = a.text.replace('/', '_') + '.html'\n","    with open(filename, 'w', encoding='utf-8') as f:\n","        f.write(r.text)\n","    print('saved')\n","    time.sleep(1)  # be polite: pause between requests"],"execution_count":0,"outputs":[{"output_type":"stream","text":["title: 
[無言] 誰會委託同事買餐巾紙?\n","href: https://www.ptt.cc/bbs/StupidClown/M.1572014689.A.F9B.html\n","saved\n","title: Fw: [問卦] 鮭魚握壽司被蒸熟了該怎麼辦?\n","href: https://www.ptt.cc/bbs/StupidClown/M.1572064384.A.96C.html\n","saved\n","title: [無言] 牙線棒卡在牙套上\n","href: https://www.ptt.cc/bbs/StupidClown/M.1572068103.A.43A.html\n","saved\n","title: [公告] 笨板板規\n","href: https://www.ptt.cc/bbs/StupidClown/M.1158735717.A.828.html\n","saved\n","title: [公告]本板即日起不可PO問卷文\n","href: https://www.ptt.cc/bbs/StupidClown/M.1435710970.A.31E.html\n","saved\n","title: [公告] 10月份置底閒聊文\n","href: https://www.ptt.cc/bbs/StupidClown/M.1569938128.A.51D.html\n","saved\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"MkRd8kgpuIQh","colab_type":"code","outputId":"9978aa63-838b-490f-f091-4d67c5fa8bab","executionInfo":{"status":"ok","timestamp":1572075857697,"user_tz":-480,"elapsed":1769,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":134}},"source":["%ls"],"execution_count":0,"outputs":[{"output_type":"stream","text":["'Fw: [問卦] 鮭魚握壽司被蒸熟了該怎麼辦?.html'\n"," \u001b[0m\u001b[01;34msample_data\u001b[0m/\n","'[公告] 10月份置底閒聊文.html'\n","'[公告]本板即日起不可PO問卷文.html'\n","'[公告] 笨板板規.html'\n","'[無言] 牙線棒卡在牙套上.html'\n","'[無言] 誰會委託同事買餐巾紙?.html'\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"1vyg4HrFLfVJ","colab_type":"code","colab":{}},"source":["# Boards restricted to users aged 18+ need the over18 cookie\n","import requests\n","\n","def fetch(url):\n","    # Send the over18 cookie with every request to tell the server we are over 18\n","    response = requests.get(url, cookies={'over18': '1'})\n","    return response\n","\n","url = 'https://www.ptt.cc/bbs/Gossiping/index.html'\n","resp = fetch(url)  # step 1\n","\n","print(resp.text)  # result of step 1"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"lDF2XVSvNs_4","colab_type":"text"},"source":["- For more, see the [
CrawlerTutorial](https://github.com/leVirve/CrawlerTutorial)"]},{"cell_type":"markdown","metadata":{"id":"sdsBCb4BkOkK","colab_type":"text"},"source":["### Example: Asian countries by area on Wikipedia"]},{"cell_type":"markdown","metadata":{"id":"rINKbz4-kXl8","colab_type":"text"},"source":["- Source: [Web Scraping Wikipedia Tables using BeautifulSoup and Python](https://medium.com/analytics-vidhya/web-scraping-wiki-tables-using-beautifulsoup-and-python-6b9ea26d8722)"]},{"cell_type":"code","metadata":{"id":"MwJK3gS2kpmJ","colab_type":"code","colab":{}},"source":["import requests\n","from bs4 import BeautifulSoup\n","\n","url = 'https://en.wikipedia.org/wiki/List_of_Asian_countries_by_area'\n","website_url = requests.get(url).text  # despite its name, this holds the page's HTML\n","\n","soup = BeautifulSoup(website_url, 'lxml')\n","print(soup.prettify())"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ysX4Mq1JlKTT","colab_type":"text"},"source":["![](https://miro.medium.com/max/740/1*NyaaGqqHnemKSWu8DQqUHQ.png)"]},{"cell_type":"code","metadata":{"id":"ELYuAc4plPCN","colab_type":"code","colab":{}},"source":["# Locate the table by its class\n","My_table = soup.find(\"table\", {\"class\": \"wikitable sortable\"})\n","My_table"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"vPKOb288lhI1","colab_type":"code","colab":{}},"source":["# Collect every link inside the table\n","links = My_table.find_all('a')\n","links"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"m7n_vzVClthW","colab_type":"code","colab":{}},"source":["# Country names are in the links' title attribute; skip links without one\n","Country = [link.get('title') for link in links if link.get('title') is not None]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"lD4Uf9lVDNxL","colab_type":"code","colab":{}},"source":["Country"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"5QaTp8NcmdGZ","colab_type":"code","colab":{}},"source":["import pandas as pd\n","\n","df = pd.DataFrame()\n","df['Country'] = 
Country\n","df"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"0Hdt3qwunW_g","colab_type":"code","colab":{}},"source":["df = df.sort_values(by=\"Country\").reset_index(drop=True)\n","df"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"vnv7fzVMaPCz","colab_type":"text"},"source":["## 練習"]},{"cell_type":"markdown","metadata":{"id":"Ta5rh-Up1vQe","colab_type":"text"},"source":["### 練習1"]},{"cell_type":"markdown","metadata":{"id":"E76LBd6BaSRE","colab_type":"text"},"source":["- 試著看懂並執行、拆解以下程式\n","- 程式來源https://github.com/jwlin/web-crawler-tutorial/blob/master/ch3/ptt_gossiping.py"]},{"cell_type":"code","metadata":{"id":"8HJHhY5qaJtf","colab_type":"code","outputId":"cc214cef-ddf0-456e-9a12-2bc8f4c9868c","executionInfo":{"status":"ok","timestamp":1572076619044,"user_tz":-480,"elapsed":22301,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":581}},"source":["import requests\n","import time\n","import json\n","from bs4 import BeautifulSoup\n","\n","\n","PTT_URL = 'https://www.ptt.cc'\n","\n","\n","def get_web_page(url):\n"," resp = requests.get(\n"," url=url,\n"," cookies={'over18': '1'}\n"," )\n"," if resp.status_code != 200:\n"," print('Invalid url:', resp.url)\n"," return None\n"," else:\n"," return resp.text\n","\n","\n","def get_articles(dom, date):\n"," soup = BeautifulSoup(dom, 'html5lib')\n","\n"," # 取得上一頁的連結\n"," paging_div = soup.find('div', 'btn-group btn-group-paging')\n"," prev_url = paging_div.find_all('a')[1]['href']\n","\n"," articles = [] # 儲存取得的文章資料\n"," divs = soup.find_all('div', 'r-ent')\n"," for d in divs:\n"," if d.find('div', 'date').text.strip() == date: # 發文日期正確\n"," # 取得推文數\n"," push_count = 0\n"," push_str = d.find('div', 'nrec').text\n"," if push_str:\n"," try:\n"," push_count = int(push_str) # 轉換字串為數字\n"," except ValueError:\n"," # 若轉換失敗,可能是'爆'或 'X1', 'X2', ...\n"," # 若不是, 不做任何事,push_count 保持為 0\n"," 
if push_str == '爆':\n"," push_count = 99\n"," elif push_str.startswith('X'):\n"," push_count = -10\n","\n"," # 取得文章連結及標題\n"," if d.find('a'): # 有超連結,表示文章存在,未被刪除\n"," href = d.find('a')['href']\n"," title = d.find('a').text\n"," author = '' # author = d.find('div', 'author').text if d.find('div', 'author') else ''\n"," articles.append({\n"," 'title': title,\n"," 'href': href,\n"," 'push_count': push_count,\n"," 'author': author\n"," })\n"," return articles, prev_url\n","\n","\n","def get_author_ids(posts, pattern):\n"," ids = set()\n"," for post in posts:\n"," if pattern in post['author']:\n"," ids.add(post['author'])\n"," return ids\n","\n","if __name__ == '__main__':\n"," current_page = get_web_page(PTT_URL + '/bbs/Gossiping/index.html')\n"," if current_page:\n"," articles = [] # 全部的今日文章\n"," today = time.strftime(\"%m/%d\").lstrip('0') # 今天日期, 去掉開頭的 '0' 以符合 PTT 網站格式\n"," current_articles, prev_url = get_articles(current_page, today) # 目前頁面的今日文章\n"," while current_articles: # 若目前頁面有今日文章則加入 articles,並回到上一頁繼續尋找是否有今日文章\n"," articles += current_articles\n"," current_page = get_web_page(PTT_URL + prev_url)\n"," current_articles, prev_url = get_articles(current_page, today)\n","\n"," # 印出所有不同的 5566 id\n"," # print(get_author_ids(articles, '5566'))\n","\n"," # 儲存或處理文章資訊\n"," print('今天有', len(articles), '篇文章')\n"," threshold = 50\n"," print('熱門文章(> %d 推):' % (threshold))\n"," for a in articles:\n"," if int(a['push_count']) > threshold:\n"," print(a)\n"," with open('gossiping.json', 'w', encoding='utf-8') as f:\n"," json.dump(articles, f, indent=2, sort_keys=True, ensure_ascii=False)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["今天有 776 篇文章\n","熱門文章(> 50 推):\n","{'title': '[新聞] 星宇航空首架機交付 張國煒親自駕回台', 'href': '/bbs/Gossiping/M.1572072025.A.F0A.html', 'push_count': 57, 'author': ''}\n","{'title': '[新聞] 同志遊行前夕 蘇貞昌:相互尊重讓台灣更', 'href': '/bbs/Gossiping/M.1572070242.A.738.html', 'push_count': 58, 'author': ''}\n","{'title': '[新聞] 違法三缺一?中國觸怒民怨的「麻將館禁令」', 'href': 
'/bbs/Gossiping/M.1572068601.A.93D.html', 'push_count': 60, 'author': ''}\n","{'title': '[新聞] 進台北市區注意!明天同志遊行中午起交通', 'href': '/bbs/Gossiping/M.1572066682.A.275.html', 'push_count': 54, 'author': ''}\n","{'title': '[新聞] 非洲豬瘟肆虐 菲律賓養豬業每月損失約6億', 'href': '/bbs/Gossiping/M.1572066826.A.A34.html', 'push_count': 80, 'author': ''}\n","{'title': '[問卦] 有沒有做出PttChrome計算推樓插件的八卦', 'href': '/bbs/Gossiping/M.1572066384.A.30B.html', 'push_count': 99, 'author': ''}\n","{'title': '[爆卦] 中國食品凍漲擋不住了', 'href': '/bbs/Gossiping/M.1572065077.A.9C0.html', 'push_count': 99, 'author': ''}\n","{'title': '[新聞] 貧窮線調高! 北市月收不到2萬4293元就', 'href': '/bbs/Gossiping/M.1572064119.A.A22.html', 'push_count': 55, 'author': ''}\n","{'title': '[爆卦] GD權志龍退伍了!', 'href': '/bbs/Gossiping/M.1572064886.A.775.html', 'push_count': 61, 'author': ''}\n","{'title': 'Re: [新聞] 李亞萍哭認余苑綺「是末期了」:心裡有數', 'href': '/bbs/Gossiping/M.1572063783.A.974.html', 'push_count': 58, 'author': ''}\n","{'title': '[新聞] 卓榮泰說沒民進黨柯文哲會落選?對手丁守', 'href': '/bbs/Gossiping/M.1572063861.A.27F.html', 'push_count': 51, 'author': ''}\n","{'title': '[新聞] 謝震武紅遍政論節目 竟讓劉寶傑好委屈', 'href': '/bbs/Gossiping/M.1572063933.A.371.html', 'push_count': 71, 'author': ''}\n","{'title': '[問卦] 癌症三期了會治療還是放棄?', 'href': '/bbs/Gossiping/M.1572061470.A.980.html', 'push_count': 54, 'author': ''}\n","{'title': '[新聞] 韓國瑜「國外遊學」支票 蔡英文:韓也說', 'href': '/bbs/Gossiping/M.1572061561.A.6B0.html', 'push_count': 99, 'author': ''}\n","{'title': 'Re: [問卦] 故宮的東西在世界上算厲害嗎', 'href': '/bbs/Gossiping/M.1572061164.A.F36.html', 'push_count': 99, 'author': ''}\n","{'title': '[新聞] 地府只能收美金?中國新法令冥紙禁印人', 'href': '/bbs/Gossiping/M.1572060525.A.E97.html', 'push_count': 56, 'author': ''}\n","{'title': '[新聞] 強國人為何想偷渡英國?\\u3000華春瑩嗆CNN:', 'href': '/bbs/Gossiping/M.1572059049.A.526.html', 'push_count': 99, 'author': ''}\n","{'title': 'Re: [新聞] 旺中「跳船」?《旺報》社長砲轟韓國瑜', 'href': '/bbs/Gossiping/M.1572059203.A.FE5.html', 'push_count': 69, 'author': ''}\n","{'title': '[問卦] 認真問,去駕訓班學手排.還是自己考自排?', 'href': 
'/bbs/Gossiping/M.1572058150.A.FBA.html', 'push_count': 59, 'author': ''}\n","{'title': '[問卦] 可以數位故宮那能不能數位遊學?!', 'href': '/bbs/Gossiping/M.1572056208.A.0EE.html', 'push_count': 64, 'author': ''}\n","{'title': 'Re: [新聞] 快訊/韓國瑜晚間開支票:只要當總統,大', 'href': '/bbs/Gossiping/M.1572051391.A.DCD.html', 'push_count': 99, 'author': ''}\n","{'title': '[新聞] 美軍盼晶片在地生產 台積電評估赴美建新', 'href': '/bbs/Gossiping/M.1572051828.A.2F6.html', 'push_count': 99, 'author': ''}\n","{'title': '[新聞] 台北警公務車載女友!女方「屁蛋妹」身份', 'href': '/bbs/Gossiping/M.1572050526.A.01D.html', 'push_count': 53, 'author': ''}\n","{'title': 'Re: [新聞] 快訊/韓國瑜晚間開支票:只要當總統,大', 'href': '/bbs/Gossiping/M.1572048367.A.98A.html', 'push_count': 99, 'author': ''}\n","{'title': '[問卦] 你周遭朋友,買過春的男生多嗎?', 'href': '/bbs/Gossiping/M.1572037109.A.44C.html', 'push_count': 88, 'author': ''}\n","{'title': '[問卦] 香港反送中為何不平息?', 'href': '/bbs/Gossiping/M.1572029681.A.250.html', 'push_count': 51, 'author': ''}\n","{'title': '[新聞] 死刑「注射針」插歪!他痛苦3倍時間才身亡', 'href': '/bbs/Gossiping/M.1572026898.A.906.html', 'push_count': 70, 'author': ''}\n","{'title': 'Re: [新聞] 藏人自焚說 柯文哲:用字不精確但我意思', 'href': '/bbs/Gossiping/M.1572026908.A.C09.html', 'push_count': 53, 'author': ''}\n","{'title': 'Re: [問卦] 五月天跪中國了?', 'href': '/bbs/Gossiping/M.1572020507.A.C3B.html', 'push_count': 99, 'author': ''}\n","{'title': '[新聞] 告別立院演說 王金平:太陽花學運不流血落幕 我無愧天地', 'href': '/bbs/Gossiping/M.1572019252.A.AE8.html', 'push_count': 99, 'author': ''}\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"wyXAXDIN1MBI","colab_type":"text"},"source":["### 練習2"]},{"cell_type":"markdown","metadata":{"id":"l9CxzWKIcqEZ","colab_type":"text"},"source":["- 試著看懂並執行、拆解以下程式\n","- 
程式來源https://github.com/jwlin/web-crawler-tutorial/blob/master/ch3/yahoo_movie.py"]},{"cell_type":"code","metadata":{"id":"mq4Xi5X-cc9A","colab_type":"code","outputId":"1666f312-3144-4b81-99ce-45f0511f9a08","executionInfo":{"status":"ok","timestamp":1572076585531,"user_tz":-480,"elapsed":2183,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":211}},"source":["import requests\n","import re\n","import json\n","from bs4 import BeautifulSoup\n","\n","\n","Y_MOVIE_URL = 'https://tw.movies.yahoo.com/movie_thisweek.html'\n","\n","# 以下網址後面加上 \"/id=MOVIE_ID\" 即為該影片各項資訊\n","Y_INTRO_URL = 'https://tw.movies.yahoo.com/movieinfo_main.html' # 詳細資訊\n","Y_PHOTO_URL = 'https://tw.movies.yahoo.com/movieinfo_photos.html' # 劇照\n","Y_TIME_URL = 'https://tw.movies.yahoo.com/movietime_result.html' # 時刻表\n","\n","\n","def get_web_page(url):\n"," resp = requests.get(url)\n"," if resp.status_code != 200:\n"," print('Invalid url:', resp.url)\n"," return None\n"," else:\n"," return resp.text\n","\n","\n","def get_movies(dom):\n"," soup = BeautifulSoup(dom, 'html5lib')\n"," movies = []\n"," rows = soup.find_all('div', 'release_info_text')\n"," for row in rows:\n"," movie = dict()\n"," movie['expectation'] = row.find('div', 'leveltext').span.text.strip()\n"," movie['ch_name'] = row.find('div', 'release_movie_name').a.text.strip()\n"," movie['eng_name'] = row.find('div', 'release_movie_name').find('div', 'en').a.text.strip()\n"," movie['movie_id'] = get_movie_id(row.find('div', 'release_movie_name').a['href'])\n"," movie['poster_url'] = row.parent.find_previous_sibling('div', 'release_foto').a.img['src']\n"," movie['release_date'] = get_date(row.find('div', 'release_movie_time').text)\n"," movie['intro'] = row.find('div', 'release_text').text.replace(u'詳全文', '').strip()\n"," trailer_a = row.find_next_sibling('div', 'release_btn color_btnbox').find_all('a')[1]\n"," movie['trailer_url'] = trailer_a['href'] if 'href' in 
trailer_a.attrs.keys() else ''\n"," movies.append(movie)\n"," return movies\n","\n","\n","def get_date(date_str):\n"," # e.g. \"上映日期:2017-03-23\" -> match.group(0): \"2017-03-23\"\n"," pattern = r'\\d+-\\d+-\\d+'\n"," match = re.search(pattern, date_str)\n"," if match is None:\n"," return date_str\n"," else:\n"," return match.group(0)\n","\n","\n","def get_movie_id(url):\n"," # 20180515: URL 格式有變, e.g., 'https://movies.yahoo.com.tw/movieinfo_main/%E6%AD%BB%E4%BE%8D2-deadpool-2-7820.html'\n"," # -> movie_id: '7820' (取 '.html' 之前、最後一個 '-' 之後的字串)\n"," try:\n"," movie_id = url.split('.html')[0].split('-')[-1]\n"," except AttributeError: # url 不是字串 (例如 None) 時\n"," movie_id = url\n"," return movie_id\n","\n","\n","def get_trailer_url(url):\n"," # e.g., 'https://tw.rd.yahoo.com/referurl/movie/thisweek/trailer/*https://tw.movies.yahoo.com/video/美女與野獸-最終版預告-024340912.html'\n"," return url.split('*')[1]\n","\n","\n","def get_complete_intro(movie_id):\n"," page = get_web_page(Y_INTRO_URL + '/id=' + movie_id)\n"," if page:\n"," soup = BeautifulSoup(page, 'html5lib')\n"," infobox = soup.find('div', 'gray_infobox_inner')\n"," print(infobox.text.strip())\n","\n","\n","def main():\n"," page = get_web_page(Y_MOVIE_URL)\n"," if page:\n"," movies = get_movies(page)\n"," for movie in movies:\n"," print(movie)\n"," # get_complete_intro(movie[\"movie_id\"])\n"," with open('movie.json', 'w', encoding='utf-8') as f:\n"," json.dump(movies, f, indent=2, sort_keys=True, ensure_ascii=False)\n","\n","\n","if __name__ == '__main__':\n"," main()"],"execution_count":0,"outputs":[{"output_type":"stream","text":["{'expectation': '90%', 'ch_name': '雙子殺手', 'eng_name': 'Gemini Man', 'movie_id': '10017', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/October2019/YSeFXqDiSK0XrIcSdHi1-486x720.JPG', 'release_date': '2019-10-23', 'intro': 
'威爾史密斯飾演的一名頂尖殺手亨利布羅根,卻被一位神秘的年輕殺手追殺,而且這位年輕的殺手竟然能事先預測亨利的一舉一動。《雙子殺手》由金像獎得主李安執導,知名製作人傑瑞布洛克海默、大衛艾利森、戴娜戈柏和唐葛蘭傑製作。其他演員還有瑪麗伊莉莎白文斯蒂德、克里夫歐文和班奈狄克王。', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E9%9B%99%E5%AD%90%E6%AE%BA%E6%89%8B-%E6%9C%80%E6%96%B0%E9%A0%90%E5%91%8A-035402781.html?movie_id=10017'}\n","{'expectation': '88%', 'ch_name': '電影版 吹響吧!上低音號~誓言的終章~', 'eng_name': 'Sound! Euphonium, the Movie -Our Promise: A Brand New Day-', 'movie_id': '10306', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/October2019/ZXplEx71mOQQlcTpxiYd-992x1418.JPG', 'release_date': '2019-10-23', 'intro': '順利在去年的全日本管樂競演會中出場的北宇治高中管樂社,升上二年級的黃前久美子和三年級的加部友惠,兩人開始一起負責指導從四月開始新加入的一年級新生。\\n\\xa0\\n由於身為全國大賽的出場學校,因此吸引了大量的一年級新生入部,其中,有四名新生來到了低音部,包括乍看之下似乎毫無問題的久石奏、不融入周圍的鈴木美玲、想要和美玲成為朋友的鈴木五月,以及從不提及私事的月永求。面對接連而來的Sunrise祭、選拔賽以及競演會,以「全國大賽金獎」為目標的管樂社,卻接連發生問題……!?北宇治高中管樂社,風波不斷的日子開始了!', 'trailer_url': ''}\n","{'expectation': '56%', 'ch_name': '日本國民導演—山田洋次影展', 'eng_name': '', 'movie_id': '10299', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/October2019/CLNjaW36KsCZoGzmmhMn-509x720.jpg', 'release_date': '2019-10-24', 'intro': '現年88歲的日本國民導演-山田洋次,從事電影工作超過一甲子的時間,已累績八十多部的導演作品,至今仍持續推出高品質的新作,半世紀以來,一直深受不同世代觀眾的共鳴。2019年適逢《男人真命苦》問世五十週年,光點台北因此特別規劃山田洋次回顧影展,精選八部代表作品,從1960年代到二十一世紀,一窺這位日本庶民大師五十年來的創作軌跡!10.04周五起正式售票,10.24-11.06影展期間於光點台北電影館播映,歡迎影迷們共襄盛舉!\\n\\xa0\\n本次影展共選《男人真命苦》、《家族》、《故鄉》、《幸福的黃手帕》、《遠山的呼喚》、《兒子》《黃昏清兵衛》、《東京小屋的回憶》八部經典之作,透過不同時期的電影回顧,激起屬於每個時代的獨特記憶。其中1969年正式上映的松竹電影《男人真命苦》,由渥美清、倍賞千惠子、前田吟、森川信等人主演,描述東京平凡市井小民─寅次郎的故事,本片不只創下破億日幣票房,在接下來的二十六年間以相同班底連續推出48集,同時也締造金氏世界紀錄最長系列片,成為日本人的共同記憶!\\n\\xa0\\n山田洋次從影歷程幾乎是半部日本電影史,耕耘多年的他繼承小津安二郎、木下惠介等前輩樹立的「庶民劇」傳統,終其一生致力於書寫市井小民的喜怒哀樂,也如同小津一樣,長期與固定的劇組合作,不僅培養出工作夥伴間的良好默契,也宛如陪伴觀眾成長的左鄰右舍一般親切。\\n\\xa0\\n「身為一介小市民,我要將日常生活中觸動─人心的事,透過某種契機使之成形,在構思中建立骨骼、填上血肉,完成具體的作品」,多年來山田洋次的秉持著相同的理念,在他的電影裡可以感受到最真摯、最貼近生活的創作,欲重溫這些經典作品的影迷們,10.24-11.06期間於光點台北電影館千萬別錯過【日本國民導演—山田洋次 影展】。\\n\\xa0\\n【售票資訊】10.04(五)起開始售票\\n\\xa0\\n全票:240元/張 
,光點會員200元/張\\n\\xa0\\n套票:720元/4張乙套\\n\\xa0\\n✔專屬優惠好禮─凡於現場一次購買『#兩組套票』,即贈 大春煉皂 提供【經典米萃皂】乙份。(數量有限,送完為止)\\n\\xa0\\n●購票一律請親至光點台北售票服務台並於現場確認場次\\n\\xa0\\n●光點會員請持會員卡及有效證件購票享會員優惠\\n\\xa0\\n●愛心票僅供65歲以上老人、身心障礙人士與乙名必要陪同者(需同時入場)購買,購票時請出示相關證件\\n\\xa0\\n●套票售出概不退換', 'trailer_url': ''}\n","{'expectation': '89%', 'ch_name': '金翅雀', 'eng_name': 'The Goldfinch', 'movie_id': '10072', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/October2019/WR18DRSUJn9Z1GdkwUNI-2985x4263.jpg', 'release_date': '2019-10-25', 'intro': '《金翅雀》故事敘述小名「席歐」的13歲少年席爾鐸戴克(奧克斯弗格雷 飾)與母親參觀大都會藝術博物館,當他們在欣賞母親最愛的「金翅雀」這幅畫作時,館內突然發生大爆炸,席歐幸運地逃過一劫,卻也因此與母親從此天人永隔,這場突如其來的鉅變改變了席歐的一生,讓他展開充滿波折的人生與探索的旅程;成年後的席歐(安索艾格特 飾)經歷了無止盡的悲傷與自責、一連串的重生、贖罪與溫暖的愛之後,當他回首生命的這一切,他心中放不下的還是那幅改變他一生的畫作:一隻腳被細細的鎖鏈綁在棲木上的小鳥兒,那個看似美麗,卻永遠無法獲得自由,同時也是母親在世時摯愛的作品:「金翅雀」。', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E9%87%91%E7%BF%85%E9%9B%80-%E4%B8%AD%E6%96%87%E9%A0%90%E5%91%8A-072238436.html?movie_id=10072'}\n","{'expectation': '94%', 'ch_name': '陪你很久很久', 'eng_name': 'Stand by me', 'movie_id': '10160', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/September2019/IyCzEAuwXvQbjJFZZ4TI-510x720.jpg', 'release_date': '2019-10-25', 'intro': '★今年最熱血、最青春、最動感的校園愛情電影!\\n★李淳首挑大樑,攜手兩大超人氣女星邵雨薇、蔡瑞雪,共解最複雜的愛情習題!\\n★顛覆傳統浪漫小清新,荒唐、逗趣、充滿驚喜,新世代青春爆笑喜劇!\\n\\xa0\\n憨直的九餅(李淳飾)從小就暗戀著薄荷(邵雨薇飾),兩人一路相互陪伴,卻始終維持著「好朋友」的關係。直到上了大學後,九餅意外地與甜美高中生夏天(蔡瑞雪飾)成為同房室友,又在迎新試膽大會時,不小心將薄荷推向校園天菜麥子學長(宋柏緯飾)懷中,眼看著心愛的她即將被追走,九餅究竟要如何守住薄荷,為自己的青春奮力一搏!', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E9%99%AA%E4%BD%A0%E5%BE%88%E4%B9%85%E5%BE%88%E4%B9%85-%E9%9D%92%E6%98%A5%E7%86%B1%E8%A1%80%E7%89%88%E9%A0%90%E5%91%8A-132846573.html?movie_id=10160'}\n","{'expectation': '62%', 'ch_name': '伊索遊戲', 'eng_name': 'Aesop’s Game', 'movie_id': '10216', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/August2019/fJBtPhAz38ToLmsSaIVb-800x1142.jpg', 'release_date': '2019-10-25', 'intro': 
'「一隻烏龜從天而降砸傷了路人,不知是意外還是惡作劇,警方正在追查這隻肇事逃逸的烏龜…」故事從烏龜開始,加上兔子,還有狗狗來亂入?顛覆你所知道的伊索寓言!\\n\\xa0\\n龜田美羽(龜),內向的女大學生,和家人感情很好,唯一的朋友卻是隻烏龜。\\n兔草早織(兔),超人氣的星二代,出生「明星家族」,卻天生一副戀愛體質。\\n戌井小柚(犬),身手不凡,和爸爸一起經營「復仇」生意,天天幫人尋仇。\\n當三位少女相遇,你說這是部酸酸甜甜的青春電影?……大錯特錯!\\n\\xa0\\n綁架、背叛、復仇、揭開虛偽的假象,一場超乎想像的鬥智心理遊戲正要開始,結局會和你想像的……完全不同!!!校園純愛戀曲是假的,懸疑驚悚才是真的!當綁架事件爆發,最難預測的才不是青春,而是這部電影的劇情走向!\\n\\xa0\\n【關於電影】\\n\\xa0\\n延續神片《一屍到底》最強騙術\\n三位導演、三倍翻轉,一次滿足!\\n本片由日本觀影人次超過220萬、票房突破31億日圓、2018年最受矚目的話題作品《一屍到底》導演上田慎一郎親自操刀原創劇本,借三名導演之力搬上大銀幕,承襲《一屍到底》一路「騙」很大、在騙局中加洋蔥的反轉風格,將再次帶給觀眾就算被騙到底,也甘之如飴的觀影體驗。此次與上田慎一郎共同執導的導演之一是在《一屍到底》擔任副導的中泉裕矢,他在2011年推出第一部導演長片《圓罪》,其後作品《與母親的旅程》、《來去拍片尾》都在日本國內影展蔚為話題。另一位導演則是在《一屍到底》中擔任劇照師的淺沼直也,他曾以《若冬天燃起》在SKIP CITY國際電影節獲得2017年短片單元最佳作品,是日本當前備受期待的新銳創作者。三人在2015年便曾合作過短片集《四個與貓的故事》,當時三人各自擔任其中一部的導演,因此這次決定再度攜手合作。截然不同的三人共同擔任導演,將蹦出三種不同特色的火花,這部花費三年以上製作的電影,終於要登上大銀幕和觀眾見面了!\\n\\xa0\\n以《一屍到底》快速竄紅的導演上田慎一郎表示自己從中學時期就開始自己嘗試拍片,當時沒有YouTube,也不知道有電影節這種管道,光是把作品拍出來就已經是對自己最大的獎勵了!長大之後,也有曾經抱持著:「作品非在電影節放映不可!」、「我的作品一定要被偉大的人認可!」然而在拍攝《一屍到底》時,反而試圖讓自己不去多想成敗,只專注於自己想拍的東西,最後的成果反而得到崇高肯定,上田慎一郎表示:「投入《伊索遊戲》創作後我也期許自己秉持初心,繼續拍自己想拍的電影!」上田導演坦言《伊索遊戲》這部作品前後花了三年構想,不過事實上前兩年都還在和另兩位導演爭論究竟要拍什麼樣的故事,「三個人一起的執導,絕對會有各自無法妥協的部分,不過我們也有著要將這些矛盾、糾結化為這部電影的魅力的覺悟!」本片也大膽啟用新銳演員石川瑠華、井桁弘恵、紅甘擔綱主演,上田慎一郎在電影上映前也再次向觀眾呼籲:「這部電影有三名導演、三位女主角,所以大家請務必三人以上同行前往電影院觀看!」', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E4%BC%8A%E7%B4%A2%E9%81%8A%E6%88%B2-%E5%B0%8E%E6%BC%94%E5%95%8F%E5%80%99%E7%AF%87-105145492.html?movie_id=10216'}\n","{'expectation': '69%', 'ch_name': '西幽玹歌', 'eng_name': 'Thunderbolt Fantasy', 'movie_id': '10244', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/September2019/EcNK84hDr6NAucwtBl3N-506x720.jpg', 'release_date': '2019-10-25', 'intro': 
'生來便懷有異能歌聲的少年浪巫謠,自小便跟在隱遁雪山的盲眼母親身邊,接受經年累月的苛刻訓練。\\n\\xa0\\n母親懷著野心,想將兒子的歌聲鍛鍊成天下無雙,然後送入宮廷。然而過於苛刻的訓練卻招致了不幸的事故,母親在浪巫謠眼前斷送了性命。\\n\\xa0\\n失去照顧者後,浪巫謠成了流浪之身。但他的異能卻總被無情的人們利用,成為他們滿足慾望的道具,漸漸消磨著少年的心靈。\\n\\xa0\\n儘管如此,浪巫謠罕見的歌聲終於吸引到了西幽皇女,因而得到了過去母親所夢想的飛黃騰達。但是等待著他的,卻是成為執政者玩物、賭上性命與其他樂師進行演奏競賽的殘忍遊戲。\\n\\xa0\\n就在某一天,浪巫謠聽聞有一名在西幽各地奪取魔劍、占為己有的大罪人──「啖劍太歲」。而啖劍太歲的下一個目標,正是皇帝藏在宮中的聖劍。\\n\\xa0\\n【關於電影】\\n\\xa0\\n在臺灣可以說是「無人不知、無人不曉」,從大人到小孩都非常喜愛的木偶劇「布袋戲」。 本次作品由對布袋戲深深著迷的Nitroplus「虛淵玄」擔任故事原著・劇本・總監修,與臺灣布袋戲中最具知名度及以其高品質製作聞名的「霹靂國際多媒體」(日文簡稱:霹靂社)合作, 以此奇蹟般的組合所完成的日本及臺灣地區共同企劃之影像作品。\\n\\xa0\\n於2018年12月24日(一)播放TV版第二季《Thunderbolt Fantasy 東離劍遊紀2》最後一集, 在那之後隨即發表「第三季製作決定!」的情報, 並且在3月22日(五)公佈第二季外傳《Thunderbolt Fantasy 西幽玹歌》正在進行製作的消息。\\xa0\\n\\xa0\\n2019年10月25日(五)於戲院上映的《Thunderbolt Fantasy 西幽玹歌》,以活躍於TV版第二季的「浪巫謠」為主角,講述他的過去、以及他在西幽發生的故事,為Nitroplus「虛淵玄」擔任故事原著・劇本・總監修的全新作品。\\n\\xa0\\n角色設計和至今為止的系列作品同樣,由Nitroplus所率領的設計團隊(新加入成員「minoa」)負責,同時亦邀請公仔製作公司「GOOD SMILE COMPANY」擔任戲偶造型顧問。 配樂的部份也同樣邀請到專注在偶像劇、動畫、電影界配樂、並且擔任許多歌手樂曲之提供的「澤野弘之」擔任製作。', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E8%A5%BF%E5%B9%BD%E7%8E%B9%E6%AD%8C-%E6%AD%A3%E5%BC%8F%E9%A0%90%E5%91%8A-053315107.html?movie_id=10244'}\n","{'expectation': '80%', 'ch_name': '失憶的總理大臣', 'eng_name': '', 'movie_id': '10246', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/October2019/w1eivF3UfcYZ5YI6T45k-800x1132.jpg', 'release_date': '2019-10-25', 'intro': 
'無良政客一夕失憶,竟然轉性善良大叔!?\\n史上民調最低的總理大臣黑田啟介(中井貴一飾),走到哪都被市民噓爆,黑粉滿街跑!某天,總理又被沿街抗議,沒想到一顆石頭飛來,砸中額頭而昏迷,醒來後,他竟完全喪失記憶,黑粉美夢一夕成真?!為了避免國家大亂,他的三位貼身秘書竭盡所能讓國事照常運作,全面隱瞞真相。而一向唯利是圖的無良總理,一夕之間轉性變成傾聽民心、致力國政的善良大叔,異常反差的態度也讓全民開始起疑。眼看總理「我失憶了」秘密即將露餡,政壇的狐群狗黨們為私利大亂鬥,美國總統偏偏在此時將參訪日本,一場顛覆政壇的狂喜劇即將鬧上全世界!\\n\\xa0\\n【關於電影】\\n\\xa0\\n喜劇大師三谷幸喜醞釀四十年《失憶的總理大臣》引爆全民共鳴\\n導演撿到槍電影質問首相安倍晉三幽默回應心得秒破冰!\\n日本喜劇大師三谷幸喜睽違多年,在影迷千呼萬喚之下,終於帶來最高傑作《失憶的總理大臣》!三谷幸喜端出醞釀四十年的故事,並集結黃金卡司包括日本奧斯卡影帝中井貴一、《月薪嬌妻》「全民小阿姨」石田百合子、《大叔之愛》田中圭、《告白》木村佳乃、老班底日本奧斯卡影帝佐藤浩市等一線明星,再次打造三谷幸喜無人能擋的魅力!三谷幸喜表示,故事靈感來自他高中的志願發想,「如果像我這樣沒有私利私慾、不追求權力金錢慾的人做總理,不是很好的政治家嗎?」但路人突然變成總理根本不可能,喜劇鬼才的他就想到石頭砸破腦袋的橋段,創作出這部不分國界引爆觀眾共鳴的喜劇新作。此外,膽大包天的《失憶的總理大臣》劇組甚至直接邀請日本現任首相安倍晉三看片,映後導演戰戰兢兢地問首相:「感想是?」沒想到安倍幽默拿出政治人物愛用語回:「我失去記憶了!」爆笑氣氛現場一秒破冰!三谷再次以縝密的故事編排和獨特「三谷式」幽默驚艷觀眾,電影對政客的惡搞更引起日本全民共鳴,展開「最希望哪位政治人物失憶」的熱烈討論,造就另一股社會話題炫風!\\n\\xa0\\n每一部都讓你笑到哭又感動哭!\\n三谷幸喜魅力無人能敵日本巨星爭相合作朝聖\\n鬼才導演三谷幸喜能編能導,作品橫跨電影、電視及舞台劇,過去不僅以叫好叫座的喜劇《魔幻時刻》、《鬼壓床了沒》、《有頂天大飯店》累積破百億票房,NHK大河劇《真田丸》也大獲好評,他所執導的舞台劇更是在日本及台灣都場場爆滿,是日本巨星阿部寬、妻夫木聰、綾瀨遙、松隆子、深津繪里、役所廣司等人爭相合作的對象,就算只能演小配角也沒問題!這次更加入首度合作的石田百合子及藤岡靛黃金陣容,加上三谷幸喜作品天馬行空卻又天衣無縫的巧合與誤會,荒謬劇情再度讓觀眾笑到岔氣,同時透過小人物的熱血初衷,帶來意想不到的逆轉結局,讓人忍不住笑著笑著就感動哭了!今年新作《失憶的總理大臣》以觀眾再熟悉不過的政治為題,三谷幸喜表示:「這不是政治諷刺劇,而是政治狂想曲!」大導透過架空的舞台為觀眾創造更多聯想空間,就算來自不同國家的觀眾,也能輕鬆帶入自己所熟悉的時事!', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E5%A4%B1%E6%86%B6%E7%9A%84%E7%B8%BD%E7%90%86%E5%A4%A7%E8%87%A3-%E6%AD%A3%E5%BC%8F%E9%A0%90%E5%91%8A-122854501.html?movie_id=10246'}\n","{'expectation': '46%', 'ch_name': '阿嬤養的豬', 'eng_name': \"Esmeralda's Twilight\", 'movie_id': '10272', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/September2019/crRK7LKzsnj0hdMz3Yv2-488x720.JPG', 'release_date': '2019-10-25', 'intro': '★2019墨西哥奧斯卡「阿里爾獎」奪最佳處女電影獎導演提名\\n★2019墨西哥奧斯卡「阿里爾獎」最佳女主角提名\\n★墨西哥電影界最高榮譽「墨西哥電影電視藝術學院獎」提名\\n★比《我不笨,我有話要說》更感人的溫馨催淚片\\xa0\\n最佛心的阿嬤,豬年行大運\\n★ 豬儂我儂,有阿嬤疼的小豬最窩心、像個寶\\n\\xa0\\n年度必看的感人溫馨催淚電影 
內含洋蔥讓人揪心落淚\\n隨著丈夫的去世和她兒子不在身旁,老婦人已經失去了對生活的樂趣,直到她重新獲得希望…當一隻小豬進入她的生活。\\n\\xa0\\n在丈夫過世後,老婦人獨居在小鎮,與移居美國的兒子通上電話,成為每天生活唯一的動力,對習慣照顧人的她來說,生命像失去目標。\\n\\xa0\\n有天,一頭小豬意外闖進她的生活,她開始把小豬當成孩子照顧,三餐讓牠吃飽飽,梳毛散步都不少,帶給她與鄰居很多歡樂,不久這隻豬懷孕了,她傾盡所有的積蓄、精力與時間,準備迎接豬孫的到來,卻發現豬女兒身體不如以往了…\\n\\xa0\\n【關於電影】\\xa0\\n\\xa0\\n喪偶奶奶透過小豬重獲新生 純樸鄉村裡人與豬之間最溫馨的羈絆關係\\n《阿嬤養的豬》透過獨居婦人對動物的依戀,點出老人現實處境,熱帶風情的景色、美味的料理、人情味的可愛人物,打造出充滿愛與信念的樸素小鎮,細膩堆疊出人與小豬的真摯情感,已近薄暮的主角,面對人生的無常,選擇放手與釋懷,心境轉換帶出希望的光輝,更顯溫暖令人動容。', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E9%98%BF%E5%AC%A4%E9%A4%8A%E7%9A%84%E8%B1%AC-%E4%B8%AD%E6%96%87%E9%A0%90%E5%91%8A-134459693.html?movie_id=10272'}\n","{'expectation': '53%', 'ch_name': '我的朋友霸王龍', 'eng_name': 'My TYRANO: Together, Forever', 'movie_id': '10274', 'poster_url': 'https://movies.yahoo.com.tw/x/r/w420/i/o/production/movies/September2019/FjcsFOd9I7bDwpFHNcpP-800x1142.jpg', 'release_date': '2019-10-25', 'intro': '★改編自全球暢銷繪本作家宮西達也的恐龍系列作品,取自〈永遠永遠在一起〉〈我愛你〉〈我相信你〉三部繪本,是《你看起來好像很好吃》同系列電影。\\n★由《名偵探柯南》動畫系列導演靜野孔文出任導演、監製,是一部中、日、韓三地聯合製作的動畫電影。\\xa0\\n★奧斯卡金獎配樂大師坂本龍一第一部於動畫電影界亮相的作品。\\n★由手塚治虫創辦的手塚製作公司擔任動畫製作。\\n\\xa0\\n在冰河世紀來臨前的恐龍時期,一隻正被魔鬼龍追逐的粉紅色小翼龍普娜,就在她正要被吞下肚之前,一隻巨大威猛的食肉霸王龍突然出現在牠們面前;\\n當他一出現,早已不見被嚇到的魔鬼龍蹤跡,此時只獨留下普娜在這隻霸王龍面前,他張開嘴靠近普娜,一口口咬了下去,但他吞下的竟然是紅果子而不是這隻小翼龍普娜。\\n”大個子,你不是吃肉的動物嗎?怎麼會吃紅色水果?”普娜在他面前喋喋不休地問著;說著說著,她下定決心跟隨這隻霸王龍,路途中遇見了與媽媽失散的三角龍男孩,他們決定陪著牠回家並結伴一起開始尋找綠洲。\\n\\xa0\\n一隻不會飛的翼龍、一隻不吃肉的霸王龍與孤獨的男孩三角龍,在前往尋找綠洲的旅程中,三人成為像是好朋友又像是家人的好夥伴。\\n\\xa0\\n一路上,他們是否可以避開魔鬼龍的追逐,安全地到達那大家口中所說的的充滿和平與富裕的綠洲呢?', 'trailer_url': 'https://movies.yahoo.com.tw/video/%E6%88%91%E7%9A%84%E6%9C%8B%E5%8F%8B%E9%9C%B8%E7%8E%8B%E9%BE%8D-%E4%B8%AD%E6%96%87%E9%A0%90%E5%91%8A-122910372.html?movie_id=10274'}\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"rgZmPsh9zZlN","colab_type":"text"},"source":["### 練習3\n"]},{"cell_type":"markdown","metadata":{"id":"w3GT1nsfzhEy","colab_type":"text"},"source":["- 擷取並parse「批批踢JOKE版的一篇文章」\n","- 請依下列步驟練習:\n"," - 以GET方法將網頁https://www.ptt.cc/bbs/joke/M.1571755669.A.663.html 原始碼讀入\n"," - 
依照上述步驟parse出推文內容及推文者\n"," - 透過for迴圈,整齊印出"]}]} -------------------------------------------------------------------------------- /04.Python資料分析應用-語意分析篇NLP.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": {}, 5 | "cell_type": "markdown", 6 | "source": "# 04.Python資料分析入門-語意分析篇NLP" 7 | }, 8 | { 9 | "metadata": {}, 10 | "cell_type": "markdown", 11 | "source": "## 04-1 基本語意分析-以模組舉例\n- 以[手把手教你如何用 Python 做情感分析](https://www.itread01.com/articles/1498721884.html)為例" 12 | }, 13 | { 14 | "metadata": {}, 15 | "cell_type": "markdown", 16 | "source": "### 04-1-1 英文為例" 17 | }, 18 | { 19 | "metadata": { 20 | "trusted": true 21 | }, 22 | "cell_type": "code", 23 | "source": "#安裝相關套件\n!pip install snownlp\n!pip install -U textblob\n!python -m textblob.download_corpora", 24 | "execution_count": 1, 25 | "outputs": [ 26 | { 27 | "output_type": "stream", 28 | "text": "Requirement already satisfied: snownlp in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.12.3)\nRequirement already up-to-date: textblob in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.15.3)\nRequirement already satisfied, skipping upgrade: nltk>=3.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from textblob) (3.3)\nRequirement already satisfied, skipping upgrade: six in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from nltk>=3.1->textblob) (1.11.0)\n[nltk_data] Downloading package brown to /home/nbuser/nltk_data...\n[nltk_data] Package brown is already up-to-date!\n[nltk_data] Downloading package punkt to /home/nbuser/nltk_data...\n[nltk_data] Package punkt is already up-to-date!\n[nltk_data] Downloading package wordnet to /home/nbuser/nltk_data...\n[nltk_data] Package wordnet is already up-to-date!\n[nltk_data] Downloading package averaged_perceptron_tagger to\n[nltk_data] /home/nbuser/nltk_data...\n[nltk_data] Package averaged_perceptron_tagger is already up-to-\n[nltk_data] date!\n[nltk_data] 
Downloading package conll2000 to /home/nbuser/nltk_data...\n[nltk_data] Package conll2000 is already up-to-date!\n[nltk_data] Downloading package movie_reviews to\n[nltk_data] /home/nbuser/nltk_data...\n[nltk_data] Package movie_reviews is already up-to-date!\nFinished.\n", 29 | "name": "stdout" 30 | } 31 | ] 32 | }, 33 | { 34 | "metadata": { 35 | "trusted": true 36 | }, 37 | "cell_type": "code", 38 | "source": "text = \"I am happy today. I feel sad today.\"", 39 | "execution_count": 2, 40 | "outputs": [] 41 | }, 42 | { 43 | "metadata": { 44 | "trusted": true 45 | }, 46 | "cell_type": "code", 47 | "source": "from textblob import TextBlob\n\nblob = TextBlob(text)\nblob.sentences", 48 | "execution_count": 3, 49 | "outputs": [ 50 | { 51 | "output_type": "execute_result", 52 | "execution_count": 3, 53 | "data": { 54 | "text/plain": "[Sentence(\"I am happy today.\"), Sentence(\"I feel sad today.\")]" 55 | }, 56 | "metadata": {} 57 | } 58 | ] 59 | }, 60 | { 61 | "metadata": { 62 | "trusted": true 63 | }, 64 | "cell_type": "code", 65 | "source": "#情感極性的變化範圍是[-1, 1],-1代表完全負面,1代表完全正面。\nprint(blob.sentences[0].sentiment)\nprint(blob.sentences[1].sentiment)\nprint(blob.sentiment)", 66 | "execution_count": 4, 67 | "outputs": [ 68 | { 69 | "output_type": "stream", 70 | "text": "Sentiment(polarity=0.8, subjectivity=1.0)\nSentiment(polarity=-0.5, subjectivity=1.0)\nSentiment(polarity=0.15000000000000002, subjectivity=1.0)\n", 71 | "name": "stdout" 72 | } 73 | ] 74 | }, 75 | { 76 | "metadata": {}, 77 | "cell_type": "markdown", 78 | "source": "### 04-1-2 中文為例\n- 使用SnowNLP: http://t.cn/8kf1c3p" 79 | }, 80 | { 81 | "metadata": { 82 | "trusted": true 83 | }, 84 | "cell_type": "code", 85 | "source": "text = u\"我今天很快樂。我今天很憤怒。\" # Python 3 的字串預設即為 Unicode, u 前綴可省略", 86 | "execution_count": 5, 87 | "outputs": [] 88 | }, 89 | { 90 | "metadata": { 91 | "trusted": true 92 | }, 93 | "cell_type": "code", 94 | "source": "from snownlp import SnowNLP\n\ns = SnowNLP(text)\ns.sentences", 95 | 
"execution_count": 6, 96 | 
| "outputs": [ 97 | { 98 | "output_type": "execute_result", 99 | "execution_count": 6, 100 | "data": { 101 | "text/plain": "['我今天很快樂', '我今天很憤怒']" 102 | }, 103 | "metadata": {} 104 | } 105 | ] 106 | }, 107 | { 108 | "metadata": { 109 | "trusted": true 110 | }, 111 | "cell_type": "code", 112 | "source": "#SnowNLP的情感分析取值,表達的是“這句話代表正面情感的概率”。\nprint(SnowNLP(s.sentences[0]).sentiments)\nprint(SnowNLP(s.sentences[1]).sentiments)\nprint(s.sentiments)", 113 | "execution_count": 7, 114 | "outputs": [ 115 | { 116 | "output_type": "stream", 117 | "text": "0.9268071116367116\n0.1702660762575916\n0.7005413250638438\n", 118 | "name": "stdout" 119 | } 120 | ] 121 | }, 122 | { 123 | "metadata": { 124 | "trusted": true 125 | }, 126 | "cell_type": "code", 127 | "source": "text = u'''\n台中市雷姓男子去年10月間在西區公益路、大墩路的工地喝酒後,騎車上路,遭警方盤查後,酒測值超標為每公升0.70毫克,他遭警方依公共危險罪嫌送辦,檢方聲請簡易判決處刑,法官判他4月徒刑、得易科罰金,但他今年2月要報到執行時,檢方命他不得易科,需入監服刑,他向台中地院聲明異議,法官審理後認為,雷在2016至2018年間連續3度酒駕,前二次一次是緩起訴、另次則是易科罰金,但此次又被查獲,已是第三次,認為他漠視法律,不矯正難收矯正之效,駁回雷聲明異議,判雷要關。\n判決書指出,雷姓男子(41歲)去年10月間酒後騎車被查獲,酒測值為每公升0.70毫克,台中地院依公共危險罪,簡易判決處刑4月,得易科罰金12萬元,不過雷在今年2月到台中地檢署報到執行時,檢方諭令他不得易科,需入監服刑,他不服向台中地院聲明異議。\n\n雷辯稱,判決書上明寫可易科罰金,為何檢方堅持要讓他去服刑,而且他已離婚是單親家庭,有未成年子女要撫養,家境貧寒是中低收入戶,更是家中主要的經濟支柱,希望法官撤銷檢察官不得易科的執行指揮處分,准予讓他易科罰金。\n\n台中地院法官審理後認為,雖然法律有規定,本刑5年以下,宣告6月以下徒刑者, 得易科罰金,不過但書是「難收矯正之效或難以維持法秩序者,不在此限」,易科罰金的易刑處分,應否准許,依照《刑事訴訟法》第457條規定,由檢察官就是否准予受刑人易科罰金,有無但書情況,由檢方查明認定並指揮執行。\n\n法官指出,雷在2016年酒駕被查獲後,獲緩起訴處分,2017年又被查獲酒駕,當時被判3月徒刑,得易科9萬元,但他2018年又喝酒上路,顯示雷沒有因前兩次遭查獲的前例,而有所警惕,且去年被查獲時,是在喝完啤酒不久就騎車上路,顯然極度漠視法令,其行為對社會法秩序之危害重大,因此認定檢方的處分無不妥,駁回雷聲明異議。\n'''", 128 | "execution_count": 8, 129 | "outputs": [] 130 | }, 131 | { 132 | "metadata": { 133 | "trusted": true 134 | }, 135 | "cell_type": "code", 136 | "source": "s = SnowNLP(text)\n#s.sentences[0]\ns.sentiments", 137 | "execution_count": 9, 138 | "outputs": [ 139 | { 140 | "output_type": "execute_result", 141 | "execution_count": 9, 142 | "data": { 143 | "text/plain": "0.0" 144 | }, 145 | "metadata": {} 146 | } 147 | ] 148 | }, 149 | { 150 | 
"metadata": {}, 151 | "cell_type": "markdown", 152 | "source": "## 04-2 基本語意分析-以PTT電影版透過機器學習方式舉例\n- 以下來自[Python 網路爬蟲與資料分析入門實戰](https://www.tenlong.com.tw/products/9789864343386)\n- 2019年10月出版的新書,裡面有很多台灣在地的爬蟲應用教學,[github](https://github.com/willismax/py-scraping-analysis-book)也有該書的程式碼,可以先試試看。\n- 此例以影片名稱為關鍵字搜尋PTT電影版文章,依標題的「好雷、負雷」做分類,再以機器學習方式將各文章內文斷詞,並預測分類結果" 153 | }, 154 | { 155 | "metadata": { 156 | "trusted": true 157 | }, 158 | "cell_type": "code", 159 | "source": "import requests\nimport re\nimport csv\nfrom bs4 import BeautifulSoup\n\n\nPTT_URL = 'https://www.ptt.cc'\n\n\ndef get_articles(url):\n resp = requests.get(\n url=url,\n cookies={'over18': '1'} # 告知 Server 已回答過滿 18 歲的問題\n )\n soup = BeautifulSoup(resp.text, 'html5lib')\n prev_link = soup.find('div', 'btn-group-paging').find_all('a')[1]\n # 若有 href 屬性, 代表有上一頁的超連結\n prev_link = prev_link['href'] if 'href' in prev_link.attrs else None\n\n # 巡覽每一篇文章所在區塊\n positive = []\n negative = []\n for div in soup.find_all('div', 'r-ent'):\n link = div.find('div', 'title').a\n if link is None: # 沒有超連結, 表示文章已被刪除\n continue\n href = link['href']\n title = div.find('div', 'title').text.strip()\n # 若標題為 [] 開頭, e.g., [好雷] 星際大戰八-各種元素集於一身\n match = re.match(r'\\[.*\\]', title)\n if match:\n tag = match.group(0)\n # 標籤內含'好'為好評; 含'負'或'爛'為負評\n if '好' in tag:\n positive.append([title, href])\n if '爛' in tag or '負' in tag:\n negative.append([title, href])\n return prev_link, positive, negative\n\n\nif __name__ == '__main__':\n start_url = PTT_URL + '/bbs/movie/search?q=黑豹'\n positive_posts, negative_posts = [], []\n prev_link, pos, neg = get_articles(start_url)\n positive_posts += pos\n negative_posts += neg\n while prev_link:\n url = PTT_URL + prev_link\n prev_link, pos, neg = get_articles(url)\n positive_posts += pos\n negative_posts += neg\n print(len(positive_posts), positive_posts[:3])\n print(len(negative_posts), negative_posts[:3])\n\n with open('./data/mov_pos.csv', 'w', encoding='utf-8', newline='') as f:\n writer = 
csv.writer(f)\n writer.writerow(['title', 'href'])\n 
writer.writerows(positive_posts)\n\n with open('./data/mov_neg.csv', 'w', encoding='utf-8', newline='') as f:\n writer = csv.writer(f)\n writer.writerow(['title', 'href'])\n writer.writerows(negative_posts)", 160 | "execution_count": 10, 161 | "outputs": [ 162 | { 163 | "output_type": "stream", 164 | "text": "60 [['[好雷] 黑豹 --其實蠻好看的', '/bbs/movie/M.1552276420.A.F6F.html'], ['[好雷?] 黑豹 自慰片的新高度', '/bbs/movie/M.1545317279.A.EFC.html'], ['[二刷好雷] 水行俠真的不是黑豹', '/bbs/movie/M.1545065816.A.46A.html']]\n27 [['[負雷]黑豹-失衡的烏托邦', '/bbs/movie/M.1529245622.A.AF5.html'], ['[負雷] 四不像的黑豹', '/bbs/movie/M.1527918611.A.56C.html'], ['[微負雷] 黑豹有點讓人失望....', '/bbs/movie/M.1527839684.A.EF0.html']]\n", 165 | "name": "stdout" 166 | } 167 | ] 168 | }, 169 | { 170 | "metadata": { 171 | "trusted": true 172 | }, 173 | "cell_type": "code", 174 | "source": "import csv\nimport requests\nimport re\nimport json\nimport time\nfrom bs4 import BeautifulSoup\n\n\nPTT_URL = 'https://www.ptt.cc'\n\n\ndef sanitize(txt):\n # 保留英數字, 中文 (\\u4e00-\\u9fa5) 及中文標點, 部分特殊符號\n # http://ubuntu-rubyonrails.blogspot.com/2009/06/unicode.html\n expr = re.compile(r'[^\\u4e00-\\u9fa5。;,:“”()、?「」『』【】\\s\\w:/\\-.()]') # ^ 表示\"非括號內指定的字元\"\n txt = re.sub(expr, '', txt)\n txt = re.sub(r'[。;,:“”()、?「」『』【】:/\\-_.()]', ' ', txt) # 用空白取代中英文標點\n txt = re.sub(r'(\\s)+', ' ', txt) # 用單一空白取代多個換行或 tab 符號\n txt = txt.replace('--', '')\n txt = txt.lower() # 英文字轉為小寫\n return txt\n\n\ndef get_post(url):\n resp = requests.get(\n url=url,\n cookies={'over18': '1'} # 告知 Server 已回答過滿 18 歲的問題\n )\n soup = BeautifulSoup(resp.text, 'html5lib')\n main_content = soup.find('div', id='main-content')\n\n # 把非本文的部份 (標題區及推文區) 移除\n # 移除標題區塊\n for meta in main_content.find_all('div', 'article-metaline'):\n meta.extract()\n for meta in main_content.find_all('div', 'article-metaline-right'):\n meta.extract()\n # 移除推文區塊\n for push in main_content.find_all('div', 'push'):\n push.extract()\n\n parsed = []\n for txt in main_content.stripped_strings:\n # 移除 '※ 發信站:', '--' 
開頭, 及本文區最後一行文章網址部份\n if txt[0] == '※' or txt[:2] == '--' or url in txt:\n continue\n txt = sanitize(txt)\n if txt:\n parsed.append(txt)\n return ' '.join(parsed)\n\n\ndef get_article_body(csv_file):\n id_to_body = {}\n with open(csv_file, 'r', encoding='utf-8') as f:\n reader = csv.DictReader(f)\n for row in reader:\n print('處理', row['title'], row['href'])\n title = ' '.join(row['title'].split(']')[1:])\n title = sanitize(title)\n body = get_post(PTT_URL + row['href'])\n id_to_body[row['href']] = title + ' ' + body # 以文章超連結為 key, 標題 + 本文為 value\n time.sleep(1) # 放慢爬蟲速度\n return id_to_body\n\n\nif __name__ == '__main__':\n d1 = get_article_body('./data/mov_pos.csv')\n d2 = get_article_body('./data/mov_neg.csv')\n id_to_body = {**d1, **d2} # 將兩個 dict 合併為一個\n with open('./data/id_to_body.json', 'w', encoding='utf-8') as f:\n json.dump(id_to_body, f, indent=2, ensure_ascii=False)", 175 | "execution_count": 11, 176 | "outputs": [ 177 | { 178 | "output_type": "stream", 179 | "text": "處理 [好雷] 黑豹 --其實蠻好看的 /bbs/movie/M.1552276420.A.F6F.html\n處理 [好雷?] 
黑豹 自慰片的新高度 /bbs/movie/M.1545317279.A.EFC.html\n處理 [二刷好雷] 水行俠真的不是黑豹 /bbs/movie/M.1545065816.A.46A.html\n處理 [好雷] 盲點:關於《黑豹》的奧克蘭也關於你我的故事 /bbs/movie/M.1538060300.A.8FD.html\n處理 [好雷] 瘋狂亞洲富豪─絕不是新加坡黑豹 /bbs/movie/M.1536676591.A.C8F.html\n處理 [微好雷]《雷神索爾3諸神黃昏》&《黑豹》 /bbs/movie/M.1536217925.A.B6E.html\n處理 [好雷] 比黑豹好看的蟻人與黃蜂女 /bbs/movie/M.1531217457.A.6C8.html\n處理 [好雷]黑豹 — 符合台灣政治的隱喻分析 /bbs/movie/M.1525369009.A.CFA.html\n處理 [好雷] 黑豹:一部政治寓言 /bbs/movie/M.1521128014.A.19B.html\n處理 [好雷]黑豹,好看但可惜的反派 /bbs/movie/M.1520467155.A.8E5.html\n處理 [好雷] 黑豹心得 /bbs/movie/M.1519920385.A.14A.html\n處理 [ 好雷] 唯一缺憾的黑豹 /bbs/movie/M.1519908058.A.B99.html\n處理 [好雷] 黑豹 感想 微負評 /bbs/movie/M.1519841715.A.3BF.html\n處理 [核心好雷] 黑豹-科技與部落的反差萌 人權與人道議題 /bbs/movie/M.1519812809.A.DC3.html\n處理 [普好雷] 立體的世界,被一掌拍平,淺談【黑豹】 /bbs/movie/M.1519787493.A.BFD.html\n處理 [有意見好雷] 比上沒得比,比下有餘的黑豹 /bbs/movie/M.1519734682.A.979.html\n處理 [好雷] 黑豹 ~ 屬於黑人的童話故事 /bbs/movie/M.1519618436.A.EA0.html\n處理 [ 好 雷] 黑豹 /bbs/movie/M.1519572758.A.D3C.html\n處理 [好雷]黑豹 王者的抉擇 /bbs/movie/M.1519544126.A.E20.html\n處理 [好雷] 黑豹就真的很好看咩~ /bbs/movie/M.1519356550.A.A17.html\n處理 [ 普好雷] 黑豹 /bbs/movie/M.1519142286.A.7CD.html\n處理 [好雷] 黑豹 /bbs/movie/M.1519070229.A.DE9.html\n處理 [好雷] 黑豹:王者路大不易 /bbs/movie/M.1519042966.A.803.html\n處理 [微好雷]《黑豹》 春秋五霸的基本套路 /bbs/movie/M.1518991149.A.BDD.html\n處理 [好雷]黑豹 不太一樣的超級英雄 /bbs/movie/M.1518963410.A.A2E.html\n處理 [好雷] 黑豹 其實我比較喜歡反派@@ /bbs/movie/M.1518939706.A.D48.html\n處理 [好雷] 黑豹 Black Panther,非洲未來主義 /bbs/movie/M.1518893529.A.3AD.html\n處理 [好雷] 黑豹: 智者造橋,愚者築牆 /bbs/movie/M.1518860486.A.B45.html\n處理 [微好雷] 黑豹的電影配樂 /bbs/movie/M.1518848376.A.BBB.html\n處理 [好無雷] 黑豹 /bbs/movie/M.1518802967.A.D32.html\n處理 [好雷] 黑豹 沒有大家說的那麼糟啦 /bbs/movie/M.1518793606.A.036.html\n處理 [好雷] 黑豹 /bbs/movie/M.1518778797.A.C52.html\n處理 [普好雷] 黑豹:Do you know da way? 
/bbs/movie/M.1518769142.A.229.html\n處理 [普好雷] 黑豹 來談談角色動機吧 /bbs/movie/M.1518703695.A.0DC.html\n處理 [好雷]YOU SHOW OFF?——《黑豹》無&有雷推薦 /bbs/movie/M.1518641125.A.57A.html\n處理 [普好雷] 覺得黑豹的編劇陷入一種兩難的情況 /bbs/movie/M.1518632731.A.55E.html\n處理 [好笑雷] 黑豹 /bbs/movie/M.1518627430.A.ADF.html\n處理 [好雷] 黑豹 耳目一新的英雄電影 /bbs/movie/M.1518617014.A.223.html\n處理 [好雷] 黑豹 最沒有在看爽的漫威片 /bbs/movie/M.1518584186.A.CA8.html\n處理 [偏好雷] 看完黑豹我再重看一次預告片發現... /bbs/movie/M.1518541335.A.C93.html\n處理 [好雷] 黑豹 國王成長日記 /bbs/movie/M.1518538495.A.43A.html\n處理 [好雷] 黑豹 世界觀科技感超讚 但故事有些許bug /bbs/movie/M.1518537228.A.F8C.html\n處理 [超爆幹好雷] 黑豹-名副其實漫威最佳 /bbs/movie/M.1518536561.A.C1C.html\n處理 [普好雷] 黑豹 /bbs/movie/M.1518535690.A.7CD.html\n處理 [好雷]【黑豹】非洲先進文明卓然而立 /bbs/movie/M.1518521887.A.244.html\n處理 [普好雷] 充滿新風格的黑豹 /bbs/movie/M.1518517571.A.DD1.html\n處理 [好雷] 不太懂有關黑豹反派的幾件事情 /bbs/movie/M.1518512928.A.747.html\n處理 [好雷] 小細節怪怪的黑豹 /bbs/movie/M.1518511893.A.636.html\n處理 [好雷] 黑豹 /bbs/movie/M.1518507928.A.ABD.html\n處理 [好無雷] 黑豹 /bbs/movie/M.1518505340.A.C9A.html\n處理 [好雷] 黑豹觀後討論 /bbs/movie/M.1518501262.A.9AC.html\n處理 [普好雷] 黑豹 優秀的漫威宇宙擴展片 /bbs/movie/M.1518500618.A.932.html\n處理 [極好無雷] 這就是我要的黑豹! /bbs/movie/M.1518499054.A.561.html\n處理 [ 好無雷] 黑豹 絕對值得一看 /bbs/movie/M.1518496440.A.BE1.html\n處理 [好無雷] 大推推黑豹電影 /bbs/movie/M.1518495524.A.882.html\n處理 [好雷] 黑豹,劇情普通,美術太完美 /bbs/movie/M.1518495424.A.F0B.html\n處理 [好雷] 黑豹 /bbs/movie/M.1518494060.A.826.html\n處理 [ 好雷] 美國隊長3 黑豹疑問 /bbs/movie/M.1462012717.A.D23.html\n處理 [好雷]美隊3-關於黑豹以及各英雄的立場 /bbs/movie/M.1461921066.A.E77.html\n處理 [好雷] 美國隊長3 黑豹家世 /bbs/movie/M.1461748087.A.515.html\n處理 [負雷]黑豹-失衡的烏托邦 /bbs/movie/M.1529245622.A.AF5.html\n處理 [負雷] 四不像的黑豹 /bbs/movie/M.1527918611.A.56C.html\n處理 [微負雷] 黑豹有點讓人失望.... 
/bbs/movie/M.1527839684.A.EF0.html\n處理 [負雷] 是不是有精神分裂的黑豹 /bbs/movie/M.1526734538.A.5E2.html\n處理 [負雷] 黑豹 /bbs/movie/M.1520322497.A.E24.html\n處理 [負雷]黑豹 最差漫威電影 /bbs/movie/M.1519931938.A.207.html\n處理 [ 微負雷] 黑豹 /bbs/movie/M.1519721961.A.471.html\n處理 [負雷] 黑豹 Black Panther,暴力與分享 /bbs/movie/M.1518961786.A.79E.html\n處理 [ 負雷] 隨便拍拍隨便買單的「黑豹」 /bbs/movie/M.1518934190.A.F7B.html\n處理 [負雷] 只有政治正確的 黑豹 /bbs/movie/M.1518809750.A.4E0.html\n處理 [負雷]黑豹:除了美術視覺,其餘毫無魅力 /bbs/movie/M.1518808279.A.FC9.html\n處理 [睡負無雷] 看黑豹看到睡著 /bbs/movie/M.1518774686.A.889.html\n處理 [小負雷] 黑豹-請修邊幅可以嗎?(內有雷) /bbs/movie/M.1518761310.A.05C.html\n處理 [爛無雷] 千萬不要期待的黑豹 /bbs/movie/M.1518758578.A.8F0.html\n處理 [普負雷] 輸不起的黑豹 /bbs/movie/M.1518714311.A.957.html\n處理 [ 負雷] 黑豹-鋪陳復3的跳板 /bbs/movie/M.1518684255.A.489.html\n處理 [ 負雷] 黑豹 /bbs/movie/M.1518670825.A.A31.html\n處理 [普負雷] 黑豹出戲點 /bbs/movie/M.1518642702.A.497.html\n處理 [負雷] 黑豹特遣隊 /bbs/movie/M.1518609915.A.835.html\n處理 [負雷] 家天下黑豹 /bbs/movie/M.1518596627.A.3E0.html\n處理 [負雷] 失望的黑豹(補個優點) /bbs/movie/M.1518577527.A.4BB.html\n處理 [負雷] 真的過譽的黑豹......... 
/bbs/movie/M.1518536086.A.B1E.html\n處理 [負雷] 黑豹天下 /bbs/movie/M.1518534403.A.ECB.html\n處理 [ 負雷] 讓人失望的黑豹 /bbs/movie/M.1518529744.A.586.html\n處理 [普負雷] 一片歐罵罵的黑豹 /bbs/movie/M.1518513719.A.462.html\n處理 [微負雷] 黑豹 相較其他marvel片有點可惜 /bbs/movie/M.1518510319.A.1DC.html\n處理 [負雷]黑豹 /bbs/movie/M.1518495315.A.E27.html\n", 180 | "name": "stdout" 181 | } 182 | ] 183 | }, 184 | { 185 | "metadata": { 186 | "trusted": true 187 | }, 188 | "cell_type": "code", 189 | "source": "!pip3 install jieba", 190 | "execution_count": 12, 191 | "outputs": [ 192 | { 193 | "output_type": "stream", 194 | "text": "Requirement already satisfied: jieba in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.39)\r\n", 195 | "name": "stdout" 196 | } 197 | ] 198 | }, 199 | { 200 | "metadata": { 201 | "trusted": true 202 | }, 203 | "cell_type": "code", 204 | "source": "import json\nimport csv\nimport jieba\nimport random\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.feature_extraction.text import TfidfTransformer\nfrom sklearn.linear_model import SGDClassifier\nfrom sklearn.metrics import accuracy_score\n\njieba.set_dictionary('./data/dict.txt.big') # 對繁體中文斷詞較準確的字典檔\n\n\ndef load_data(a_csv, b_json, label):\n a_ids = []\n with open(a_csv, 'r', encoding='utf-8') as f:\n reader = csv.DictReader(f)\n for row in reader:\n a_ids.append(row['href'])\n with open(b_json, 'r', encoding='utf-8') as f:\n id_to_body = json.load(f)\n data = []\n for a_id in a_ids:\n tokenized_post = []\n txt = id_to_body[a_id]\n for sent in txt.split(): # 將文章以空白隔開\n # 斷詞後的結果, 若非空白且長度為 2 以上, 則列入詞庫\n filtered = [t for t in jieba.cut(sent) if t.split() and len(t) > 1]\n tokenized_post += filtered\n data.append((tokenized_post, label))\n return data\n\n\nif __name__ == '__main__':\n pos_data = load_data('./data/mov_pos.csv', './data/id_to_body.json', '正評')\n neg_data = load_data('./data/mov_neg.csv', './data/id_to_body.json', '負評')\n\n '''\n # 印出正評與負評文章的前幾個字, 確認資料無誤\n for post, label in pos_data[:3]:\n 
print(post[:5], label)\n for post, label in neg_data[:3]:\n print(post[:5], label)\n '''\n\n # shuffle the order of the data\n random.seed(42)\n random.shuffle(pos_data)\n random.shuffle(neg_data)\n\n x_train, y_train, x_test, y_test = [], [], [], []\n # training set: the first 10 positive and the first 10 negative posts (20 posts, with their labels)\n # join the tokens back into one space-separated string so scikit-learn can build the vocabulary and turn the text into vectors\n for i in range(10):\n x_train.append(' '.join(pos_data[i][0]))\n x_train.append(' '.join(neg_data[i][0]))\n y_train.append(pos_data[i][1])\n y_train.append(neg_data[i][1])\n # testing set: posts 10 up to len(neg_data) of each class (27 negative posts in this run, so 17 + 17 = 34 posts);\n # the extra positive posts are left out, which keeps the two classes balanced\n for i in range(10, len(neg_data)):\n x_test.append(' '.join(pos_data[i][0]))\n x_test.append(' '.join(neg_data[i][0]))\n y_test.append(pos_data[i][1])\n y_test.append(neg_data[i][1])\n\n vectorizer = CountVectorizer()\n x_train = vectorizer.fit_transform(x_train)\n transformer = TfidfTransformer()\n x_train = transformer.fit_transform(x_train)\n clf = SGDClassifier(random_state=42)\n clf.fit(x_train, y_train)\n x_test = vectorizer.transform(x_test)\n x_test = transformer.transform(x_test)\n y_pred = clf.predict(x_test)\n print('預測結果:', list(y_pred))\n print('正確答案:', y_test)\n print('正確率:', accuracy_score(y_test, y_pred))\n\n # try the classifier on your own sentences (already split into words with spaces)\n sentences = [\n '我 覺得 這部 電影 還 不錯',\n '這部 片 應該 可以 更好 才對'\n ]\n analyze = vectorizer.build_analyzer()\n print(analyze(sentences[0]))\n print(analyze(sentences[1]))\n\n custom_data = transformer.transform(vectorizer.transform(sentences))\n print(clf.predict(custom_data))", 205 | "execution_count": 14, 206 | "outputs": [ 207 | { 208 | "output_type": "stream", 209 | "text": "Building prefix dict from /home/nbuser/library/lesson/data/dict.txt.big ...\nLoading model from cache /tmp/jieba.u863534f77a5b7aa5dc55e7aac03546ba.cache\nLoading model cost 7.137 seconds.\nPrefix dict has been built succesfully.\n", 210 | "name": "stderr" 211 | }, 212 | { 213 | "output_type": "stream", 214 | "text": "預測結果: ['正評', '負評', '正評', '正評', '負評', '負評', '負評', '正評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', 
'正評', '正評', '負評', '負評', '負評', '負評', '正評', '正評', '負評', '正評', '正評', '正評', '負評', '負評', '負評', '負評', '負評', '正評', '正評']\n正確答案: ['正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評']\n正確率: 0.6470588235294118\n['覺得', '這部', '電影', '不錯']\n['這部', '應該', '可以', '更好', '才對']\n['負評' '正評']\n", 215 | "name": "stdout" 216 | }, 217 | { 218 | "output_type": "stream", 219 | "text": "/home/nbuser/anaconda3_501/lib/python3.6/site-packages/sklearn/linear_model/stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.\n \"and default tol will be 1e-3.\" % type(self), FutureWarning)\n", 220 | "name": "stderr" 221 | } 222 | ] 223 | }, 224 | { 225 | "metadata": { 226 | "trusted": true 227 | }, 228 | "cell_type": "code", 229 | "source": "len(neg_data)", 230 | "execution_count": 15, 231 | "outputs": [ 232 | { 233 | "output_type": "execute_result", 234 | "execution_count": 15, 235 | "data": { 236 | "text/plain": "27" 237 | }, 238 | "metadata": {} 239 | } 240 | ] 241 | }, 242 | { 243 | "metadata": { 244 | "trusted": true 245 | }, 246 | "cell_type": "markdown", 247 | "source": "### Reference\n- [手把手教你如何用 Python 做情感分析](https://www.itread01.com/articles/1498721884.html)\n- [Python 網路爬蟲與資料分析入門實戰 ](https://www.tenlong.com.tw/products/9789864343386)" 248 | } 249 | ], 250 | "metadata": { 251 | "kernelspec": { 252 | "name": "python36", 253 | "display_name": "Python 3.6", 254 | "language": "python" 255 | }, 256 | "language_info": { 257 | "mimetype": "text/x-python", 258 | "nbconvert_exporter": "python", 259 | "name": "python", 260 | "pygments_lexer": "ipython3", 261 | "version": "3.6.6", 262 | "file_extension": ".py", 263 | 
"codemirror_mode": { 264 | "version": 3, 265 | "name": "ipython" 266 | } 267 | } 268 | }, 269 | "nbformat": 4, 270 | "nbformat_minor": 2 271 | } -------------------------------------------------------------------------------- /05.Python深度學習入門-標準神經網路DNN做手寫辨識(MNIST).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": {}, 5 | "cell_type": "markdown", 6 | "source": "# 主題 01-1. 標準神經網路做手寫辨識\n\n我們終於要開始做生命中第一個神經網路...\n(修改自蔡炎龍老師的Deep Learning MOOC 教學)" 7 | }, 8 | { 9 | "metadata": {}, 10 | "cell_type": "markdown", 11 | "source": "## 1. 初始準備\n\nKeras 可以用各種不同的深度學習套件當底層, 我們在此指定用 Tensorflow 以確保執行的一致性。" 12 | }, 13 | { 14 | "metadata": { 15 | "trusted": true 16 | }, 17 | "cell_type": "code", 18 | "source": "%env KERAS_BACKEND=tensorflow", 19 | "execution_count": 1, 20 | "outputs": [ 21 | { 22 | "output_type": "stream", 23 | "text": "env: KERAS_BACKEND=tensorflow\n", 24 | "name": "stdout" 25 | } 26 | ] 27 | }, 28 | { 29 | "metadata": {}, 30 | "cell_type": "markdown", 31 | "source": "再來是我們標準數據分析動作!" 32 | }, 33 | { 34 | "metadata": { 35 | "trusted": true 36 | }, 37 | "cell_type": "code", 38 | "source": "%matplotlib inline\nimport numpy as np\nimport matplotlib.pyplot as plt", 39 | "execution_count": 2, 40 | "outputs": [ 41 | { 42 | "output_type": "stream", 43 | "text": "/home/nbuser/anaconda3_420/lib/python3.5/site-packages/matplotlib/font_manager.py:281: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.\n 'Matplotlib is building the font cache using fc-list. '\n", 44 | "name": "stderr" 45 | } 46 | ] 47 | }, 48 | { 49 | "metadata": {}, 50 | "cell_type": "markdown", 51 | "source": "## 2. 
讀入 MNIST 數據庫\n\nMNIST 是有一堆 0-9 的手寫數字圖庫。有 6 萬筆訓練資料, 1 萬筆測試資料。它是 \"Modified\" 版的 NIST 數據庫, 原來的版本有更多資料。這個 Modified 的版本是由 LeCun, Cortes, 及 Burges 等人做的。可以參考這個數據庫的[原始網頁](http://yann.lecun.com/exdb/mnist/)。\n\nMNIST 可以說是 Deep Learning 最有名的範例, 它被 Deep Learning 大師 Hinton 稱為「機器學習的果蠅」。" 52 | }, 53 | { 54 | "metadata": {}, 55 | "cell_type": "markdown", 56 | "source": "### 2.1 由 Keras 讀入 MNIST" 57 | }, 58 | { 59 | "metadata": {}, 60 | "cell_type": "markdown", 61 | "source": "Keras 很貼心的幫我們準備好 MNIST 數據庫, 我們可以這樣讀進來 (第一次要花點時間)。" 62 | }, 63 | { 64 | "metadata": { 65 | "trusted": true 66 | }, 67 | "cell_type": "code", 68 | "source": "from keras.datasets import mnist", 69 | "execution_count": 3, 70 | "outputs": [ 71 | { 72 | "output_type": "stream", 73 | "text": "/home/nbuser/anaconda3_420/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n from ._conv import register_converters as _register_converters\nUsing TensorFlow backend.\n", 74 | "name": "stderr" 75 | } 76 | ] 77 | }, 78 | { 79 | "metadata": { 80 | "trusted": true 81 | }, 82 | "cell_type": "code", 83 | "source": "(x_train, y_train), (x_test, y_test) = mnist.load_data()", 84 | "execution_count": 4, 85 | "outputs": [ 86 | { 87 | "output_type": "stream", 88 | "text": "Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz\n11264000/11490434 [============================>.] 
- ETA: 0s", 89 | "name": "stdout" 90 | } 91 | ] 92 | }, 93 | { 94 | "metadata": {}, 95 | "cell_type": "markdown", 96 | "source": "Let's check that there are 60,000 training samples and 10,000 test samples." 97 | }, 98 | { 99 | "metadata": { 100 | "trusted": true 101 | }, 102 | "cell_type": "code", 103 | "source": "len(x_train)", 104 | "execution_count": 5, 105 | "outputs": [ 106 | { 107 | "output_type": "execute_result", 108 | "execution_count": 5, 109 | "data": { 110 | "text/plain": "60000" 111 | }, 112 | "metadata": {} 113 | } 114 | ] 115 | }, 116 | { 117 | "metadata": { 118 | "trusted": true 119 | }, 120 | "cell_type": "code", 121 | "source": "len(x_test)", 122 | "execution_count": 6, 123 | "outputs": [ 124 | { 125 | "output_type": "execute_result", 126 | "execution_count": 6, 127 | "data": { 128 | "text/plain": "10000" 129 | }, 130 | "metadata": {} 131 | } 132 | ] 133 | }, 134 | { 135 | "metadata": {}, 136 | "cell_type": "markdown", 137 | "source": "Note: if the download fails partway through, find and delete the partially downloaded dataset file (Keras caches it as `~/.keras/datasets/mnist.npz` by default), then download it again over a stable network connection." 138 | }, 139 | { 140 | "metadata": {}, 141 | "cell_type": "markdown", 142 | "source": "### 2.2 What the dataset contains\n\nEach input (x) is a 28x28 image of one handwritten digit from 0 to 9, and each output (y) is the correct answer, i.e. the digit's label. Let's look at training sample number 9487." 143 | }, 144 | { 145 | "metadata": { 146 | "trusted": true 147 | }, 148 | "cell_type": "code", 149 | "source": "x_train[9487].shape", 150 | "execution_count": 7, 151 | "outputs": [ 152 | { 153 | "output_type": "execute_result", 154 | "execution_count": 7, 155 | "data": { 156 | "text/plain": "(28, 28)" 157 | }, 158 | "metadata": {} 159 | } 160 | ] 161 | }, 162 | { 163 | "metadata": {}, 164 | "cell_type": "markdown", 165 | "source": "Since each sample is an image, we can display it!"
166 | }, 167 | { 168 | "metadata": { 169 | "trusted": true 170 | }, 171 | "cell_type": "code", 172 | "source": "plt.imshow(x_train[9487], cmap='Greys')", 173 | "execution_count": 8, 174 | "outputs": [ 175 | { 176 | "output_type": "execute_result", 177 | "execution_count": 8, 178 | "data": { 179 | "text/plain": "" 180 | }, 181 | "metadata": {} 182 | }, 183 | { 184 | "output_type": "display_data", 185 | "data": { 186 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvAOZPmwAADI1JREFUeJzt3W+oXPWdx/HPR5OCSasYctVgdW+36GJQNl2GsKAsLuUWswRjH1QasGS17O2DKFsssiJCfaAgy9puhU0lXUMTbdMWWtc8kN0GWXCDS3AMkthkd6tyt80mJDeoSQqBovnug3tSbuKdM5OZ82duvu8XhJk5vzP3fJjkkzMzv7nzc0QIQD6XtR0AQDsoP5AU5QeSovxAUpQfSIryA0lRfiApyg8kRfmBpJY0ebCVK1fG5ORkk4cEUpmZmdGJEyc8yL4jld/2XZK+J+lySf8cEU+X7T85OalutzvKIQGU6HQ6A+879NN+25dL+idJ6yStlrTR9uphfx6AZo3ymn+tpHci4r2I+L2kn0jaUE0sAHUbpfzXS/rtvNuHi23nsT1tu2u7Ozs7O8LhAFRplPIv9KbCJ34/OCK2RkQnIjoTExMjHA5AlUYp/2FJN8y7/VlJR0aLA6Apo5T/DUk32f6c7U9J+qqkXdXEAlC3oaf6IuIj2w9K+jfNTfVti4hfVZYMQK1GmuePiFckvVJRFgAN4uO9QFKUH0iK8gNJUX4gKcoPJEX5gaQoP5AU5QeSovxAUpQfSIryA0lRfiApyg8kRfmBpCg/kBTlB5Ki/EBSlB9IivIDSVF+ICnKDyTV6BLdaN6KFStKx0+ePFk6vnnz5tLxZ5999qIzYTxw5geSovxAUpQfSIryA0lRfiApyg8kRfmBpEaa57c9I+m0pI8lfRQRnSpC4eJ8+OGHPcciovS+tkvHt2zZUjrOPP/iVcWHfP4yIk5U8HMANIin/UBSo5Y/JP3S9pu2p6sIBKAZoz7tvz0ijti+RtJu2/8VEa/N36H4T2Fakm688cYRDwegKiOd+SPiSHF5XNJLktYusM/WiOhERGdiYmKUwwGo0NDlt73c9mfOXZf0JUlvVxUMQL1Gedp/raSXiqmiJZJ+HBH/WkkqALUbuvwR8Z6kP60wC3roN1f/3HPP9Rw7ffp01XHOs3v37tLxqampWo+P4THVByRF+YGkKD+QFOUHkqL8QFKUH0iKr+5eBE6dOlU6/vjjjzeU5JP27NlTOs5U3/jizA8kRfmBpCg/kBTlB5Ki/EBSlB9IivIDSTHPvwgsXbq0dHz16tU9xw4ePFh1HFwiOPMDSVF+ICnKDyRF+YGkKD+QFOUHkqL8QFLM8y8Cy5YtKx2///77e4498sgjVcc5z86dO0vHH3744Z5jV111VdVxcBE48wNJUX4gKcoPJEX5gaQoP5AU5QeSovxAUn3n+W1vk7Re0vGIuLXYtkLSTyVNSpqRdG9EfFBfTIyrd999t3T8zJkzPceY52/XIGf+H0q664Jtj0p6NSJukvRqcRvAItK3/BHxmqT3L9i8QdL24
vp2SfdUnAtAzYZ9zX9tRByVpOLymuoiAWhC7W/42Z623bXdnZ2drftwAAY0bPmP2V4lScXl8V47RsTWiOhERGdiYmLIwwGo2rDl3yVpU3F9k6SXq4kDoCl9y297p6T/lPQntg/b/rqkpyVN2f61pKniNoBFpO88f0Rs7DH0xYqzYEgPPfRQz7Enn3yy9L4nT56sOg4WCT7hByRF+YGkKD+QFOUHkqL8QFKUH0iKr+6+BCxZ0vuv0XaDSbCYcOYHkqL8QFKUH0iK8gNJUX4gKcoPJEX5gaQoP5AU5QeSovxAUpQfSIryA0lRfiApyg8kRfmBpPh9/kvc+vXrS8dfeOGFkX7+2bNnS8f37t3bc2zDhg0jHRuj4cwPJEX5gaQoP5AU5QeSovxAUpQfSIryA0n1nee3vU3SeknHI+LWYtsTkv5G0myx22MR8UpdITG8Bx54oHT8xRdfHOnnX3ZZ+fnjmWee6Tk2NTVVet9ly5YNlQmDGeTM/0NJdy2w/bsRsab4Q/GBRaZv+SPiNUnvN5AFQINGec3/oO39trfZvrqyRAAaMWz5vy/p85LWSDoqqecLO9vTtru2u7Ozs712A9CwocofEcci4uOIOCvpB5LWluy7NSI6EdGZmJgYNieAig1Vftur5t38sqS3q4kDoCmDTPXtlHSnpJW2D0v6tqQ7ba+RFJJmJH2jxowAatC3/BGxcYHNz9eQBZeg119/vefYvn37Su97xx13VB0H8/AJPyApyg8kRfmBpCg/kBTlB5Ki/EBSlB9IivIDSVF+ICnKDyRF+YGkKD+QFOUHkqL8QFIs0X2Ju+6660rH+327Up1fvTY9PV06fvDgwdqODc78QFqUH0iK8gNJUX4gKcoPJEX5gaQoP5AU8/yXuJtvvrl0fN26daXjO3bsqDLOeVi+rV2c+YGkKD+QFOUHkqL8QFKUH0iK8gNJUX4gqb7z/LZvkLRD0nWSzkraGhHfs71C0k8lTUqakXRvRHxQX1TU4amnniodr3Oe/8yZM6XjBw4cKB2/7bbbqoyTziBn/o8kfSsibpH055I2214t6VFJr0bETZJeLW4DWCT6lj8ijkbEvuL6aUmHJF0vaYOk7cVu2yXdU1dIANW7qNf8ticlfUHSXknXRsRRae4/CEnXVB0OQH0GLr/tT0v6uaRvRsSpi7jftO2u7S6f5QbGx0Dlt71Uc8X/UUT8oth8zPaqYnyVpOML3TcitkZEJyI6/b4sEkBz+pbftiU9L+lQRHxn3tAuSZuK65skvVx9PAB1GeRXem+X9DVJB2y/VWx7TNLTkn5m++uSfiPpK/VERJ2WL19eOt5vOm3//v1DH/uKK64oHb/llluG/tnor2/5I2KPJPcY/mK1cQA0hU/4AUlRfiApyg8kRfmBpCg/kBTlB5Liq7uTu/LKK0vH77777tLxUeb5+1myhH+edeLMDyRF+YGkKD+QFOUHkqL8QFKUH0iK8gNJMZGKUvfdd1/p+AcflH9b+5YtW6qMgwpx5geSovxAUpQfSIryA0lRfiApyg8kRfmBpBwRjR2s0+lEt9tt7HhANp1OR91ut9dX7Z+HMz+QFOUHkqL8QFKUH0iK8gNJUX4gKcoPJNW3/LZvsP3vtg/Z/pXtvy22P2H7/2y/Vfz5q/rjAqjKIF/m8ZGkb0XEPtufkfSm7d3F2Hcj4h/qiwegLn3LHxFHJR0trp+2fUjS9XUHA1Cvi3rNb3tS0hck7S02PWh7v+1ttq/ucZ9p213b3dnZ2ZHCAqjOwOW3/WlJP5f0zYg4Jen7kj4vaY3mnhk8s9D9ImJrRHQiojMxMVFBZABVGKj8tpdqrvg/iohfSFJEHIuIjyPirKQfSFpbX0wAVRvk3X5Lel7SoYj4zrztq+bt9mVJb1cfD0BdBnm3/3ZJX5N0wPZbxbbHJG20vUZSSJqR9I1aEgKoxSDv9u+RtNDvB79SfRwATeETfkBSlB9IivIDSVF+ICnKDyRF+YGkKD+QFOUHk
qL8QFKUH0iK8gNJUX4gKcoPJEX5gaQaXaLb9qyk/523aaWkE40FuDjjmm1cc0lkG1aV2f4oIgb6vrxGy/+Jg9vdiOi0FqDEuGYb11wS2YbVVjae9gNJUX4gqbbLv7Xl45cZ12zjmksi27Baydbqa34A7Wn7zA+gJa2U3/Zdtv/b9ju2H20jQy+2Z2wfKFYe7racZZvt47bfnrdthe3dtn9dXC64TFpL2cZi5eaSlaVbfezGbcXrxp/2275c0v9ImpJ0WNIbkjZGxMFGg/Rge0ZSJyJanxO2/ReSfidpR0TcWmz7e0nvR8TTxX+cV0fE341Jtick/a7tlZuLBWVWzV9ZWtI9kv5aLT52JbnuVQuPWxtn/rWS3omI9yLi95J+ImlDCznGXkS8Jun9CzZvkLS9uL5dc/94Gtcj21iIiKMRsa+4flrSuZWlW33sSnK1oo3yXy/pt/NuH9Z4Lfkdkn5p+03b022HWcC1xbLp55ZPv6blPBfqu3Jzky5YWXpsHrthVryuWhvlX2j1n3Gacrg9Iv5M0jpJm4untxjMQCs3N2WBlaXHwrArXletjfIflnTDvNuflXSkhRwLiogjxeVxSS9p/FYfPnZukdTi8njLef5gnFZuXmhlaY3BYzdOK163Uf43JN1k+3O2PyXpq5J2tZDjE2wvL96Ike3lkr6k8Vt9eJekTcX1TZJebjHLecZl5eZeK0ur5cdu3Fa8buVDPsVUxj9KulzStoh4qvEQC7D9x5o720tzi5j+uM1stndKulNzv/V1TNK3Jf2LpJ9JulHSbyR9JSIaf+OtR7Y7NffU9Q8rN597jd1wtjsk/YekA5LOFpsf09zr69Yeu5JcG9XC48Yn/ICk+IQfkBTlB5Ki/EBSlB9IivIDSVF+ICnKDyRF+YGk/h85G4rir6/+MAAAAABJRU5ErkJggg==\n", 187 | "text/plain": "" 188 | }, 189 | "metadata": {} 190 | } 191 | ] 192 | }, 193 | { 194 | "metadata": {}, 195 | "cell_type": "markdown", 196 | "source": "我們人眼辨識就知道這是 1, 我們看答案是不是和我們想的一樣。" 197 | }, 198 | { 199 | "metadata": { 200 | "trusted": true 201 | }, 202 | "cell_type": "code", 203 | "source": "y_train[9487]", 204 | "execution_count": 9, 205 | "outputs": [ 206 | { 207 | "output_type": "execute_result", 208 | "execution_count": 9, 209 | "data": { 210 | "text/plain": "1" 211 | }, 212 | "metadata": {} 213 | } 214 | ] 215 | }, 216 | { 217 | "metadata": {}, 218 | "cell_type": "markdown", 219 | "source": "### 2.3 輸入格式整理\n\n我們現在要用標準神經網路學學手寫辨識。原來的每筆數據是個 28x28 的矩陣 (array), 但標準神經網路只吃「平平的」, 也就是每次要 28x28=784 長的向量。因此我們要用 `reshape` 調校一下。" 220 | }, 221 | { 222 | "metadata": { 223 | "trusted": true 224 | }, 225 | "cell_type": "code", 226 | "source": "x_train = x_train.reshape(60000, 784)\nx_test = x_test.reshape(10000, 784)", 227 | "execution_count": 10, 228 | "outputs": [] 229 | }, 230 | { 231 | "metadata": {}, 232 | "cell_type": "markdown", 233 | 
"source": "### 2.4 Formatting the output\n\nWe might think the function we want to learn has this form:\n\n$$\\hat{f} \\colon \\mathbb{R}^{784} \\to \\mathbb{R}$$\n\nThat is actually a poor choice. Why? Suppose the input x is an image of a 0. A trained network always has some error, so it might output\n\n$$\\hat{f}(x) = 0.5$$\n\nDoes that mean the digit could be either a 0 or a 1? But a 0 and a 1 look nothing alike. In other words, treating a classification problem as a single numeric output does not make sense.\n\nInstead we use \"one-hot encoding\": each label becomes a 10-dimensional vector with a 1 in the position of the digit and 0 everywhere else, for example\n\n* 1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]\n* 5 -> [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]\n\nBecause almost every classification problem needs this step, Keras already provides a utility for it." 234 | }, 235 | { 236 | "metadata": { 237 | "trusted": true 238 | }, 239 | "cell_type": "code", 240 | "source": "from keras.utils import np_utils", 241 | "execution_count": 11, 242 | "outputs": [] 243 | }, 244 | { 245 | "metadata": { 246 | "trusted": true 247 | }, 248 | "cell_type": "code", 249 | "source": "y_train = np_utils.to_categorical(y_train,10)", 250 | "execution_count": 12, 251 | "outputs": [] 252 | }, 253 | { 254 | "metadata": { 255 | "trusted": true 256 | }, 257 | "cell_type": "code", 258 | "source": "y_test = np_utils.to_categorical(y_test,10)", 259 | "execution_count": 13, 260 | "outputs": [] 261 | }, 262 | { 263 | "metadata": {}, 264 | "cell_type": "markdown", 265 | "source": "Let's look at the label of sample 9487, which we saw is a 1." 266 | }, 267 | { 268 | "metadata": { 269 | "trusted": true 270 | }, 271 | "cell_type": "code", 272 | "source": "y_train[9487]", 273 | "execution_count": 14, 274 | "outputs": [ 275 | { 276 | "output_type": "execute_result", 277 | "execution_count": 14, 278 | "data": { 279 | "text/plain": "array([0., 1., 0., 0., 0., 0., 0., 0., 0., 0.])" 280 | }, 281 | "metadata": {} 282 | } 283 | ] 284 | }, 285 | { 286 | "metadata": {}, 287 | "cell_type": "markdown", 288 | "source": "Just as expected! Now we can build our neural network." 289 | }, 290 | { 291 | "metadata": {}, 292 | "cell_type": "markdown", 293 | "source": "## 3.
Building our first neural network\n\nWe have decided that our function has the form\n\n$$\\hat{f} \\colon \\mathbb{R}^{784} \\to \\mathbb{R}^{10}$$\n\nSince this first attempt uses a standard (fully connected) neural network, all that remains is to choose the number of hidden layers, the number of neurons per layer, and the activation function.\n\n### 3.1 Choosing the architecture and importing the packages\n\nSuppose we choose:\n\n* 2 hidden layers\n* 500 neurons in each hidden layer\n* sigmoid as the activation function\n\nNow import the relevant classes from Keras." 294 | }, 295 | { 296 | "metadata": { 297 | "trusted": true 298 | }, 299 | "cell_type": "code", 300 | "source": "from keras.models import Sequential\nfrom keras.layers import Dense, Activation\nfrom keras.optimizers import SGD", 301 | "execution_count": 15, 302 | "outputs": [] 303 | }, 304 | { 305 | "metadata": {}, 306 | "cell_type": "markdown", 307 | "source": "### 3.2 Building our neural network\n\nAs with regression or other machine-learning models, we start by creating a \"function-learning machine\". A standard network that passes data forward layer by layer is called `Sequential`, so we create an empty one." 308 | }, 309 | { 310 | "metadata": { 311 | "trusted": true 312 | }, 313 | "cell_type": "code", 314 | "source": "model = Sequential()", 315 | "execution_count": 16, 316 | "outputs": [] 317 | }, 318 | { 319 | "metadata": {}, 320 | "cell_type": "markdown", 321 | "source": "Each call to `add` appends one layer, starting with the first hidden layer. Keras cannot infer how many input features there are, so for the first hidden layer we have to tell it (`input_dim=784`)." 322 | }, 323 | { 324 | "metadata": { 325 | "trusted": true 326 | }, 327 | "cell_type": "code", 328 | "source": "model.add(Dense(500, input_dim=784))\nmodel.add(Activation('sigmoid'))", 329 | "execution_count": 17, 330 | "outputs": [] 331 | }, 332 | { 333 | "metadata": {}, 334 | "cell_type": "markdown", 335 | "source": "The second hidden layer does not need an input size, because it receives the 500 outputs of the previous layer. The 500 here only tells Keras that the second layer also has 500 neurons." 336 | }, 337 | { 338 | "metadata": { 339 | "trusted": true 340 | }, 341 | "cell_type": "code", 342 | "source": "model.add(Dense(500))\nmodel.add(Activation('sigmoid'))", 343 | "execution_count": 18, 344 | "outputs": [] 345 | }, 346 | { 347 | "metadata": {}, 348 | "cell_type": "markdown", 349 | "source": "The output is 10 numbers, so the output layer has 10 neurons. If the network's output is \n\n$$(y_1, y_2, \\ldots, y_{10})$$\n\nwe would also like\n\n$$\\sum_{i=1}^{10} y_i = 1$$\n\nIs that possible? It turns out to be easy: use `softmax` as the activation function. Softmax makes every $y_i$ positive and makes them sum to 1, so the outputs can be read as probabilities."
350 | }, 351 | { 352 | "metadata": { 353 | "trusted": true 354 | }, 355 | "cell_type": "code", 356 | "source": "model.add(Dense(10))\nmodel.add(Activation('softmax'))", 357 | "execution_count": 19, 358 | "outputs": [] 359 | }, 360 | { 361 | "metadata": {}, 362 | "cell_type": "markdown", 363 | "source": "Our first neural network is now built!" 364 | }, 365 | { 366 | "metadata": {}, 367 | "cell_type": "markdown", 368 | "source": "### 3.3 Compiling\n\nUnlike the models we used before, the network is not fully set up until we `compile` it. This requires a few more decisions:\n\n* the loss function: here we use `mse` (mean squared error), although `categorical_crossentropy` is the more common choice for classification\n* the optimizer: we use standard SGD\n* the learning rate\n\nTo see results while training, we also set\n\n metrics=['accuracy']\n \nThis setting only reports accuracy during training; it does not change how the network learns." 369 | }, 370 | { 371 | "metadata": { 372 | "trusted": true 373 | }, 374 | "cell_type": "code", 375 | "source": "model.compile(loss='mse', optimizer=SGD(lr=0.1), metrics=['accuracy'])", 376 | "execution_count": 20, 377 | "outputs": [] 378 | }, 379 | { 380 | "metadata": {}, 381 | "cell_type": "markdown", 382 | "source": "## 4. Inspecting our neural network" 383 | }, 384 | { 385 | "metadata": {}, 386 | "cell_type": "markdown", 387 | "source": "We can inspect the network's architecture to confirm that it matches what we intended." 388 | }, 389 | { 390 | "metadata": {}, 391 | "cell_type": "markdown", 392 | "source": "### 4.1 The model summary" 393 | }, 394 | { 395 | "metadata": { 396 | "trusted": true 397 | }, 398 | "cell_type": "code", 399 | "source": "model.summary()", 400 | "execution_count": 21, 401 | "outputs": [ 402 | { 403 | "output_type": "stream", 404 | "text": "_________________________________________________________________\nLayer (type) Output Shape Param # \n=================================================================\ndense_1 (Dense) (None, 500) 392500 \n_________________________________________________________________\nactivation_1 (Activation) (None, 500) 0 \n_________________________________________________________________\ndense_2 (Dense) (None, 500) 250500 \n_________________________________________________________________\nactivation_2 (Activation) (None, 500) 0 
\n_________________________________________________________________\ndense_3 (Dense) (None, 10) 5010 \n_________________________________________________________________\nactivation_3 (Activation) (None, 10) 0 \n=================================================================\nTotal params: 648,010\nTrainable params: 648,010\nNon-trainable params: 0\n_________________________________________________________________\n", 405 | "name": "stdout" 406 | } 407 | ] 408 | }, 409 | { 410 | "metadata": {}, 411 | "cell_type": "markdown", 412 | "source": "### 4.2 Drawing the architecture diagram\n\nThis feature requires the `pydot` and `graphviz` packages. Install them from a terminal (Anaconda Prompt):\n\n conda install pydot\n conda install graphviz" 413 | }, 414 | { 415 | "metadata": { 416 | "trusted": true 417 | }, 418 | "cell_type": "code", 419 | "source": "from keras.utils import plot_model\nplot_model(model, show_shapes=True, to_file='./data/model01.png')", 420 | "execution_count": 22, 421 | "outputs": [] 422 | }, 423 | { 424 | "metadata": {}, 425 | "cell_type": "markdown", 426 | "source": "![My neural network](data/model01.png)" 427 | }, 428 | { 429 | "metadata": {}, 430 | "cell_type": "markdown", 431 | "source": "## 5. Training your first neural network\n\nCongratulations, the network is complete! Training it, however, is not just a matter of feeding in the training data; we still have two settings to choose:\n\n* how many samples to use per parameter update (`batch_size`): we update the parameters once every 100 samples\n* how many passes to make over the 60,000 training samples (`epochs`): we try 20\n\nNow for the exciting part. Be prepared to wait..."
432 | }, 433 | { 434 | "metadata": { 435 | "scrolled": true, 436 | "trusted": true 437 | }, 438 | "cell_type": "code", 439 | "source": "model.fit(x_train, y_train, batch_size=100, epochs=20)", 440 | "execution_count": 23, 441 | "outputs": [ 442 | { 443 | "output_type": "stream", 444 | "text": "Epoch 1/20\n60000/60000 [==============================] - 11s - loss: 0.0834 - acc: 0.3414 \nEpoch 2/20\n60000/60000 [==============================] - 10s - loss: 0.0638 - acc: 0.6270 \nEpoch 3/20\n60000/60000 [==============================] - 11s - loss: 0.0459 - acc: 0.7645 \nEpoch 4/20\n60000/60000 [==============================] - 10s - loss: 0.0334 - acc: 0.8397 \nEpoch 5/20\n60000/60000 [==============================] - 10s - loss: 0.0257 - acc: 0.8739 \nEpoch 6/20\n60000/60000 [==============================] - 11s - loss: 0.0213 - acc: 0.8895 \nEpoch 7/20\n60000/60000 [==============================] - 11s - loss: 0.0186 - acc: 0.8991 \nEpoch 8/20\n60000/60000 [==============================] - 11s - loss: 0.0168 - acc: 0.9061 \nEpoch 9/20\n60000/60000 [==============================] - 11s - loss: 0.0154 - acc: 0.9119 \nEpoch 10/20\n60000/60000 [==============================] - 11s - loss: 0.0144 - acc: 0.9164 \nEpoch 11/20\n60000/60000 [==============================] - 11s - loss: 0.0135 - acc: 0.9210 \nEpoch 12/20\n60000/60000 [==============================] - 11s - loss: 0.0129 - acc: 0.9247 \nEpoch 13/20\n60000/60000 [==============================] - 11s - loss: 0.0122 - acc: 0.9278 \nEpoch 14/20\n60000/60000 [==============================] - 12s - loss: 0.0117 - acc: 0.9308 \nEpoch 15/20\n60000/60000 [==============================] - 11s - loss: 0.0112 - acc: 0.9338 \nEpoch 16/20\n60000/60000 [==============================] - 11s - loss: 0.0108 - acc: 0.9364 \nEpoch 17/20\n60000/60000 [==============================] - 11s - loss: 0.0105 - acc: 0.9382 \nEpoch 18/20\n60000/60000 [==============================] - 11s - loss: 0.0101 - acc: 0.9404 
\nEpoch 19/20\n60000/60000 [==============================] - 12s - loss: 0.0098 - acc: 0.9429 \nEpoch 20/20\n60000/60000 [==============================] - 13s - loss: 0.0095 - acc: 0.9448 \n", 445 | "name": "stdout" 446 | }, 447 | { 448 | "output_type": "execute_result", 449 | "execution_count": 23, 450 | "data": { 451 | "text/plain": "" 452 | }, 453 | "metadata": {} 454 | } 455 | ] 456 | }, 457 | { 458 | "metadata": {}, 459 | "cell_type": "markdown", 460 | "source": "## 6. Trying out the results\n\nLet's look at what the network has learned in a more interactive way. If any command is unclear, refer to our earlier MOOC video lessons." 461 | }, 462 | { 463 | "metadata": { 464 | "trusted": true 465 | }, 466 | "cell_type": "code", 467 | "source": "from ipywidgets import interact_manual", 468 | "execution_count": 24, 469 | "outputs": [] 470 | }, 471 | { 472 | "metadata": {}, 473 | "cell_type": "markdown", 474 | "source": "The variable `predict` holds the network's predictions. For each test image, `predict_classes` returns the class (0-9) whose output probability is the largest of the 10 outputs. (Newer Keras versions have removed `predict_classes`; there, `np.argmax(model.predict(x_test), axis=1)` gives the same result.)" 475 | }, 476 | { 477 | "metadata": { 478 | "trusted": true 479 | }, 480 | "cell_type": "code", 481 | "source": "predict = model.predict_classes(x_test)", 482 | "execution_count": 25, 483 | "outputs": [ 484 | { 485 | "output_type": "stream", 486 | "text": " 9984/10000 [============================>.] - ETA: 0s", 487 | "name": "stdout" 488 | } 489 | ] 490 | }, 491 | { 492 | "metadata": {}, 493 | "cell_type": "markdown", 494 | "source": "Remember that each sample in `x_test` has been reshaped into a 784-dimensional vector; we must reshape it back into a 28x28 array to display it as an image."
495 | }, 496 | { 497 | "metadata": { 498 | "trusted": true 499 | }, 500 | "cell_type": "code", 501 | "source": "def test(測試編號):\n plt.imshow(x_test[測試編號].reshape(28,28), cmap=\"Greys\")\n print(\"神經網路判斷為:\", predict[測試編號])", 502 | "execution_count": 26, 503 | "outputs": [] 504 | }, 505 | { 506 | "metadata": { 507 | "trusted": true 508 | }, 509 | "cell_type": "code", 510 | "source": "interact_manual(test, 測試編號 = (0, 9999));", 511 | "execution_count": 27, 512 | "outputs": [ 513 | { 514 | "output_type": "display_data", 515 | "data": { 516 | "application/vnd.jupyter.widget-view+json": { 517 | "model_id": "f7c85478d60d401fb1a6968c437b46cc", 518 | "version_minor": 0, 519 | "version_major": 2 520 | }, 521 | "text/plain": "interactive(children=(IntSlider(value=4999, description='測試編號', max=9999), Button(description='Run Interact', …" 522 | }, 523 | "metadata": {} 524 | } 525 | ] 526 | }, 527 | { 528 | "metadata": {}, 529 | "cell_type": "markdown", 530 | "source": "到底測試資料總的狀況如何呢? 我們可以給我們神經網路「考一下試」。" 531 | }, 532 | { 533 | "metadata": { 534 | "trusted": true 535 | }, 536 | "cell_type": "code", 537 | "source": "score = model.evaluate(x_test, y_test)", 538 | "execution_count": 28, 539 | "outputs": [ 540 | { 541 | "output_type": "stream", 542 | "text": " 9536/10000 [===========================>..] - ETA: 0s", 543 | "name": "stdout" 544 | } 545 | ] 546 | }, 547 | { 548 | "metadata": { 549 | "trusted": true 550 | }, 551 | "cell_type": "code", 552 | "source": "print('測試資料的 loss:', score[0])\nprint('測試資料正確率:', score[1])", 553 | "execution_count": 29, 554 | "outputs": [ 555 | { 556 | "output_type": "stream", 557 | "text": "測試資料的 loss: 0.010674761578021571\n測試資料正確率: 0.9332\n", 558 | "name": "stdout" 559 | } 560 | ] 561 | }, 562 | { 563 | "metadata": {}, 564 | "cell_type": "markdown", 565 | "source": "## 7. 訓練好的神經網路存起來!\n\n如果對訓練成果滿意, 我們當然不想每次都再訓練一次! 我們可以把神經網路的架構和訓練好的參數都存起來, 以供日後使用!" 
566 | }, 567 | { 568 | "metadata": {}, 569 | "cell_type": "markdown", 570 | "source": "如果之前還沒裝 h5py, 要先在終端機 (Anaconda Prompt) 下安裝:\n \n conda install h5py" 571 | }, 572 | { 573 | "metadata": { 574 | "trusted": true 575 | }, 576 | "cell_type": "code", 577 | "source": "model_json = model.to_json()\nwith open('./data/handwriting_model_architecture.json', 'w') as f:\n    f.write(model_json)\nmodel.save_weights('./data/handwriting_model_weights.h5')", 578 | "execution_count": 30, 579 | "outputs": [] 580 | }, 581 | { 582 | "metadata": {}, 583 | "cell_type": "markdown", 584 | "source": "### reference\n- 蔡炎龍老師的Deep Learning MOOC 教學" 585 | } 586 | ], 587 | "metadata": { 588 | "anaconda-cloud": {}, 589 | "kernelspec": { 590 | "name": "python3", 591 | "display_name": "Python 3", 592 | "language": "python" 593 | }, 594 | "language_info": { 595 | "mimetype": "text/x-python", 596 | "nbconvert_exporter": "python", 597 | "name": "python", 598 | "pygments_lexer": "ipython3", 599 | "version": "3.5.4", 600 | "file_extension": ".py", 601 | "codemirror_mode": { 602 | "version": 3, 603 | "name": "ipython" 604 | } 605 | } 606 | }, 607 | "nbformat": 4, 608 | "nbformat_minor": 2 609 | } -------------------------------------------------------------------------------- /13.Python資料分析應用-語意分析篇NLP.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"kernelspec":{"name":"python36","display_name":"Python
3.6","language":"python"},"language_info":{"mimetype":"text/x-python","nbconvert_exporter":"python","name":"python","pygments_lexer":"ipython3","version":"3.6.6","file_extension":".py","codemirror_mode":{"version":3,"name":"ipython"}},"colab":{"name":"13.Python資料分析應用-語意分析篇NLP.ipynb","provenance":[],"collapsed_sections":["v4uvc5V9yf3p","7gnl_RCxyreB","oOm2PDRszMG-","z8TyrJww1txI","SOqgKxRr1xO5","jDkfbJIcEq6Z","ETbAeMnUhYaR","SwodE5dTiHEa","01IxuLcnjKQz","uAPX96dTqaOI","a5hRs_GJrA1y","zfRwhgDWs7kT","rS9yjBjvtZ6a","E2_iZuF-tnrU"]},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"id":"vx9mc_dLEq5r","colab_type":"text"},"source":["# 13.Python資料分析應用-語意分析篇NLP"]},{"cell_type":"markdown","metadata":{"id":"v4uvc5V9yf3p","colab_type":"text"},"source":["## 基本功"]},{"cell_type":"markdown","metadata":{"id":"7gnl_RCxyreB","colab_type":"text"},"source":["### 斷詞"]},{"cell_type":"markdown","metadata":{"id":"1F9NmXW4yxSt","colab_type":"text"},"source":["- jieba\n","https://github.com/fxsjy/jieba"]},{"cell_type":"code","metadata":{"id":"CFtDQiZUeUAb","colab_type":"code","outputId":"b179a2e4-c641-43a4-9cce-16d52ea511ff","executionInfo":{"status":"ok","timestamp":1577284555097,"user_tz":-480,"elapsed":963,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["import jieba\n","\n","seg_list = jieba.cut(\"我是胖虎我是孩子王\", cut_all=True)\n","print(\"Full Mode: \" + \"/ \".join(seg_list)) # 全模式 (cut_all=True)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Full Mode: 我/ 是/ 胖/ 虎/ 我/ 是/ 孩子/ 孩子王\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"C2cZfP7RywYz","colab_type":"code","outputId":"67da0143-6051-4c53-9f2e-801807ea8a9b","executionInfo":{"status":"ok","timestamp":1577284713595,"user_tz":-480,"elapsed":1027,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":52}},"source":["from jieba
import posseg\n","\n","text = '我是胖虎,我是孩子王'\n","words = posseg.cut(text)\n","print([word for word in words])\n","words_list = posseg.lcut(text)\n","print(words_list)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["[pair('我', 'r'), pair('是', 'v'), pair('胖虎', 'n'), pair(',', 'x'), pair('我', 'r'), pair('是', 'v'), pair('孩子王', 'n')]\n","[pair('我', 'r'), pair('是', 'v'), pair('胖虎', 'n'), pair(',', 'x'), pair('我', 'r'), pair('是', 'v'), pair('孩子王', 'n')]\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"mFdkHTUjy_x1","colab_type":"text"},"source":["- 中研院中文斷詞系統PyCKIP\n"," - https://github.com/ComposeAI/pyCKIP"]},{"cell_type":"markdown","metadata":{"id":"oOm2PDRszMG-","colab_type":"text"},"source":["### 以模組進行基本語意分析\n","- 以[手把手教你如何用 Python 做情感分析](https://www.itread01.com/articles/1498721884.html)為例\n","- 使用SnowNLP"]},{"cell_type":"markdown","metadata":{"id":"z8TyrJww1txI","colab_type":"text"},"source":["#### 英文為例"]},{"cell_type":"code","metadata":{"id":"uC_vNhedzdvX","colab_type":"code","outputId":"3f8356d0-dc13-4cc1-fef3-597efe6d6c23","executionInfo":{"status":"ok","timestamp":1576769875759,"user_tz":-480,"elapsed":24889,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":516}},"source":["#安裝相關套件\n","!pip install snownlp\n","!pip install -U textblob\n","!python -m textblob.download_corpora"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Collecting snownlp\n","\u001b[?25l Downloading https://files.pythonhosted.org/packages/3d/b3/37567686662100d3bce62d3b0f2adec18ab4b9ff2b61abd7a61c39343c1d/snownlp-0.12.3.tar.gz (37.6MB)\n","\u001b[K |████████████████████████████████| 37.6MB 64kB/s \n","\u001b[?25hBuilding wheels for collected packages: snownlp\n"," Building wheel for snownlp (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n"," Created wheel for snownlp: filename=snownlp-0.12.3-cp36-none-any.whl size=37760958 sha256=7932304804a4dd051b9b95c1e501897ed2ce2c7c06514f303c7b3e74cc9b1c23\n"," Stored in directory: /root/.cache/pip/wheels/f3/81/25/7c197493bd7daf177016f1a951c5c3a53b1c7e9339fd11ec8f\n","Successfully built snownlp\n","Installing collected packages: snownlp\n","Successfully installed snownlp-0.12.3\n","Requirement already up-to-date: textblob in /usr/local/lib/python3.6/dist-packages (0.15.3)\n","Requirement already satisfied, skipping upgrade: nltk>=3.1 in /usr/local/lib/python3.6/dist-packages (from textblob) (3.2.5)\n","Requirement already satisfied, skipping upgrade: six in /usr/local/lib/python3.6/dist-packages (from nltk>=3.1->textblob) (1.12.0)\n","[nltk_data] Downloading package brown to /root/nltk_data...\n","[nltk_data] Unzipping corpora/brown.zip.\n","[nltk_data] Downloading package punkt to /root/nltk_data...\n","[nltk_data] Unzipping tokenizers/punkt.zip.\n","[nltk_data] Downloading package wordnet to /root/nltk_data...\n","[nltk_data] Unzipping corpora/wordnet.zip.\n","[nltk_data] Downloading package averaged_perceptron_tagger to\n","[nltk_data] /root/nltk_data...\n","[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.\n","[nltk_data] Downloading package conll2000 to /root/nltk_data...\n","[nltk_data] Unzipping corpora/conll2000.zip.\n","[nltk_data] Downloading package movie_reviews to /root/nltk_data...\n","[nltk_data] Unzipping corpora/movie_reviews.zip.\n","Finished.\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"5GiaqI6izi2S","colab_type":"code","colab":{}},"source":["text = \"I am happy today. 
I feel sad today.\""],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Ls0eFzWbzk3B","colab_type":"code","outputId":"d3e25dd5-8217-465e-c743-0e81f653353c","executionInfo":{"status":"ok","timestamp":1576770360899,"user_tz":-480,"elapsed":1401,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["from textblob import TextBlob\n","\n","blob = TextBlob(text)\n","blob.sentences"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[Sentence(\"I am happy today.\"), Sentence(\"I feel sad today.\")]"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"code","metadata":{"id":"mAi5YynTznFG","colab_type":"code","outputId":"04ddfcf8-b145-4786-ff29-5e98ad7290b4","executionInfo":{"status":"ok","timestamp":1576770363261,"user_tz":-480,"elapsed":933,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":70}},"source":["#情感極性的變化範圍是[-1, 1],-1代表完全負面,1代表完全正面。\n","print(blob.sentences[0].sentiment)\n","print(blob.sentences[1].sentiment)\n","print(blob.sentiment)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Sentiment(polarity=0.8, subjectivity=1.0)\n","Sentiment(polarity=-0.5, subjectivity=1.0)\n","Sentiment(polarity=0.15000000000000002, subjectivity=1.0)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"SOqgKxRr1xO5","colab_type":"text"},"source":["#### 中文為例"]},{"cell_type":"markdown","metadata":{"id":"jGMJNCgP2ANK","colab_type":"text"},"source":["- 使用SnowNLP: http://t.cn/8kf1c3p"]},{"cell_type":"markdown","metadata":{"id":"Xy6n0_Gn4lTz","colab_type":"text"},"source":["- 情感係數:SnowNLP(i).sentiments\n","- 分詞:SnowNLP(i).words\n","- 轉拼音:SnowNLP(i).pinyin\n","- 關鍵詞提取:SnowNLP(i).keywords(2) # n預設為5\n","- 自動文摘:SnowNLP(i).summary(1) # n預設為5\n","- 句子切分:SnowNLP(i).sentences\n","- 轉簡體:SnowNLP(i).han\n","- 
計算相似度:SnowNLP(i).sim(doc,index)\n","- 計算term frequency詞頻:SnowNLP(i).tf 單個字的詞頻,暫時沒啥用"]},{"cell_type":"code","metadata":{"id":"X06ZV0bu2Chi","colab_type":"code","colab":{}},"source":["text = \"我今天很快樂。我今天很憤怒。\" "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"7OR0o0P72H_f","colab_type":"code","colab":{}},"source":["from snownlp import SnowNLP\n","\n","s = SnowNLP(text)\n","s.sentences"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"SWET-95z4xc_","colab_type":"code","colab":{}},"source":["s.words"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"nAVTJ4OD2K6e","colab_type":"code","colab":{}},"source":["#SnowNLP的情感分析取值,表達的是“這句話代表正面情感的概率”。\n","print(SnowNLP(s.sentences[0]).sentiments)\n","print(SnowNLP(s.sentences[1]).sentiments)\n","print(s.sentiments)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"mstO8ha52Ymx","colab_type":"code","colab":{}},"source":["text = '''\n","台中市雷姓男子去年10月間在西區公益路、大墩路的工地喝酒後,騎車上路,遭警方盤查後,酒測值超標為每公升0.70毫克,他遭警方依公共危險罪嫌送辦,檢方聲請簡易判決處刑,法官判他4月徒刑、得易科罰金,但他今年2月要報到執行時,檢方命他不得易科,需入監服刑,他向台中地院聲明異議,法官審理後認為,雷在2016至2018年間連續3度酒駕,前二次一次是緩起訴、另次則是易科罰金,但此次又被查獲,已是第三次,認為他漠視法律,不矯正難收矯正之效,駁回雷聲明異議,判雷要關。\n","判決書指出,雷姓男子(41歲)去年10月間酒後騎車被查獲,酒測值為每公升0.70毫克,台中地院依公共危險罪,簡易判決處刑4月,得易科罰金12萬元,不過雷在今年2月到台中地檢署報到執行時,檢方諭令他不得易科,需入監服刑,他不服向台中地院聲明異議。\n","\n","雷辯稱,判決書上明寫可易科罰金,為何檢方堅持要讓他去服刑,而且他已離婚是單親家庭,有未成年子女要撫養,家境貧寒是中低收入戶,更是家中主要的經濟支柱,希望法官撤銷檢察官不得易科的執行指揮處分,准予讓他易科罰金。\n","\n","台中地院法官審理後認為,雖然法律有規定,本刑5年以下,宣告6月以下徒刑者, 得易科罰金,不過但書是「難收矯正之效或難以維持法秩序者,不在此限」,易科罰金的易刑處分,應否准許,依照《刑事訴訟法》第457條規定,由檢察官就是否准予受刑人易科罰金,有無但書情況,由檢方查明認定並指揮執行。\n","\n","法官指出,雷在2016年酒駕被查獲後,獲緩起訴處分,2017年又被查獲酒駕,當時被判3月徒刑,得易科9萬元,但他2018年又喝酒上路,顯示雷沒有因前兩次遭查獲的前例,而有所警惕,且去年被查獲時,是在喝完啤酒不久就騎車上路,顯然極度漠視法令,其行為對社會法秩序之危害重大,因此認定檢方的處分無不妥,駁回雷聲明異議。\n","'''"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"dGC2v1xA21j2","colab_type":"code","colab":{}},"source":["text2 = 
'''\n","川普總統在眾議院召開全院辯論之際,在白宮以全大寫推文痛批彈劾為「狠毒的謊言、對美國的攻擊」,並選在眾院投票的時候在密西根州舉行「耶誕快樂」造勢大會。\n","川普在白宮以全大寫推文和誇張的慍怒語氣寫道,「激進左派如此狠毒的謊言,無所事事的民主黨。這是對美國的攻擊,這是對共和黨的攻擊」。\n","'''"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"F-WCYQaC2cPF","colab_type":"code","outputId":"b695170c-9f9d-4499-a76c-349e389bfb4f","executionInfo":{"status":"ok","timestamp":1576770973035,"user_tz":-480,"elapsed":1109,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["s = SnowNLP(text2)\n","# s.sentences[0]\n","s.sentiments"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["1.682213436327018e-07"]},"metadata":{"tags":[]},"execution_count":20}]},{"cell_type":"code","metadata":{"id":"wKAOXsOZoYm1","colab_type":"code","colab":{}},"source":[""],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"J81xvvOemtd2","colab_type":"text"},"source":["參考-https://medium.com/pyladies-taiwan/nltk-%E5%88%9D%E5%AD%B8%E6%8C%87%E5%8D%97-%E4%B8%80-%E7%B0%A1%E5%96%AE%E6%98%93%E4%B8%8A%E6%89%8B%E7%9A%84%E8%87%AA%E7%84%B6%E8%AA%9E%E8%A8%80%E5%B7%A5%E5%85%B7%E7%AE%B1-%E6%8E%A2%E7%B4%A2%E7%AF%87-2010fd7c7540"]},{"cell_type":"markdown","metadata":{"id":"jDkfbJIcEq6Z","colab_type":"text"},"source":["## 以機器學習做PTT電影版情意分析\n"]},{"cell_type":"markdown","metadata":{"id":"3YY2hXKnuPwv","colab_type":"text"},"source":["- 以下來自[Python 網路爬蟲與資料分析入門實戰](https://www.tenlong.com.tw/products/9789864343386)\n","- 2019年10月出版的新書,裡面很多台灣在地的爬蟲應用教學,[github](https://github.com/willismax/py-scraping-analysis-book)也有該書的程式碼,可以先試試看。\n","- 此例以電影名稱 (黑豹) 為關鍵字搜尋PTT電影版,依標題的「好雷、負雷」標記做分類,將各文章內文斷詞後,以機器學習方式訓練模型並預測分類結果"]},{"cell_type":"code","metadata":{"trusted":true,"id":"Vk-XzzJbEq6a","colab_type":"code","outputId":"0177f68e-7cd0-4704-9a03-b70b04f97cd1","colab":{}},"source":["import requests\n","import re\n","import csv\n","from bs4 import BeautifulSoup\n","\n","\n","PTT_URL = 
'https://www.ptt.cc'\n","\n","\n","def get_articles(url):\n"," resp = requests.get(\n"," url=url,\n"," cookies={'over18': '1'} # 告知 Server 已回答過滿 18 歲的問題\n"," )\n"," soup = BeautifulSoup(resp.text, 'html5lib')\n"," prev_link = soup.find('div', 'btn-group-paging').find_all('a')[1]\n"," # 若有 href 屬性, 代表有上一頁的超連結\n"," prev_link = prev_link['href'] if 'href' in prev_link.attrs else None\n","\n"," # 巡覽每一篇文章所在區塊\n"," positive = []\n"," negative = []\n"," for div in soup.find_all('div', 'r-ent'):\n"," title_div = div.find('div', 'title')\n"," if title_div.a is None: # 已刪除的文章沒有超連結, 略過\n"," continue\n"," href = title_div.a['href']\n"," title = title_div.text.strip()\n"," # 若標題為 [] 開頭, e.g., [好雷] 星際大戰八-各種元素集於一身\n"," m = re.match(r'\\[.*\\]', title)\n"," if m:\n"," tag = m.group(0)\n"," # 標籤內含'好'為好評; 含'負'或'爛'為負評\n"," if '好' in tag:\n"," positive.append([title, href])\n"," if '爛' in tag or '負' in tag:\n"," negative.append([title, href])\n"," return prev_link, positive, negative\n","\n","\n","if __name__ == '__main__':\n"," start_url = PTT_URL + '/bbs/movie/search?q=黑豹'\n"," positive_posts, negative_posts = [], []\n"," prev_link, pos, neg = get_articles(start_url)\n"," positive_posts += pos\n"," negative_posts += neg\n"," while prev_link:\n"," url = PTT_URL + prev_link\n"," prev_link, pos, neg = get_articles(url)\n"," positive_posts += pos\n"," negative_posts += neg\n"," print(len(positive_posts), positive_posts[:3])\n"," print(len(negative_posts), negative_posts[:3])\n","\n"," with open('./data/mov_pos.csv', 'w', encoding='utf-8', newline='') as f:\n"," writer = csv.writer(f)\n"," writer.writerow(['title', 'href'])\n"," writer.writerows(positive_posts)\n","\n"," with open('./data/mov_neg.csv', 'w', encoding='utf-8', newline='') as f:\n"," writer = csv.writer(f)\n"," writer.writerow(['title', 'href'])\n"," writer.writerows(negative_posts)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["60 [['[好雷] 黑豹 --其實蠻好看的', '/bbs/movie/M.1552276420.A.F6F.html'], ['[好雷?] 
黑豹 自慰片的新高度', '/bbs/movie/M.1545317279.A.EFC.html'], ['[二刷好雷] 水行俠真的不是黑豹', '/bbs/movie/M.1545065816.A.46A.html']]\n","27 [['[負雷]黑豹-失衡的烏托邦', '/bbs/movie/M.1529245622.A.AF5.html'], ['[負雷] 四不像的黑豹', '/bbs/movie/M.1527918611.A.56C.html'], ['[微負雷] 黑豹有點讓人失望....', '/bbs/movie/M.1527839684.A.EF0.html']]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"trusted":true,"id":"4jvcwe9SEq6d","colab_type":"code","outputId":"80770dcb-32c3-4418-9873-b4af0026cf8a","colab":{}},"source":["import csv\n","import requests\n","import re\n","import json\n","import time\n","from bs4 import BeautifulSoup\n","\n","\n","PTT_URL = 'https://www.ptt.cc'\n","\n","\n","def sanitize(txt):\n"," # 保留英數字, 中文 (\\u4e00-\\u9fa5) 及中文標點, 部分特殊符號\n"," # http://ubuntu-rubyonrails.blogspot.com/2009/06/unicode.html\n"," expr = re.compile('[^\\u4e00-\\u9fa5。;,:“”()、?「」『』【】\\s\\w:/\\-.()]') # ^ 表示\"非括號內指定的字元\"\n"," txt = re.sub(expr, '', txt)\n"," txt = re.sub('[。;,:“”()、?「」『』【】:/\\-_.()]', ' ', txt) # 用空白取代中英文標點\n"," txt = re.sub('(\\s)+', ' ', txt) # 用單一空白取代多個換行或 tab 符號\n"," txt = txt.replace('--', '')\n"," txt = txt.lower() # 英文字轉為小寫\n"," return txt\n","\n","\n","def get_post(url):\n"," resp = requests.get(\n"," url=url,\n"," cookies={'over18': '1'} # 告知 Server 已回答過滿 18 歲的問題\n"," )\n"," soup = BeautifulSoup(resp.text, 'html5lib')\n"," main_content = soup.find('div', id='main-content')\n","\n"," # 把非本文的部份 (標題區及推文區) 移除\n"," # 移除標題區塊\n"," for meta in main_content.find_all('div', 'article-metaline'):\n"," meta.extract()\n"," for meta in main_content.find_all('div', 'article-metaline-right'):\n"," meta.extract()\n"," # 移除推文區塊\n"," for push in main_content.find_all('div', 'push'):\n"," push.extract()\n","\n"," parsed = []\n"," for txt in main_content.stripped_strings:\n"," # 移除 '※ 發信站:', '--' 開頭, 及本文區最後一行文章網址部份\n"," if txt[0] == '※' or txt[:2] == '--' or url in txt:\n"," continue\n"," txt = sanitize(txt)\n"," if txt:\n"," parsed.append(txt)\n"," return ' '.join(parsed)\n","\n","\n","def 
get_article_body(csv_file):\n"," id_to_body = {}\n"," with open(csv_file, 'r', encoding='utf-8') as f:\n"," reader = csv.DictReader(f)\n"," for row in reader:\n"," print('處理', row['title'], row['href'])\n"," title = ' '.join(row['title'].split(']')[1:])\n"," title = sanitize(title)\n"," body = get_post(PTT_URL + row['href'])\n"," id_to_body[row['href']] = title + ' ' + body # 以文章超連結為 key, 標題 + 本文為 value\n"," time.sleep(1) # 放慢爬蟲速度\n"," return id_to_body\n","\n","\n","if __name__ == '__main__':\n"," d1 = get_article_body('./data/mov_pos.csv')\n"," d2 = get_article_body('./data/mov_neg.csv')\n"," id_to_body = {**d1, **d2} # 將兩個 dict 合併為一個\n"," with open('./data/id_to_body.json', 'w', encoding='utf-8') as f:\n"," json.dump(id_to_body, f, indent=2, ensure_ascii=False)"],"execution_count":0,"outputs":[{"output_type":"stream","text":["處理 [好雷] 黑豹 --其實蠻好看的 /bbs/movie/M.1552276420.A.F6F.html\n","處理 [好雷?] 黑豹 自慰片的新高度 /bbs/movie/M.1545317279.A.EFC.html\n","處理 [二刷好雷] 水行俠真的不是黑豹 /bbs/movie/M.1545065816.A.46A.html\n","處理 [好雷] 盲點:關於《黑豹》的奧克蘭也關於你我的故事 /bbs/movie/M.1538060300.A.8FD.html\n","處理 [好雷] 瘋狂亞洲富豪─絕不是新加坡黑豹 /bbs/movie/M.1536676591.A.C8F.html\n","處理 [微好雷]《雷神索爾3諸神黃昏》&《黑豹》 /bbs/movie/M.1536217925.A.B6E.html\n","處理 [好雷] 比黑豹好看的蟻人與黃蜂女 /bbs/movie/M.1531217457.A.6C8.html\n","處理 [好雷]黑豹 — 符合台灣政治的隱喻分析 /bbs/movie/M.1525369009.A.CFA.html\n","處理 [好雷] 黑豹:一部政治寓言 /bbs/movie/M.1521128014.A.19B.html\n","處理 [好雷]黑豹,好看但可惜的反派 /bbs/movie/M.1520467155.A.8E5.html\n","處理 [好雷] 黑豹心得 /bbs/movie/M.1519920385.A.14A.html\n","處理 [ 好雷] 唯一缺憾的黑豹 /bbs/movie/M.1519908058.A.B99.html\n","處理 [好雷] 黑豹 感想 微負評 /bbs/movie/M.1519841715.A.3BF.html\n","處理 [核心好雷] 黑豹-科技與部落的反差萌 人權與人道議題 /bbs/movie/M.1519812809.A.DC3.html\n","處理 [普好雷] 立體的世界,被一掌拍平,淺談【黑豹】 /bbs/movie/M.1519787493.A.BFD.html\n","處理 [有意見好雷] 比上沒得比,比下有餘的黑豹 /bbs/movie/M.1519734682.A.979.html\n","處理 [好雷] 黑豹 ~ 屬於黑人的童話故事 /bbs/movie/M.1519618436.A.EA0.html\n","處理 [ 好 雷] 黑豹 /bbs/movie/M.1519572758.A.D3C.html\n","處理 [好雷]黑豹 王者的抉擇 /bbs/movie/M.1519544126.A.E20.html\n","處理 [好雷] 
黑豹就真的很好看咩~ /bbs/movie/M.1519356550.A.A17.html\n","處理 [ 普好雷] 黑豹 /bbs/movie/M.1519142286.A.7CD.html\n","處理 [好雷] 黑豹 /bbs/movie/M.1519070229.A.DE9.html\n","處理 [好雷] 黑豹:王者路大不易 /bbs/movie/M.1519042966.A.803.html\n","處理 [微好雷]《黑豹》 春秋五霸的基本套路 /bbs/movie/M.1518991149.A.BDD.html\n","處理 [好雷]黑豹 不太一樣的超級英雄 /bbs/movie/M.1518963410.A.A2E.html\n","處理 [好雷] 黑豹 其實我比較喜歡反派@@ /bbs/movie/M.1518939706.A.D48.html\n","處理 [好雷] 黑豹 Black Panther,非洲未來主義 /bbs/movie/M.1518893529.A.3AD.html\n","處理 [好雷] 黑豹: 智者造橋,愚者築牆 /bbs/movie/M.1518860486.A.B45.html\n","處理 [微好雷] 黑豹的電影配樂 /bbs/movie/M.1518848376.A.BBB.html\n","處理 [好無雷] 黑豹 /bbs/movie/M.1518802967.A.D32.html\n","處理 [好雷] 黑豹 沒有大家說的那麼糟啦 /bbs/movie/M.1518793606.A.036.html\n","處理 [好雷] 黑豹 /bbs/movie/M.1518778797.A.C52.html\n","處理 [普好雷] 黑豹:Do you know da way? /bbs/movie/M.1518769142.A.229.html\n","處理 [普好雷] 黑豹 來談談角色動機吧 /bbs/movie/M.1518703695.A.0DC.html\n","處理 [好雷]YOU SHOW OFF?——《黑豹》無&有雷推薦 /bbs/movie/M.1518641125.A.57A.html\n","處理 [普好雷] 覺得黑豹的編劇陷入一種兩難的情況 /bbs/movie/M.1518632731.A.55E.html\n","處理 [好笑雷] 黑豹 /bbs/movie/M.1518627430.A.ADF.html\n","處理 [好雷] 黑豹 耳目一新的英雄電影 /bbs/movie/M.1518617014.A.223.html\n","處理 [好雷] 黑豹 最沒有在看爽的漫威片 /bbs/movie/M.1518584186.A.CA8.html\n","處理 [偏好雷] 看完黑豹我再重看一次預告片發現... /bbs/movie/M.1518541335.A.C93.html\n","處理 [好雷] 黑豹 國王成長日記 /bbs/movie/M.1518538495.A.43A.html\n","處理 [好雷] 黑豹 世界觀科技感超讚 但故事有些許bug /bbs/movie/M.1518537228.A.F8C.html\n","處理 [超爆幹好雷] 黑豹-名副其實漫威最佳 /bbs/movie/M.1518536561.A.C1C.html\n","處理 [普好雷] 黑豹 /bbs/movie/M.1518535690.A.7CD.html\n","處理 [好雷]【黑豹】非洲先進文明卓然而立 /bbs/movie/M.1518521887.A.244.html\n","處理 [普好雷] 充滿新風格的黑豹 /bbs/movie/M.1518517571.A.DD1.html\n","處理 [好雷] 不太懂有關黑豹反派的幾件事情 /bbs/movie/M.1518512928.A.747.html\n","處理 [好雷] 小細節怪怪的黑豹 /bbs/movie/M.1518511893.A.636.html\n","處理 [好雷] 黑豹 /bbs/movie/M.1518507928.A.ABD.html\n","處理 [好無雷] 黑豹 /bbs/movie/M.1518505340.A.C9A.html\n","處理 [好雷] 黑豹觀後討論 /bbs/movie/M.1518501262.A.9AC.html\n","處理 [普好雷] 黑豹 優秀的漫威宇宙擴展片 /bbs/movie/M.1518500618.A.932.html\n","處理 [極好無雷] 這就是我要的黑豹! 
/bbs/movie/M.1518499054.A.561.html\n","處理 [ 好無雷] 黑豹 絕對值得一看 /bbs/movie/M.1518496440.A.BE1.html\n","處理 [好無雷] 大推推黑豹電影 /bbs/movie/M.1518495524.A.882.html\n","處理 [好雷] 黑豹,劇情普通,美術太完美 /bbs/movie/M.1518495424.A.F0B.html\n","處理 [好雷] 黑豹 /bbs/movie/M.1518494060.A.826.html\n","處理 [ 好雷] 美國隊長3 黑豹疑問 /bbs/movie/M.1462012717.A.D23.html\n","處理 [好雷]美隊3-關於黑豹以及各英雄的立場 /bbs/movie/M.1461921066.A.E77.html\n","處理 [好雷] 美國隊長3 黑豹家世 /bbs/movie/M.1461748087.A.515.html\n","處理 [負雷]黑豹-失衡的烏托邦 /bbs/movie/M.1529245622.A.AF5.html\n","處理 [負雷] 四不像的黑豹 /bbs/movie/M.1527918611.A.56C.html\n","處理 [微負雷] 黑豹有點讓人失望.... /bbs/movie/M.1527839684.A.EF0.html\n","處理 [負雷] 是不是有精神分裂的黑豹 /bbs/movie/M.1526734538.A.5E2.html\n","處理 [負雷] 黑豹 /bbs/movie/M.1520322497.A.E24.html\n","處理 [負雷]黑豹 最差漫威電影 /bbs/movie/M.1519931938.A.207.html\n","處理 [ 微負雷] 黑豹 /bbs/movie/M.1519721961.A.471.html\n","處理 [負雷] 黑豹 Black Panther,暴力與分享 /bbs/movie/M.1518961786.A.79E.html\n","處理 [ 負雷] 隨便拍拍隨便買單的「黑豹」 /bbs/movie/M.1518934190.A.F7B.html\n","處理 [負雷] 只有政治正確的 黑豹 /bbs/movie/M.1518809750.A.4E0.html\n","處理 [負雷]黑豹:除了美術視覺,其餘毫無魅力 /bbs/movie/M.1518808279.A.FC9.html\n","處理 [睡負無雷] 看黑豹看到睡著 /bbs/movie/M.1518774686.A.889.html\n","處理 [小負雷] 黑豹-請修邊幅可以嗎?(內有雷) /bbs/movie/M.1518761310.A.05C.html\n","處理 [爛無雷] 千萬不要期待的黑豹 /bbs/movie/M.1518758578.A.8F0.html\n","處理 [普負雷] 輸不起的黑豹 /bbs/movie/M.1518714311.A.957.html\n","處理 [ 負雷] 黑豹-鋪陳復3的跳板 /bbs/movie/M.1518684255.A.489.html\n","處理 [ 負雷] 黑豹 /bbs/movie/M.1518670825.A.A31.html\n","處理 [普負雷] 黑豹出戲點 /bbs/movie/M.1518642702.A.497.html\n","處理 [負雷] 黑豹特遣隊 /bbs/movie/M.1518609915.A.835.html\n","處理 [負雷] 家天下黑豹 /bbs/movie/M.1518596627.A.3E0.html\n","處理 [負雷] 失望的黑豹(補個優點) /bbs/movie/M.1518577527.A.4BB.html\n","處理 [負雷] 真的過譽的黑豹......... 
/bbs/movie/M.1518536086.A.B1E.html\n","處理 [負雷] 黑豹天下 /bbs/movie/M.1518534403.A.ECB.html\n","處理 [ 負雷] 讓人失望的黑豹 /bbs/movie/M.1518529744.A.586.html\n","處理 [普負雷] 一片歐罵罵的黑豹 /bbs/movie/M.1518513719.A.462.html\n","處理 [微負雷] 黑豹 相較其他marvel片有點可惜 /bbs/movie/M.1518510319.A.1DC.html\n","處理 [負雷]黑豹 /bbs/movie/M.1518495315.A.E27.html\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"trusted":true,"id":"FwPB3tpQEq6i","colab_type":"code","outputId":"a429cd93-847e-4107-88c9-ec51937f08ad","colab":{}},"source":["!pip3 install jieba"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Requirement already satisfied: jieba in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.39)\r\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"trusted":true,"id":"EYh46tt6Eq6m","colab_type":"code","outputId":"65c4f8e2-a816-49ec-999a-37d92933423c","colab":{}},"source":["import json\n","import csv\n","import jieba\n","import random\n","from sklearn.feature_extraction.text import CountVectorizer\n","from sklearn.feature_extraction.text import TfidfTransformer\n","from sklearn.linear_model import SGDClassifier\n","from sklearn.metrics import accuracy_score\n","\n","jieba.set_dictionary('./data/dict.txt.big') # 對繁體中文斷詞較準確的字典檔\n","\n","\n","def load_data(a_csv, b_json, label):\n"," a_ids = []\n"," with open(a_csv, 'r', encoding='utf-8') as f:\n"," reader = csv.DictReader(f)\n"," for row in reader:\n"," a_ids.append(row['href'])\n"," with open(b_json, 'r', encoding='utf-8') as f:\n"," id_to_body = json.load(f)\n"," data = []\n"," for a_id in a_ids:\n"," tokenized_post = []\n"," txt = id_to_body[a_id]\n"," for sent in txt.split(): # 將文章以空白隔開\n"," # 斷詞後的結果, 若非空白且長度為 2 以上, 則列入詞庫\n"," filtered = [t for t in jieba.cut(sent) if t.split() and len(t) > 1]\n"," tokenized_post += filtered\n"," data.append((tokenized_post, label))\n"," return data\n","\n","\n","if __name__ == '__main__':\n"," pos_data = load_data('./data/mov_pos.csv', './data/id_to_body.json', '正評')\n"," neg_data = 
load_data('./data/mov_neg.csv', './data/id_to_body.json', '負評')\n","\n"," '''\n"," # 印出正評與負評文章的前幾個字, 確認資料無誤\n"," for post, label in pos_data[:3]:\n"," print(post[:5], label)\n"," for post, label in neg_data[:3]:\n"," print(post[:5], label)\n"," '''\n","\n"," # 打亂資料順序\n"," random.seed(42)\n"," random.shuffle(pos_data)\n"," random.shuffle(neg_data)\n","\n"," x_train, y_train, x_test, y_test = [], [], [], []\n"," # 正評與負評各取前 10 筆資料 (及答案), 共 20 筆放入 training set\n"," # 建立資料時要用空白將斷好的詞重新連成一個字串, 以便之後使用 scikit-learn 建立字典並轉換文字資料為向量\n"," for i in range(10):\n"," x_train.append(' '.join(pos_data[i][0]))\n"," x_train.append(' '.join(neg_data[i][0]))\n"," y_train.append(pos_data[i][1])\n"," y_train.append(neg_data[i][1])\n"," # 正評與負評各取第 10~26 筆資料 (及答案), 共 34 筆放入 testing set (負評只有 27 筆, 正評也只取相同數量以維持兩類平衡)\n"," for i in range(10, 27):\n"," x_test.append(' '.join(pos_data[i][0]))\n"," x_test.append(' '.join(neg_data[i][0]))\n"," y_test.append(pos_data[i][1])\n"," y_test.append(neg_data[i][1])\n","\n"," vectorizer = CountVectorizer()\n"," x_train = vectorizer.fit_transform(x_train)\n"," transformer = TfidfTransformer()\n"," x_train = transformer.fit_transform(x_train)\n"," clf = SGDClassifier(random_state=42)\n"," clf.fit(x_train, y_train)\n"," x_test = vectorizer.transform(x_test)\n"," x_test = transformer.transform(x_test)\n"," y_pred = clf.predict(x_test)\n"," print('預測結果:', list(y_pred))\n"," print('正確答案:', y_test)\n"," print('正確率:', accuracy_score(y_test, y_pred))\n","\n"," # 測試自己輸入的句子\n"," sentences = [\n"," '我 覺得 這部 電影 還 不錯',\n"," '這部 片 應該 可以 更好 才對'\n"," ]\n"," analyze = vectorizer.build_analyzer()\n"," print(analyze(sentences[0]))\n"," print(analyze(sentences[1]))\n","\n"," custom_data = transformer.transform(vectorizer.transform(sentences))\n"," print(clf.predict(custom_data))"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Building prefix dict from /home/nbuser/library/lesson/data/dict.txt.big ...\n","Loading model from cache 
/tmp/jieba.u863534f77a5b7aa5dc55e7aac03546ba.cache\n","Loading model cost 7.137 seconds.\n","Prefix dict has been built succesfully.\n"],"name":"stderr"},{"output_type":"stream","text":["預測結果: ['正評', '負評', '正評', '正評', '負評', '負評', '負評', '正評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '正評', '正評', '負評', '負評', '負評', '負評', '正評', '正評', '負評', '正評', '正評', '正評', '負評', '負評', '負評', '負評', '負評', '正評', '正評']\n","正確答案: ['正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評', '正評', '負評']\n","正確率: 0.6470588235294118\n","['覺得', '這部', '電影', '不錯']\n","['這部', '應該', '可以', '更好', '才對']\n","['負評' '正評']\n"],"name":"stdout"},{"output_type":"stream","text":["/home/nbuser/anaconda3_501/lib/python3.6/site-packages/sklearn/linear_model/stochastic_gradient.py:128: FutureWarning: max_iter and tol parameters have been added in in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. 
From 0.21, default max_iter will be 1000, and default tol will be 1e-3.\n"," \"and default tol will be 1e-3.\" % type(self), FutureWarning)\n"],"name":"stderr"}]},{"cell_type":"code","metadata":{"trusted":true,"id":"YB1FrmgIEq6p","colab_type":"code","outputId":"88b6129c-1482-4ab1-f849-ca0a9d3ccbe1","colab":{}},"source":["len(neg_data)"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["27"]},"metadata":{"tags":[]},"execution_count":15}]},{"cell_type":"markdown","metadata":{"id":"ETbAeMnUhYaR","colab_type":"text"},"source":["## 以RNN做情意分析"]},{"cell_type":"markdown","metadata":{"id":"yyWHWjb0higr","colab_type":"text"},"source":["- RNN 是一種「有記憶」的神經網路, 適合處理時間序列或長度不定的輸入資料。\n","\n","- 以下以RNN對電影評論做正/負評的「情意分析」為例。"]},{"cell_type":"code","metadata":{"id":"YT2JL4N4i4go","colab_type":"code","colab":{"base_uri":"https://localhost:8080/","height":34},"outputId":"3ee315ca-afae-45f2-8d1e-e8684bc6e42b","executionInfo":{"status":"ok","timestamp":1577511468058,"user_tz":-480,"elapsed":1778,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}}},"source":["%tensorflow_version 2.x"],"execution_count":1,"outputs":[{"output_type":"stream","text":["TensorFlow 2.x selected.\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"sf1pRzxBh4Zp","colab_type":"code","colab":{}},"source":["%matplotlib inline\n","\n","import numpy as np\n","import matplotlib.pyplot as plt"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"SwodE5dTiHEa","colab_type":"text"},"source":["### 讀入資料-IMDB電影數據"]},{"cell_type":"markdown","metadata":{"id":"b9BZwpL2iaqi","colab_type":"text"},"source":["- 讀取資料,並以 `num_words=10000` 只保留出現頻率最高的 1 萬個字,其餘的字以「未知字」代替。\n","- 
如果要客製化下載結果,建議一定要看官方文件唷,譬如[tf.keras.datasets.imdb](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data?hl=zh-TW&version=stable)"]},{"cell_type":"code","metadata":{"id":"nh6JE5eaiPNX","colab_type":"code","colab":{"base_uri":"https://localhost:8080/","height":51},"outputId":"04db5623-9369-4ffc-e614-9f715b3a8cb5","executionInfo":{"status":"ok","timestamp":1577511491188,"user_tz":-480,"elapsed":14946,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}}},"source":["from tensorflow.keras.datasets import imdb\n","(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)"],"execution_count":3,"outputs":[{"output_type":"stream","text":["Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz\n","17465344/17464789 [==============================] - 1s 0us/step\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"Hr_FLz9pioQO","colab_type":"code","outputId":"744f8913-1e7c-4fb5-951f-9a7c80a8860d","executionInfo":{"status":"ok","timestamp":1577511498291,"user_tz":-480,"elapsed":1048,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":51}},"source":["print('訓練總筆數:', len(x_train))\n","print('測試總筆數:', len(x_test))"],"execution_count":4,"outputs":[{"output_type":"stream","text":["訓練總筆數: 25000\n","測試總筆數: 25000\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"01IxuLcnjKQz","colab_type":"text"},"source":["#### 觀察資料"]},{"cell_type":"markdown","metadata":{"id":"YqbvXs5mjp_1","colab_type":"text"},"source":["- 每筆資料是一段已轉成數字序列的影評,以 list 儲存而不是固定形狀的 array,原因是每筆資料 (每段影評) 的長度不一樣。\n","- 每筆輸入資料中的每個數字都代表一個英文單字。編號是依照單字在整個資料集中出現頻率的排序 (0、1、2 分別保留給填充、序列開頭與未知字, 實際單字從 4 開始編號), 也就是出現頻率越高, 
代表的數字就越小。"]},{"cell_type":"code","metadata":{"id":"V0W3RCBZjN0V","colab_type":"code","colab":{"base_uri":"https://localhost:8080/","height":1000},"outputId":"2c8d4cc6-eb54-466d-d067-affc3a6bae34","executionInfo":{"status":"ok","timestamp":1577511765671,"user_tz":-480,"elapsed":948,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}}},"source":["x_train[18763]"],"execution_count":6,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[1,\n"," 6915,\n"," 4272,\n"," 4,\n"," 6352,\n"," 2846,\n"," 9,\n"," 209,\n"," 6,\n"," 824,\n"," 381,\n"," 23,\n"," 86,\n"," 650,\n"," 94,\n"," 6,\n"," 2821,\n"," 821,\n"," 948,\n"," 5,\n"," 1193,\n"," 168,\n"," 83,\n"," 4,\n"," 465,\n"," 499,\n"," 7,\n"," 406,\n"," 876,\n"," 14,\n"," 20,\n"," 214,\n"," 208,\n"," 8,\n"," 4,\n"," 213,\n"," 25,\n"," 203,\n"," 30,\n"," 536,\n"," 51,\n"," 213,\n"," 4,\n"," 213,\n"," 9,\n"," 8,\n"," 4787,\n"," 2,\n"," 7,\n"," 43,\n"," 1572,\n"," 567,\n"," 5,\n"," 599,\n"," 14,\n"," 20,\n"," 47,\n"," 49,\n"," 599,\n"," 53,\n"," 42,\n"," 329,\n"," 43,\n"," 8621,\n"," 6,\n"," 372,\n"," 8850,\n"," 50,\n"," 26,\n"," 66,\n"," 64,\n"," 342,\n"," 2,\n"," 15,\n"," 100,\n"," 30,\n"," 1192,\n"," 599,\n"," 637,\n"," 376,\n"," 25,\n"," 31,\n"," 155,\n"," 151,\n"," 6915,\n"," 4272,\n"," 4,\n"," 6352,\n"," 2846,\n"," 166,\n"," 7772,\n"," 168,\n"," 40,\n"," 2,\n"," 890,\n"," 48,\n"," 25,\n"," 197,\n"," 7772,\n"," 16,\n"," 6,\n"," 932,\n"," 1770,\n"," 1193,\n"," 1789,\n"," 509,\n"," 95,\n"," 25,\n"," 2808,\n"," 110,\n"," 4,\n"," 320,\n"," 7,\n"," 12,\n"," 366,\n"," 874,\n"," 110,\n"," 6915,\n"," 4272,\n"," 4,\n"," 6352,\n"," 2846,\n"," 10,\n"," 10,\n"," 20,\n"," 675,\n"," 2241,\n"," 457,\n"," 599,\n"," 2241,\n"," 158,\n"," 10,\n"," 10,\n"," 6915,\n"," 4272,\n"," 4,\n"," 6352,\n"," 2846,\n"," 5444,\n"," 
693]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"code","metadata":{"id":"yiNn1tbpje_S","colab_type":"code","outputId":"81c0888a-7c94-4905-bb4e-49580b69bd1e","executionInfo":{"status":"ok","timestamp":1577511788735,"user_tz":-480,"elapsed":902,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["len(x_train[18763])"],"execution_count":10,"outputs":[{"output_type":"execute_result","data":{"text/plain":["140"]},"metadata":{"tags":[]},"execution_count":10}]},{"cell_type":"code","metadata":{"id":"6moMYkpykYL3","colab_type":"code","outputId":"195146c1-9875-4be7-b1d7-34333bee2957","executionInfo":{"status":"ok","timestamp":1577511814997,"user_tz":-480,"elapsed":937,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["# 1 = positive review, 0 = negative review\n","y_train[:10]"],"execution_count":12,"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])"]},"metadata":{"tags":[]},"execution_count":12}]},{"cell_type":"code","metadata":{"id":"8nGcR8bykQfK","colab_type":"code","outputId":"31dbaf93-6ac2-4b28-dbf2-3dc380f4cd6d","executionInfo":{"status":"ok","timestamp":1577285931274,"user_tz":-480,"elapsed":1753,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["y_train[18763]"],"execution_count":0,"outputs":[{"output_type":"execute_result","data":{"text/plain":["1"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"uAPX96dTqaOI","colab_type":"text"},"source":["### Preparing the input for the RNN"]},{"cell_type":"markdown","metadata":{"id":"b3XqSZIRqjr5","colab_type":"text"},"source":["- An RNN can handle inputs of different lengths, but to train in batches we still follow these rules:\n"," - Set a maximum length for the input text (longer reviews are truncated)\n"," - 
Make every sequence the same length by padding the shorter ones with 0. By default `pad_sequences` pads and truncates at the **front** of each sequence (`padding='pre'`, `truncating='pre'`); pass `padding='post'` to pad at the end instead."]},{"cell_type":"code","metadata":{"id":"TsGAEOzcq18C","colab_type":"code","colab":{}},"source":["from tensorflow.keras.preprocessing import sequence"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"iBJet6xcq2r3","colab_type":"code","colab":{}},"source":["# Use tf.keras.preprocessing.sequence.pad_sequences to pad or truncate every sequence to the same length (150).\n","\n","x_train = sequence.pad_sequences(x_train, maxlen=150)\n","x_test = sequence.pad_sequences(x_test, maxlen=150)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"EYuingDNq56H","colab_type":"code","outputId":"6f5c7211-f14c-4dfb-fd5d-da34acdcd23b","executionInfo":{"status":"ok","timestamp":1577512443154,"user_tz":-480,"elapsed":915,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["x_train.shape"],"execution_count":17,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(25000, 150)"]},"metadata":{"tags":[]},"execution_count":17}]},{"cell_type":"markdown","metadata":{"id":"a5hRs_GJrA1y","colab_type":"text"},"source":["### Build the RNN"]},{"cell_type":"markdown","metadata":{"id":"dczRXK9arFet","colab_type":"text"},"source":["Here we use an LSTM with the following architecture:\n","- First compress each word from its 10,000-dimensional representation down to N dimensions\n","- Then use K LSTM units as the hidden layer\n","- Finally, a single output unit with a sigmoid activation\n"]},{"cell_type":"markdown","metadata":{"id":"K6bHB5dBrdeu","colab_type":"text"},"source":["Building our network\n","- One-hot encoding is the standard way to represent words, but with a 10,000-word vocabulary every word becomes a 10,000-dimensional vector. This wastes memory, and one-hot vectors carry no information about how words relate to each other. Instead, we can map words in a learned way to a much smaller number of dimensions, where the vectors carry meaning (for example, a small angle between two word vectors suggests the words are closely related).\n","\n","- This is called \"word embedding\", and Keras does it for us: we only tell it the vocabulary size, i.e. the largest word index plus one (10000), and how many dimensions to compress to (N)."]},{"cell_type":"code","metadata":{"id":"W-MNfPFNrwWD","colab_type":"code","colab":{}},"source":["N = 3 # embedding dimension: each word is compressed to N dimensions\n","K = 4 # number of LSTM units"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"QSjFqVwOrynw","colab_type":"code","colab":{}},"source":["from tensorflow.keras.models import Sequential\n","from tensorflow.keras.layers import Dense, Embedding\n","from tensorflow.keras.layers import LSTM\n","\n","model = Sequential()\n","model.add(Embedding(10000, N))\n","model.add(LSTM(K))\n","model.add(Dense(1, activation='sigmoid'))\n","model.compile(loss='binary_crossentropy',\n"," optimizer='adam',\n"," metrics=['accuracy'])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"yFyfgg4zsLI7","colab_type":"code","outputId":"28be6a89-7f66-4364-ce5f-6f3b4adab48e","executionInfo":{"status":"ok","timestamp":1577512573352,"user_tz":-480,"elapsed":1049,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":255}},"source":["model.summary()"],"execution_count":20,"outputs":[{"output_type":"stream","text":["Model: \"sequential\"\n","_________________________________________________________________\n","Layer (type) Output Shape Param # \n","=================================================================\n","embedding (Embedding) (None, None, 3) 30000 \n","_________________________________________________________________\n","lstm (LSTM) (None, 4) 128 \n","_________________________________________________________________\n","dense (Dense) (None, 1) 5 \n","=================================================================\n","Total params: 30,133\n","Trainable params: 30,133\n","Non-trainable params: 
0\n","_________________________________________________________________\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"zfRwhgDWs7kT","colab_type":"text"},"source":["### Train the model"]},{"cell_type":"markdown","metadata":{"id":"CsCvYqT9tAg9","colab_type":"text"},"source":["The input to the embedding layer is shaped by the batch size:\n","\n","(batch_size, maximum sequence length)\n","With batch_size=32 and maxlen=150 the input is (32, 150), and the embedding output is (32, 150, 3), where 3 is the number of dimensions N we chose for each word vector."]},{"cell_type":"code","metadata":{"id":"GintFrbLtKPz","colab_type":"code","outputId":"5d1aec3d-0817-4f03-8a82-22a615d082aa","executionInfo":{"status":"ok","timestamp":1577512664150,"user_tz":-480,"elapsed":53253,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":221}},"source":["model.fit(x_train, y_train,\n"," batch_size=32,\n"," epochs=5)"],"execution_count":21,"outputs":[{"output_type":"stream","text":["Train on 25000 samples\n","Epoch 1/5\n","25000/25000 [==============================] - 17s 662us/sample - loss: 0.5243 - accuracy: 0.7430\n","Epoch 2/5\n","25000/25000 [==============================] - 9s 357us/sample - loss: 0.3154 - accuracy: 0.8781\n","Epoch 3/5\n","25000/25000 [==============================] - 9s 359us/sample - loss: 0.2422 - accuracy: 0.9122\n","Epoch 4/5\n","25000/25000 [==============================] - 9s 355us/sample - loss: 0.2013 - accuracy: 0.9308\n","Epoch 5/5\n","25000/25000 [==============================] - 9s 362us/sample - loss: 0.1725 - accuracy: 0.9436\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{"tags":[]},"execution_count":21}]},{"cell_type":"markdown","metadata":{"id":"rS9yjBjvtZ6a","colab_type":"text"},"source":["### 
Evaluate the model"]},{"cell_type":"code","metadata":{"id":"DLZfgieWtfbZ","colab_type":"code","outputId":"11a845be-f4eb-458a-a697-3e837b519556","executionInfo":{"status":"ok","timestamp":1577512760260,"user_tz":-480,"elapsed":5997,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":34}},"source":["score = model.evaluate(x_test, y_test)"],"execution_count":22,"outputs":[{"output_type":"stream","text":["25000/25000 [==============================] - 5s 188us/sample - loss: 0.3997 - accuracy: 0.8476\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"qX6b_F05tlZi","colab_type":"code","outputId":"66a9bb1b-e618-47fe-86b9-c4b09b2571c3","executionInfo":{"status":"ok","timestamp":1577512763355,"user_tz":-480,"elapsed":948,"user":{"displayName":"陳宇威","photoUrl":"","userId":"00090422027206355406"}},"colab":{"base_uri":"https://localhost:8080/","height":51}},"source":["print(f'Test loss = {score[0]}')\n","print(f'Test accuracy = {score[1]}')"],"execution_count":23,"outputs":[{"output_type":"stream","text":["Test loss = 0.3996563444042206\n","Test accuracy = 0.8475599884986877\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"E2_iZuF-tnrU","colab_type":"text"},"source":["### Save the model"]},{"cell_type":"code","metadata":{"id":"sADazycattBx","colab_type":"code","colab":{}},"source":["# Save the architecture (JSON) and the weights (HDF5) separately\n","model_json = model.to_json()\n","with open('imdb_model_arch.json', 'w') as f:\n","    f.write(model_json)\n","model.save_weights('imdb_model_weights.h5')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"trusted":true,"id":"Anby1dWVEq6s","colab_type":"text"},"source":["### References\n","- [手把手教你如何用 Python 做情感分析](https://www.itread01.com/articles/1498721884.html)\n","- [Python 網路爬蟲與資料分析入門實戰 ](https://www.tenlong.com.tw/products/9789864343386)\n","- [RNN做情意分析.ipynb](https://github.com/yenlung/AI_Math/blob/master/09.%20%E7%94%A8RNN%E5%81%9A%E6%83%85%E6%84%8F%E5%88%86%E6%9E%90.ipynb)\n","- [利用Keras建構LSTM模型,以Stock Prediction 
為例](https://medium.com/@daniel820710/%E5%88%A9%E7%94%A8keras%E5%BB%BA%E6%A7%8Blstm%E6%A8%A1%E5%9E%8B-%E4%BB%A5stock-prediction-%E7%82%BA%E4%BE%8B-1-67456e0a0b)"]}]} -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 willismax 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /MQTT_Demo_Publisher.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "MQTT_Demo_Publisher.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [ 9 | "Xu73azdMRLz8" 10 | ], 11 | "authorship_tag": "ABX9TyPC6dptnaUw4uRCvMEN8lpb", 12 | "include_colab_link": true 13 | }, 14 | "kernelspec": { 15 | "name": "python3", 16 | "display_name": "Python 3" 17 | }, 18 | "language_info": { 19 | "name": "python" 20 | } 21 | }, 22 | "cells": [ 23 | { 24 | "cell_type": "markdown", 25 | "metadata": { 26 | "id": "view-in-github", 27 | "colab_type": "text" 28 | }, 29 | "source": [ 30 | "\"Open" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "source": [ 36 | "|[Slides](https://hackmd.io/@wiimax/ict_mqtt#/)|[Publisher](https://colab.research.google.com/drive/1DAYzTMHmv0X0FqIknvelkcKoVjZgCA3W)|[Subscriber](https://colab.research.google.com/drive/1UcUqnFHTx_DhmJd80WTjwcRql2KVt63W)\n", 37 | "|:-:|:-:|:-:\n", 38 | "|![](https://i.imgur.com/TFkKeuO.png)|![](https://i.imgur.com/uJdlGqo.png)|![](https://i.imgur.com/WVlVek0.png)" 39 | ], 40 | "metadata": { 41 | "id": "pw4jPKxPObAz" 42 | } 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "source": [ 47 | "## MQTT hands-on\n", 48 | "\n", 49 | "- An MQTT **Broker** in the middle (we use a public one)\n", 50 | "- A **Publisher** that sends messages\n", 51 | "- A **Subscriber** that receives messages" 52 | ], 53 | "metadata": { 54 | "id": "OXqP3upSREed" 55 | } 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "source": [ 60 | "#### Popular public MQTT brokers:\n", 61 | "\n", 62 | "![](https://i.imgur.com/GboMca8.png)" 63 | ], 64 | "metadata": { 65 | "id": "cgWSnkP2P8CI" 66 | } 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "source": [ 71 | "## Publisher" 72 | ], 73 | "metadata": { 74 | "id": "1etdBS3dRSCz" 75 | } 76 | }, 77 | { 78 | "cell_type": "code", 79 | "source": [ 80 | "!pip install \"paho-mqtt<2.0\"  # the code below uses the paho-mqtt 1.x Client() API" 81 | 
], 82 | "metadata": { 83 | "id": "HuvDIuzz-tX2" 84 | }, 85 | "execution_count": null, 86 | "outputs": [] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "source": [ 91 | "%%writefile mqtt_pub.py\n", 92 | "# Publisher\n", 93 | "\n", 94 | "import paho.mqtt.client as mqtt\n", 95 | "from random import randint\n", 96 | "import json\n", 97 | "import datetime\n", 98 | "import time\n", 99 | "\n", 100 | "client = mqtt.Client()\n", 101 | "\n", 102 | "# broker = 'broker.hivemq.com'\n", 103 | "# broker = '172.20.10.14'\n", 104 | "broker = 'broker.emqx.io'\n", 105 | "\n", 106 | "client.connect(broker, 1883, keepalive=60)\n", 107 | "\n", 108 | "client.loop_start()\n", 109 | "\n", 110 | "while True:\n", 111 | " topic = \"tcnr/class/n01/sensor01\"\n", 112 | " temp = randint(18,32)\n", 113 | " now_time = datetime.datetime.now().strftime('%m/%d %H:%M:%S')\n", 114 | " message = {'Temperature': temp, 'Time': now_time}\n", 115 | "\n", 116 | " client.publish(topic, json.dumps(message), qos=0)\n", 117 | " print(f'topic: {topic}, message: {message}')\n", 118 | " time.sleep(5)\n" 119 | ], 120 | "metadata": { 121 | "id": "8UOT-Ld1ROty", 122 | "colab": { 123 | "base_uri": "https://localhost:8080/" 124 | }, 125 | "outputId": "0c6dcf98-cecb-446c-a561-356a1e3e5a8d" 126 | }, 127 | "execution_count": null, 128 | "outputs": [ 129 | { 130 | "output_type": "stream", 131 | "name": "stdout", 132 | "text": [ 133 | "Writing mqtt_pub.py\n" 134 | ] 135 | } 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "source": [ 141 | "# !python mqtt_pub.py" 142 | ], 143 | "metadata": { 144 | "id": "K_26ARXMkyCo", 145 | "colab": { 146 | "base_uri": "https://localhost:8080/" 147 | }, 148 | "outputId": "fa2c1d6c-f39d-4b88-af8d-b001e6bf957b" 149 | }, 150 | "execution_count": null, 151 | "outputs": [ 152 | { 153 | "output_type": "stream", 154 | "name": "stdout", 155 | "text": [ 156 | "topic: tcnr/class/n01/sensor01, message: {'Temperature': 20, 'Time': '03/22 06:35:52'}\n", 157 | "topic: tcnr/class/n01/sensor01, 
message: {'Temperature': 23, 'Time': '03/22 06:35:57'}\n", 158 | "\n" 159 | ] 160 | } 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "source": [ 166 | "### Exercises\n", 167 | "- Change the `topic` to `\"tcnr/class/<your seat number>/sensor01\"`.\n", 168 | "- Switch the broker to HiveMQ (`broker.hivemq.com`); remember to change the Publisher and the Subscriber together.\n", 169 | "- Set the message QoS to 1 (\"at least once\"): the Broker must acknowledge every message it receives from the Publisher, so each message reaches the Broker at least once (duplicates are possible).\n", 170 | "- In this notebook (`*.ipynb`), save the code to a `.py` file with `%%writefile mqtt_pub.py`, then run the `python` script from the command line." 171 | ], 172 | "metadata": { 173 | "id": "U4KaMYREcWrX" 174 | } 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "source": [ 179 | "---" 180 | ], 181 | "metadata": { 182 | "id": "-W_109d3USPn" 183 | } 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "source": [ 188 | "## Subscriber" 189 | ], 190 | "metadata": { 191 | "id": "Xu73azdMRLz8" 192 | } 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": { 198 | "id": "jQiBWVyi-dH9" 199 | }, 200 | "outputs": [], 201 | "source": [ 202 | "# Subscriber\n", 203 | "\n", 204 | "import paho.mqtt.client as mqtt\n", 205 | "\n", 206 | "def on_connect(client, userdata, flags, rc):\n", 207 | " print(f'connected with result code {rc}')\n", 208 | " client.subscribe(\"tcnr/#\")\n", 209 | "\n", 210 | "def on_message(client, userdata, msg):\n", 211 | " print(f\"{msg.topic} {msg.payload}\")\n", 212 | "\n", 213 | "client = mqtt.Client()\n", 214 | "client.on_connect = on_connect\n", 215 | "client.on_message = on_message\n", 216 | "\n", 217 | "broker = 'broker.emqx.io'\n", 218 | "client.connect(broker, 1883, keepalive=60)\n", 219 | "\n", 220 | "client.loop_forever()" 221 | ] 222 | } 223 | ] 224 | } -------------------------------------------------------------------------------- /MQTT_Demo_Subscriber.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "MQTT_Demo_Subscriber.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [ 9 | "1etdBS3dRSCz" 10 | ], 11 | "authorship_tag": "ABX9TyO80twy5Doqx8BzJ8JzW86v", 12 | "include_colab_link": true 13 | }, 14 | "kernelspec": { 15 | "name": "python3", 16 | "display_name": "Python 3" 17 | }, 18 | "language_info": { 19 | "name": "python" 20 | } 21 | }, 22 | "cells": [ 23 | { 24 | "cell_type": "markdown", 25 | "metadata": { 26 | "id": "view-in-github", 27 | "colab_type": "text" 28 | }, 29 | "source": [ 30 | "\"Open" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "source": [ 36 | "|[Slides](https://hackmd.io/@wiimax/ict_mqtt#/)|[Publisher](https://colab.research.google.com/drive/1DAYzTMHmv0X0FqIknvelkcKoVjZgCA3W)|[Subscriber](https://colab.research.google.com/drive/1UcUqnFHTx_DhmJd80WTjwcRql2KVt63W)\n", 37 | "|:-:|:-:|:-:\n", 38 | "|![](https://i.imgur.com/TFkKeuO.png)|![](https://i.imgur.com/uJdlGqo.png)|![](https://i.imgur.com/WVlVek0.png)" 39 | ], 40 | "metadata": { 41 | "id": "pw4jPKxPObAz" 42 | } 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "source": [ 47 | "## MQTT hands-on\n", 48 | "\n", 49 | "- An MQTT **Broker** in the middle (we use a public one)\n", 50 | "- A **Publisher** that sends messages\n", 51 | "- A **Subscriber** that receives messages" 52 | ], 53 | "metadata": { 54 | "id": "OXqP3upSREed" 55 | } 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "source": [ 60 | "#### Popular public MQTT brokers:\n", 61 | "\n", 62 | "![](https://i.imgur.com/GboMca8.png)" 63 | ], 64 | "metadata": { 65 | "id": "cgWSnkP2P8CI" 66 | } 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "source": [ 71 | "## Subscriber" 72 | ], 73 | "metadata": { 74 | "id": "Xu73azdMRLz8" 75 | } 76 | }, 77 | { 78 | "cell_type": "code", 79 | "source": [ 80 | "!pip install \"paho-mqtt<2.0\"  # the code below uses the paho-mqtt 1.x Client() API" 81 | ], 82 | "metadata": { 83 | "id": "HuvDIuzz-tX2" 84 | }, 85 | "execution_count": null, 86 | "outputs": [] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": { 92 | "id": "jQiBWVyi-dH9" 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "# %%writefile mqtt_sub.py\n", 97 | "# Subscriber\n", 98 | "\n", 99 | "import paho.mqtt.client as 
mqtt\n", 100 | "import json\n", 101 | "\n", 102 | "def on_connect(client, userdata, flags, rc):\n", 103 | " print(f'connected with result code {rc}')\n", 104 | " client.subscribe(\"tcnr/#\")\n", 105 | "\n", 106 | "def on_message(client, userdata, msg):\n", 107 | " print(f\"{msg.topic} {msg.payload.decode('utf-8')}\")\n", 108 | " temp = json.loads(msg.payload.decode('utf-8'))[\"Temperature\"]\n", 109 | " if temp >= 30:\n", 110 | " print(f\"Temperature {temp} is 30 or above, time to turn on the AC!\")\n", 111 | "\n", 112 | "client = mqtt.Client()\n", 113 | "client.on_connect = on_connect\n", 114 | "client.on_message = on_message\n", 115 | "\n", 116 | "# broker = 'broker.hivemq.com'\n", 117 | "# broker = '172.20.10.14'\n", 118 | "broker = 'broker.emqx.io'\n", 119 | "client.connect(broker, 1883, keepalive=60)\n", 120 | "\n", 121 | "client.loop_forever()\n" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "source": [ 127 | "# !python mqtt_sub.py" 128 | ], 129 | "metadata": { 130 | "id": "aFfJ-KZQk888", 131 | "colab": { 132 | "base_uri": "https://localhost:8080/" 133 | }, 134 | "outputId": "e50ed6a5-050e-4661-fa53-c1ed3413ad9c" 135 | }, 136 | "execution_count": null, 137 | "outputs": [ 138 | { 139 | "output_type": "stream", 140 | "name": "stdout", 141 | "text": [ 142 | "connected with result code 0\n", 143 | "tcnr/class/n13/sensor01 {\"Temperature\": 28, \"Time\": \"03/22 06:35:05\"}\n", 144 | "tcnr/class/n01/sensor01 {\"Temperature\": 26, \"Time\": \"03/22 06:35:06\"}\n", 145 | "tcnr/class/n10/sensor01 {\"Temperature\": 32, \"Time\": \"03/22 06:35:08\"}\n", 146 | "Temperature 32 is 30 or above, time to turn on the AC!\n", 147 | "tcnr/class/n04/sensor01 {\"Temperature\": 16, \"Time\": \"03/22 06:35:08\"}\n", 148 | "tcnr/class/n13/sensor01 {\"Temperature\": 20, \"Time\": \"03/22 06:35:10\"}\n", 149 | "tcnr/class/n01/sensor01 {\"Temperature\": 31, \"Time\": \"03/22 06:35:11\"}\n", 150 | "Temperature 31 is 30 or above, time to turn on the AC!\n", 151 | "tcnr/class/n10/sensor01 {\"Temperature\": 30, \"Time\": \"03/22 06:35:13\"}\n", 152 | "Temperature 30 is 30 or above, time to turn on the AC!\n", 153 | "tcnr/class/n04/sensor01 {\"Temperature\": 25, \"Time\": \"03/22 06:35:13\"}\n", 154 | "tcnr/class/n13/sensor01 {\"Temperature\": 22, \"Time\": \"03/22 06:35:15\"}\n", 155 | "tcnr/class/n01/sensor01 {\"Temperature\": 26, \"Time\": \"03/22 06:35:16\"}\n", 156 | "tcnr/class/n10/sensor01 {\"Temperature\": 22, \"Time\": \"03/22 06:35:18\"}\n", 157 | "tcnr/class/n04/sensor01 {\"Temperature\": 27, \"Time\": \"03/22 06:35:18\"}\n", 158 | "tcnr/class/n13/sensor01 {\"Temperature\": 30, \"Time\": \"03/22 06:35:20\"}\n", 159 | "Temperature 30 is 30 or above, time to turn on the AC!\n", 160 | "tcnr/class/n01/sensor01 {\"Temperature\": 28, \"Time\": \"03/22 06:35:21\"}\n", 161 | "tcnr/class/n10/sensor01 {\"Temperature\": 19, \"Time\": \"03/22 06:35:23\"}\n", 162 | "tcnr/class/n04/sensor01 {\"Temperature\": 30, \"Time\": \"03/22 06:35:23\"}\n", 163 | "Temperature 30 is 30 or above, time to turn on the AC!\n", 164 | "tcnr/class/n13/sensor01 {\"Temperature\": 21, \"Time\": \"03/22 06:35:25\"}\n", 165 | "\n" 166 | ] 167 | } 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "source": [ 173 | "### Exercises\n", 174 | "- Switch the broker to HiveMQ (`broker.hivemq.com`); remember to change the Publisher and the Subscriber together.\n", 175 | "- Change how `message` is handled: if the temperature `temp` is below 20, print \"Time to grab a blanket!\".\n", 176 | "- In this notebook (`*.ipynb`), save the code to a `.py` file with `%%writefile mqtt_sub.py` (uncomment the first line of the cell), then run the `python` script from the command line.\n" 177 | ], 178 | "metadata": { 179 | "id": "6M7F18agd1KS" 180 | } 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "source": [ 185 | "## Publisher" 186 | ], 187 | "metadata": { 188 | "id": "1etdBS3dRSCz" 189 | } 190 | }, 191 | { 192 | "cell_type": "code", 193 | "source": [ 194 | "# Publisher\n", 195 | "\n", 196 | "import paho.mqtt.client as mqtt\n", 197 | "from random import randint\n", 198 | "import time\n", 199 | "\n", 200 | "client = mqtt.Client()\n", 201 | "\n", 202 | "broker = 'broker.emqx.io'\n", 203 | "client.connect(broker, 1883, keepalive=60)\n", 204 | "\n", 205 | "client.loop_start()\n", 206 | "\n", 207 | "while True:\n", 208 | " temp = randint(18,32)\n", 209 | " topic = \"tcnr/class/n01/temp\"\n", 210 | " client.publish(topic, temp)\n", 
211 | " print(f'topic: {topic}, temperature: {temp}')\n", 212 | " time.sleep(5)\n" 213 | ], 214 | "metadata": { 215 | "id": "8UOT-Ld1ROty" 216 | }, 217 | "execution_count": null, 218 | "outputs": [] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "source": [], 223 | "metadata": { 224 | "id": "kYzjLOKX-ypj" 225 | }, 226 | "execution_count": null, 227 | "outputs": [] 228 | } 229 | ] 230 | } -------------------------------------------------------------------------------- /data/handwriting_model_architecture.json: -------------------------------------------------------------------------------- 1 | {"config": [{"config": {"units": 500, "activity_regularizer": null, "activation": "linear", "bias_constraint": null, "bias_regularizer": null, "kernel_regularizer": null, "trainable": true, "dtype": "float32", "kernel_constraint": null, "kernel_initializer": {"config": {"distribution": "uniform", "scale": 1.0, "mode": "fan_avg", "seed": null}, "class_name": "VarianceScaling"}, "bias_initializer": {"config": {}, "class_name": "Zeros"}, "name": "dense_1", "batch_input_shape": [null, 784], "use_bias": true}, "class_name": "Dense"}, {"config": {"name": "activation_1", "activation": "sigmoid", "trainable": true}, "class_name": "Activation"}, {"config": {"units": 500, "kernel_initializer": {"config": {"distribution": "uniform", "scale": 1.0, "mode": "fan_avg", "seed": null}, "class_name": "VarianceScaling"}, "name": "dense_2", "bias_constraint": null, "bias_regularizer": null, "kernel_regularizer": null, "bias_initializer": {"config": {}, "class_name": "Zeros"}, "kernel_constraint": null, "activity_regularizer": null, "trainable": true, "activation": "linear", "use_bias": true}, "class_name": "Dense"}, {"config": {"name": "activation_2", "activation": "sigmoid", "trainable": true}, "class_name": "Activation"}, {"config": {"units": 10, "kernel_initializer": {"config": {"distribution": "uniform", "scale": 1.0, "mode": "fan_avg", "seed": null}, "class_name": "VarianceScaling"}, 
"name": "dense_3", "bias_constraint": null, "bias_regularizer": null, "kernel_regularizer": null, "bias_initializer": {"config": {}, "class_name": "Zeros"}, "kernel_constraint": null, "activity_regularizer": null, "trainable": true, "activation": "linear", "use_bias": true}, "class_name": "Dense"}, {"config": {"name": "activation_3", "activation": "softmax", "trainable": true}, "class_name": "Activation"}], "class_name": "Sequential", "keras_version": "2.0.8", "backend": "tensorflow"} -------------------------------------------------------------------------------- /data/handwriting_model_weights.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/willismax/ICT-Python-101/2a86bc979d5869cfa665022a4ad0350ad66e7708/data/handwriting_model_weights.h5 -------------------------------------------------------------------------------- /data/model01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/willismax/ICT-Python-101/2a86bc979d5869cfa665022a4ad0350ad66e7708/data/model01.png -------------------------------------------------------------------------------- /data/mov_neg.csv: -------------------------------------------------------------------------------- 1 | title,href 2 | [負雷]黑豹-失衡的烏托邦,/bbs/movie/M.1529245622.A.AF5.html 3 | [負雷] 四不像的黑豹,/bbs/movie/M.1527918611.A.56C.html 4 | [微負雷] 黑豹有點讓人失望....,/bbs/movie/M.1527839684.A.EF0.html 5 | [負雷] 是不是有精神分裂的黑豹,/bbs/movie/M.1526734538.A.5E2.html 6 | [負雷] 黑豹,/bbs/movie/M.1520322497.A.E24.html 7 | [負雷]黑豹 最差漫威電影,/bbs/movie/M.1519931938.A.207.html 8 | [ 微負雷] 黑豹,/bbs/movie/M.1519721961.A.471.html 9 | [負雷] 黑豹 Black Panther,暴力與分享,/bbs/movie/M.1518961786.A.79E.html 10 | [ 負雷] 隨便拍拍隨便買單的「黑豹」,/bbs/movie/M.1518934190.A.F7B.html 11 | [負雷] 只有政治正確的 黑豹,/bbs/movie/M.1518809750.A.4E0.html 12 | [負雷]黑豹:除了美術視覺,其餘毫無魅力,/bbs/movie/M.1518808279.A.FC9.html 13 | [睡負無雷] 看黑豹看到睡著,/bbs/movie/M.1518774686.A.889.html 14 | [小負雷] 
黑豹-請修邊幅可以嗎?(內有雷),/bbs/movie/M.1518761310.A.05C.html 15 | [爛無雷] 千萬不要期待的黑豹,/bbs/movie/M.1518758578.A.8F0.html 16 | [普負雷] 輸不起的黑豹,/bbs/movie/M.1518714311.A.957.html 17 | [ 負雷] 黑豹-鋪陳復3的跳板,/bbs/movie/M.1518684255.A.489.html 18 | [ 負雷] 黑豹,/bbs/movie/M.1518670825.A.A31.html 19 | [普負雷] 黑豹出戲點,/bbs/movie/M.1518642702.A.497.html 20 | [負雷] 黑豹特遣隊,/bbs/movie/M.1518609915.A.835.html 21 | [負雷] 家天下黑豹,/bbs/movie/M.1518596627.A.3E0.html 22 | [負雷] 失望的黑豹(補個優點),/bbs/movie/M.1518577527.A.4BB.html 23 | [負雷] 真的過譽的黑豹.........,/bbs/movie/M.1518536086.A.B1E.html 24 | [負雷] 黑豹天下,/bbs/movie/M.1518534403.A.ECB.html 25 | [ 負雷] 讓人失望的黑豹,/bbs/movie/M.1518529744.A.586.html 26 | [普負雷] 一片歐罵罵的黑豹,/bbs/movie/M.1518513719.A.462.html 27 | [微負雷] 黑豹 相較其他marvel片有點可惜,/bbs/movie/M.1518510319.A.1DC.html 28 | [負雷]黑豹,/bbs/movie/M.1518495315.A.E27.html 29 | -------------------------------------------------------------------------------- /data/mov_pos.csv: -------------------------------------------------------------------------------- 1 | title,href 2 | [好雷] 黑豹 --其實蠻好看的,/bbs/movie/M.1552276420.A.F6F.html 3 | [好雷?] 
黑豹 自慰片的新高度,/bbs/movie/M.1545317279.A.EFC.html 4 | [二刷好雷] 水行俠真的不是黑豹,/bbs/movie/M.1545065816.A.46A.html 5 | [好雷] 盲點:關於《黑豹》的奧克蘭也關於你我的故事,/bbs/movie/M.1538060300.A.8FD.html 6 | [好雷] 瘋狂亞洲富豪─絕不是新加坡黑豹,/bbs/movie/M.1536676591.A.C8F.html 7 | [微好雷]《雷神索爾3諸神黃昏》&《黑豹》,/bbs/movie/M.1536217925.A.B6E.html 8 | [好雷] 比黑豹好看的蟻人與黃蜂女,/bbs/movie/M.1531217457.A.6C8.html 9 | [好雷]黑豹 — 符合台灣政治的隱喻分析,/bbs/movie/M.1525369009.A.CFA.html 10 | [好雷] 黑豹:一部政治寓言,/bbs/movie/M.1521128014.A.19B.html 11 | [好雷]黑豹,好看但可惜的反派,/bbs/movie/M.1520467155.A.8E5.html 12 | [好雷] 黑豹心得,/bbs/movie/M.1519920385.A.14A.html 13 | [ 好雷] 唯一缺憾的黑豹,/bbs/movie/M.1519908058.A.B99.html 14 | [好雷] 黑豹 感想 微負評,/bbs/movie/M.1519841715.A.3BF.html 15 | [核心好雷] 黑豹-科技與部落的反差萌 人權與人道議題,/bbs/movie/M.1519812809.A.DC3.html 16 | [普好雷] 立體的世界,被一掌拍平,淺談【黑豹】,/bbs/movie/M.1519787493.A.BFD.html 17 | [有意見好雷] 比上沒得比,比下有餘的黑豹,/bbs/movie/M.1519734682.A.979.html 18 | [好雷] 黑豹 ~ 屬於黑人的童話故事,/bbs/movie/M.1519618436.A.EA0.html 19 | [ 好 雷] 黑豹,/bbs/movie/M.1519572758.A.D3C.html 20 | [好雷]黑豹 王者的抉擇,/bbs/movie/M.1519544126.A.E20.html 21 | [好雷] 黑豹就真的很好看咩~,/bbs/movie/M.1519356550.A.A17.html 22 | [ 普好雷] 黑豹,/bbs/movie/M.1519142286.A.7CD.html 23 | [好雷] 黑豹,/bbs/movie/M.1519070229.A.DE9.html 24 | [好雷] 黑豹:王者路大不易,/bbs/movie/M.1519042966.A.803.html 25 | [微好雷]《黑豹》 春秋五霸的基本套路,/bbs/movie/M.1518991149.A.BDD.html 26 | [好雷]黑豹 不太一樣的超級英雄,/bbs/movie/M.1518963410.A.A2E.html 27 | [好雷] 黑豹 其實我比較喜歡反派@@,/bbs/movie/M.1518939706.A.D48.html 28 | [好雷] 黑豹 Black Panther,非洲未來主義,/bbs/movie/M.1518893529.A.3AD.html 29 | [好雷] 黑豹: 智者造橋,愚者築牆,/bbs/movie/M.1518860486.A.B45.html 30 | [微好雷] 黑豹的電影配樂,/bbs/movie/M.1518848376.A.BBB.html 31 | [好無雷] 黑豹,/bbs/movie/M.1518802967.A.D32.html 32 | [好雷] 黑豹 沒有大家說的那麼糟啦,/bbs/movie/M.1518793606.A.036.html 33 | [好雷] 黑豹,/bbs/movie/M.1518778797.A.C52.html 34 | [普好雷] 黑豹:Do you know da way?,/bbs/movie/M.1518769142.A.229.html 35 | [普好雷] 黑豹 來談談角色動機吧,/bbs/movie/M.1518703695.A.0DC.html 36 | [好雷]YOU SHOW OFF?——《黑豹》無&有雷推薦,/bbs/movie/M.1518641125.A.57A.html 37 | [普好雷] 
覺得黑豹的編劇陷入一種兩難的情況,/bbs/movie/M.1518632731.A.55E.html 38 | [好笑雷] 黑豹,/bbs/movie/M.1518627430.A.ADF.html 39 | [好雷] 黑豹 耳目一新的英雄電影,/bbs/movie/M.1518617014.A.223.html 40 | [好雷] 黑豹 最沒有在看爽的漫威片,/bbs/movie/M.1518584186.A.CA8.html 41 | [偏好雷] 看完黑豹我再重看一次預告片發現...,/bbs/movie/M.1518541335.A.C93.html 42 | [好雷] 黑豹 國王成長日記,/bbs/movie/M.1518538495.A.43A.html 43 | [好雷] 黑豹 世界觀科技感超讚 但故事有些許bug,/bbs/movie/M.1518537228.A.F8C.html 44 | [超爆幹好雷] 黑豹-名副其實漫威最佳,/bbs/movie/M.1518536561.A.C1C.html 45 | [普好雷] 黑豹,/bbs/movie/M.1518535690.A.7CD.html 46 | [好雷]【黑豹】非洲先進文明卓然而立,/bbs/movie/M.1518521887.A.244.html 47 | [普好雷] 充滿新風格的黑豹,/bbs/movie/M.1518517571.A.DD1.html 48 | [好雷] 不太懂有關黑豹反派的幾件事情,/bbs/movie/M.1518512928.A.747.html 49 | [好雷] 小細節怪怪的黑豹,/bbs/movie/M.1518511893.A.636.html 50 | [好雷] 黑豹,/bbs/movie/M.1518507928.A.ABD.html 51 | [好無雷] 黑豹,/bbs/movie/M.1518505340.A.C9A.html 52 | [好雷] 黑豹觀後討論,/bbs/movie/M.1518501262.A.9AC.html 53 | [普好雷] 黑豹 優秀的漫威宇宙擴展片,/bbs/movie/M.1518500618.A.932.html 54 | [極好無雷] 這就是我要的黑豹!,/bbs/movie/M.1518499054.A.561.html 55 | [ 好無雷] 黑豹 絕對值得一看,/bbs/movie/M.1518496440.A.BE1.html 56 | [好無雷] 大推推黑豹電影,/bbs/movie/M.1518495524.A.882.html 57 | [好雷] 黑豹,劇情普通,美術太完美,/bbs/movie/M.1518495424.A.F0B.html 58 | [好雷] 黑豹,/bbs/movie/M.1518494060.A.826.html 59 | [ 好雷] 美國隊長3 黑豹疑問,/bbs/movie/M.1462012717.A.D23.html 60 | [好雷]美隊3-關於黑豹以及各英雄的立場,/bbs/movie/M.1461921066.A.E77.html 61 | [好雷] 美國隊長3 黑豹家世,/bbs/movie/M.1461748087.A.515.html 62 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | WDA-ICT General Education Course 2 | === 3 | :::danger 4 | The course materials are now organized as a book. Open them via the link below: 5 | - Course link: https://hackmd.io/@wiimax/WDA-ICT101/ 6 | ::: 7 
| 8 | --- 9 | > Book table of contents: 10 | 11 | ### [Course introduction](/XcIAkgKyTP-9sLGixHuPCg) 12 | - [Course resources (slides, code)](/X0Q6jb39R0CO2MWgNxXg0w) 13 | - [Community resources](/cctGMCCHR-mG7XFe5AcEGA) 14 | ### [Part1. ICT technologies and trends](/YgSCE1L-TuehI3UgJXfEiQ) 15 | - [ICT trends](/G4okOxMeRrWrhwxCsEjBvg) 16 | - [Technical topics](/bDEo7P85SO2t09tkkk9W0Q) 17 | - [Platform overview](/Xpz-LzfERkWPb17TplsKtg) 18 | ### [Part2. Python programming basics](/ZdOcxA2mT2CKGjRgficMeQ) 19 | ### [Part3. Introduction to AI](/nGPcMtzRTpKcmYUw23ykeg) 20 | ### [After the course...](/Rgc4mKbDTdKHKI81XrM9gA) 21 | --------------------------------------------------------------------------------