├── README.md
├── best movie
    ├── README.md
    ├── expanddouban.py
    └── install_chromedriver.md
└── investigate texts and calls
    ├── EN
        ├── README.md
        ├── Task0.py
        ├── Task1.py
        ├── Task2.py
        ├── Task3.py
        ├── Task4.py
        ├── calls.csv
        └── texts.csv
    └── ZH
        ├── README.md
        ├── Task0.py
        ├── Task1.py
        ├── Task2.py
        ├── Task3.py
        ├── Task4.py
        ├── calls.csv
        └── texts.csv


/README.md:
--------------------------------------------------------------------------------
# cn-python-foundation
# Archival Note
This repository is deprecated and will be archived. Learners can still fork it to their personal GitHub accounts but cannot submit PRs to this repository. If you have any issues or suggestions, feel free to:
- Use the https://knowledge.udacity.com/ forum to seek help on content-specific issues.
- Submit a support ticket, along with the link to your forked repository, if you are blocked for other reasons. Here are the links for [retail consumers](https://udacity.zendesk.com/hc/en-us/requests/new) and [enterprise learners](https://udacityenterprise.zendesk.com/hc/en-us/requests/new?ticket_form_id=360000279131).

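--------------------------------------------------------------------------------
/best movie/README.md:
--------------------------------------------------------------------------------
Project 2: The Best Movies on Douban
=====

Project Overview
----

In this project, you will scrape Douban Movie pages to collect the highest-rated movies from every location in your three favorite categories, gathering each movie's name, rating, movie page link, and poster image link. You will then compute some simple statistics on the collected data.

This project does not provide any Python code. Create a new file, `DoubanCrawler.py`, and complete the tasks in it one by one. Note that the tasks are not independent: later tasks will likely reuse code or functions from earlier ones, and mistakes in earlier tasks will likely affect the correctness of later ones. You may need to go back and revise several times before the project is complete.

Of course, you can submit the project before finishing everything to get some suggestions and feedback.


Task 1: Get the URL for Each Location and Category Page
----
You can browse movie lists by category and location at the following address.

```
https://movie.douban.com/tag/#/?sort=S&range=9,10&tags=电影
```
Breaking down the URL, we can see that it contains

- `https://movie.douban.com/tag/#/`: the Douban Movie category page
- `sort=S`: sort by rating
- `range=9,10`: rating range 9 ~ 10
- `tags=电影`: the tag is 电影 (movie)

The `tags` parameter can contain multiple comma-separated tags, so you can pick a category and a location to filter further. For example, with category `剧情` (drama) and location `美国` (United States), the URL is

```
https://movie.douban.com/tag/#/?sort=S&range=9,10&tags=电影,剧情,美国
```

Implement a function that builds the URL for a given category and location.

```
"""
return a string corresponding to the URL of douban movie lists given category and location.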
"""
def getMovieUrl(category, location):
    url = None
    return url
```
Task 2: Get the Movie Page HTML
-----
Once we have the URL, we can fetch the HTML of the corresponding page.

In the course, we used the `get` function of the `requests` library.

```
import requests
response = requests.get(url)
html = response.text
```

This works fine for most Douban movie list pages. However, some lists span multiple pages, and we need to repeatedly simulate clicking the **Load More** button to display every movie in the list.

Although this task is not difficult, it is not the focus of the course, so we have already completed it for you. You only need to import the file we wrote and call its function.

```
import expanddouban
html = expanddouban.getHtml(url)
```

`getHtml` also has two optional parameters, and you will **very likely** need to pass non-default values.

To use this function, you need to install selenium and chromedriver; see [this guide](https://github.com/udacity/cn-python-foundation/blob/master/best%20movie/install_chromedriver.md).

Task 3: Define the Movie Class
-----
The movie class should contain the following member variables

- movie name
- movie rating
- movie category
- movie location
- movie page link
- movie poster image link

You should also implement the constructor of the movie class.

```
name = "肖申克的救赎"
rate = 9.6
location = "美国"
category = "剧情"
info_link = "https://movie.douban.com/subject/1292052/"
cover_link = "https://img3.doubanio.com/view/movie_poster_cover/lpst/public/p480747492.jpg"

m = Movie(name, rate, location, category, info_link, cover_link)
```

Task 4: Get Douban Movie Information
-----
From the HTML returned for a URL, we can extract the name, rating, poster image link, and page link of every movie on the page. We also know the category and location from building the URL in Task 1, so we can fully construct each movie and obtain a list.

Implement the following function

```
"""
return a list of Movie objects with the given category and location.
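"""
def getMovies(category, location):
    return []
```

Hint: in this task you may need the code or functions from the first three tasks.

Task 5: Build the Movie Information Table
-----
Pick your three favorite movie categories on the site and fetch the movie information for every location. This gives a complete list of movie objects covering three categories and all locations, each rated 9 or higher. Write the list to the file `movies.csv` in the following format:
```
肖申克的救赎,9.6,美国,剧情,https://movie.douban.com/subject/1292052/,https://img3.doubanio.com/view/movie_poster_cover/lpst/public/p480747492.jpg
霍伊特团队,9.0,香港,动作,https://movie.douban.com/subject/1307914/,https://img3.doubanio.com/view/movie_poster_cover/lpst/public/p2329853674.jpg
....
```

Task 6: Compute Movie Statistics
-----
For each movie category you chose, which three locations have the most movies, and what percentage of that category's total does each account for?

You may need to break this task into several steps yourself: count the movies in each category, count the movies for each location within each category, sort to find the largest values, do some arithmetic, and so on. We are sure you can do it!

Write your results to the file `output.txt`.

Project Submission
-----
Before submitting, check your project against the project rubric. Udacity project reviewers will give feedback on your project based on this rubric and leave helpful guidance on your code.

| Criteria | Meets Specifications |
| ------------- |:-------------|
| Code quality | Your code should be well structured and readable; follow the best practices explained in the course. |
| Prints nothing | Your code should not print anything; it should only create or modify the two files `movies.csv` and `output.txt`. |
| Task 1 complete | Implement the function `getMovieUrl`. |
| Task 2 complete | Get the HTML of the Douban movie page from the URL. |
| Task 3 complete | Define the movie class and implement its constructor. |
| Task 4 complete | Build the URL from the category and location and fetch the corresponding HTML. Parse each movie element in the HTML and build a list of movie objects. |
| Task 5 complete | Write the movie information to `movies.csv`, including the category, the location, and the corresponding movie information. |
| Task 6 complete | Write the movie statistics to `output.txt`, including, for each movie category you chose, the three locations with the most movies and the percentage of that category's total each accounts for. |

Submit submit.zip containing the following files:

- DoubanCrawler.py
- movies.csv
- output.txt
- Do not submit any other files

Note that after running `python DoubanCrawler.py`, the script should generate the two files `movies.csv` and `output.txt` in the same folder.

--------------------------------------------------------------------------------
/best movie/expanddouban.py:
--------------------------------------------------------------------------------
from selenium import webdriver
import time

"""
url: the douban page we will get html from
loadmore: whether or not to click load more at the bottom
waittime: seconds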
the browser will wait after initial load and after each click on load more
"""
def getHtml(url, loadmore = False, waittime = 2):
    # Chrome() with no arguments looks for chromedriver on the PATH
    browser = webdriver.Chrome()
    browser.get(url)
    time.sleep(waittime)
    if loadmore:
        # keep clicking "load more" until the button can no longer be found or clicked
        while True:
            try:
                # "class name" is the locator string behind By.CLASS_NAME
                next_button = browser.find_element("class name", "more")
                next_button.click()
                time.sleep(waittime)
            except Exception:
                break
    html = browser.page_source
    browser.quit()
    return html

# for test
#url = "https://movie.douban.com/tag/#/?sort=S&range=9,10&tags=电影,剧情,美国"
#html = getHtml(url)
#print(html)

--------------------------------------------------------------------------------
/best movie/install_chromedriver.md:
--------------------------------------------------------------------------------
## I. Installing on Windows

## 1 Install selenium
1. Open the **cmd command line** (hold the **Windows key** and the **R key** together, type *cmd* in the dialog box, then press Enter)

2. Install **selenium** by typing the following in the window that opens
>pip install selenium

## 2 Download chromedriver
Go to the [**chromedriver download page**](https://sites.google.com/a/chromium.org/chromedriver/downloads)

![](https://i.imgur.com/2JoXcSU.png)

(If the official download page is not accessible, you can visit the [**GitHub project**](https://github.com/DaemonFG/IntrotoPython-Think-Tank/blob/master/P2/ChromeDriver_Download.md) of [**Udacity**](https://cn.udacity.com))

Choose the latest Windows version, **chromedriver_win32.zip**

![](https://i.imgur.com/3UBl9pD.png)

## 3 Unzip the file
Unzipping the downloaded **chromedriver_win32.zip** produces a single exe file; copy it to the Python installation directory

### Note: what if you cannot find the installation directory
1. Open Advanced system settings in the Control Panel and click **Environment Variables**

![](https://i.imgur.com/axYVodW.png)

2. Double-click `Path`; the entry that contains python is the installation directory (for example **C:\Users\Big Geng\AppData\Local\Programs\Python\Python36**)

## II. Quick Setup Guide for Mac

## 1 Download as on Windows

Go to the [**chromedriver download page**](https://sites.google.com/a/chromium.org/chromedriver/downloads)

![](https://i.imgur.com/2JoXcSU.png)

(If the official download page is not accessible, you can visit the [**GitHub project**](https://github.com/DaemonFG/IntrotoPython-Think-Tank/blob/master/P2/ChromeDriver_Download.md) of [**Udacity**](https://cn.udacity.com))

## 2 Find the path that contains the keyword lib

![](https://i.imgur.com/Ka1Qf0K.jpg)

## 3 Put the file in the bin directory at the same level as lib

![](https://i.imgur.com/AUW2BcG.jpg)

Likewise, if you use pyenv, put it in the bin directory of the environment you created

![](https://i.imgur.com/Rm8aOL3.jpg)

--------------------------------------------------------------------------------
/investigate texts and calls/EN/README.md:
--------------------------------------------------------------------------------
# Investigate texts and calls

Project Overview
================
In this Project, you will complete five tasks based on a fabricated set of calls and texts exchanged during September 2016. You will use Python to analyze and answer questions about the texts and calls contained in the dataset.

What will I learn?
---------------------------
After completing this Project, you'll be able to:

* design and write your own Python programs to perform real-life tasks,
* program in Python on your own computer,
* write Python code that is readable and conforms to [PEP8](https://www.python.org/dev/peps/pep-0008/) guidelines,
* process string and numerical data in Python,
* choose and use Python's built-in data structures,
* use Internet resources to help you,
* use Python's built-in functions and write your own functions, and
* use loops and conditional statements in Python.

Why this Project?
-------------
You'll apply the skills you've learned so far in a realistic scenario.
The five tasks are structured to give you experience with a variety of programming problems. Your work will receive a code review; this personal feedback will help you improve your Python programming.

How to prepare for the Project
------
To prepare for this Project, first complete the course materials of Part 1 of Introduction to Python Programming. If you're already confident that you have the skills listed above, feel free to try the project and jump back into the course materials if you find you need to refresh your knowledge.

Project Details
================
In this Project, you'll complete five tasks to answer questions about a fabricated dataset of telephone calls and text messages exchanged during September 2016.

Step 1
--------
Download and open the zipped folder [here](https://github.com/udacity/cn-python-foundation.git). In the folder `investigate texts and calls`, you will find five Python files, `Task0.py` ~ `Task4.py`, and two CSV files, `calls.csv` and `texts.csv`.

About the data
---------
The text and call data are provided in CSV files. In each task file, the data is already read and stored as lists of lists.

Each sub-list of the list of texts is structured in this way:

```
a text = [ Sending telephone number (string),
           receiving telephone number (string),
           timestamp of text message (string)]
```
Each sub-list of the list of phone calls is structured in this way:

```
a call = [ Calling telephone number (string),
           receiving telephone number (string),
           start timestamp of telephone call (string),
           duration of telephone call in seconds (string)]
```

All telephone numbers are 10 numerical digits long. Each telephone number starts with a code indicating the location and/or type of the telephone number.
There are three different kinds of telephone numbers, each with a different format:

* Fixed lines start with an area code enclosed in parentheses. The area codes vary in length but always begin with 0. Example: `(022)40840621`.
* Mobile numbers have no parentheses, but have a space in the middle of the number to help readability. The mobile code of a mobile number is its first four digits, and it always starts with 7, 8 or 9. Example: `93412 66159`.
* Telemarketers' numbers have no parentheses or space, but start with the code 140. Example: `1402316533`.

Step 2
----------
Complete the five tasks by following the instructions in the files.

Do not change the data or instructions; simply add your code below what has been provided.

Include all the code that you need for each task in that file.

The solution outputs for each file should be the print statements described in the instructions. Feel free to use other print statements during the development process, but remember to remove them for submission: the submitted files should print only the solution outputs.

Step 3
---------
Use the rubric to check your work before submission. A Udacity Reviewer will give feedback on your work based on this rubric and will leave helpful comments on your code.

| Criteria | Meets Specifications |
| ------------- |:-------------|
| Code quality | Your code should be well-structured and readable, following best practices explained in the course. |
| Print only the solution outputs | Feel free to use other print statements during the development process, but remember to remove them for submission. |
| Task 0 successful | The script correctly prints out the information of the first record of texts and the last record of calls. |
| Task 1 successful | The script correctly prints the number of distinct telephone numbers in the dataset. |
| Task 2 successful | The script correctly prints the telephone number that spent the longest time on the phone and the total time in seconds it spent on phone calls. |
| Task 3 successful | The script correctly prints the telephone codes called by fixed-line numbers in Bangalore and the proportion of calls from fixed lines in Bangalore that are to fixed lines in Bangalore. |
| Task 4 successful | The script correctly prints the list of numbers that could be telemarketers. |

Submission
======
Submit a zip file named **submit.zip** with **ONLY** the following files:

- Task0.py
- Task1.py
- Task2.py
- Task3.py
- Task4.py

--------------------------------------------------------------------------------
/investigate texts and calls/EN/Task0.py:
--------------------------------------------------------------------------------
"""
Intro to Python Project 1, Task 0

Complete each task in the file for that task. Submit the whole folder
as a zip file or GitHub repo.
Full submission instructions are available on the Project Preparation page.
"""


"""
Read file into texts and calls.
It's ok if you don't understand how to read files
You will learn more about reading files in future lesson
"""
import csv
with open('texts.csv', 'r') as f:
    reader = csv.reader(f)
    texts = list(reader)

with open('calls.csv', 'r') as f:
    reader = csv.reader(f)
    calls = list(reader)


"""
TASK 0:
what is the first record of texts and what is the last record of calls
Print messages:
"First record of texts, texts at time