├── README.md
├── Week-1
│   ├── Peer Review Installing and Running Python Screen Shots.txt
│   ├── ScreenShot-1.png
│   └── ScreenShot-2.png
├── Week-2
│   └── Extracting Data With Regular Expressions.py
├── Week-3
│   └── Understanding the RequestResponse Cycle.py
├── Week-4
│   ├── Following Links in HTML Using BeautifulSoup.py
│   └── Scraping HTML Data with BeautifulSoup.py
├── Week-5
│   └── Extracting Data from XML.py
└── Week-6
    ├── Extracting Data from JSON.py
    └── Using the GeoJSON API.py

/README.md:
--------------------------------------------------------------------------------
# Coursera---Using-Python-to-Access-Web-Data
All weekly assignments for the Using Python to Access Web Data course on Coursera.
--------------------------------------------------------------------------------
/Week-1/Peer Review Installing and Running Python Screen Shots.txt:
--------------------------------------------------------------------------------
'''
Peer Review: Installing and Running Python Screen Shots

Install Python and a programming text editor, write a program that prints one line other than 'hello world',
then take two screen shots and upload them below. You should use the command line to execute the Python program
you wrote in the text editor. Please do *not* use the IDLE Python Shell, the Python Interpreter (>>>), or a
shortcut in your text editor to run the code. Later in the class, when we start reading files, we will need to
be able to run Python programs from particular directories. See the videos for details.

This is a relatively simple assignment. The goal is simply to show that each student has Python installed on
their desktop or laptop and can take screen shots. Please write your comments to help the student
you are reviewing.
Assignment specification: http://www.pythonlearn.com/install
'''
--------------------------------------------------------------------------------
/Week-1/ScreenShot-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PragneshRamani/Coursera---Using-Python-to-Access-Web-Data/df69fee1ba9ddc5e943d6dc00ff64ffdf8d62bff/Week-1/ScreenShot-1.png
--------------------------------------------------------------------------------
/Week-1/ScreenShot-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PragneshRamani/Coursera---Using-Python-to-Access-Web-Data/df69fee1ba9ddc5e943d6dc00ff64ffdf8d62bff/Week-1/ScreenShot-2.png
--------------------------------------------------------------------------------
/Week-2/Extracting Data With Regular Expressions.py:
--------------------------------------------------------------------------------
'''
In this assignment you will read through and parse a file with text and numbers.
You will extract all the numbers in the file and compute the sum of the numbers.
'''
import re

fname = raw_input('Enter file name: ')
handle = open(fname)

sum = 0
count = 0

for line in handle:
    # Find every run of digits on the line; findall() returns strings.
    f = re.findall('[0-9]+', line)
    for num in f:
        count = count + 1
        sum = sum + int(num)

print 'There are', count, 'values with a sum =', sum
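Since every match from findall() is a run of digits, the whole file can also be scanned in one pass. A minimal alternative sketch in the same Python 2 style as the rest of the repo; the file name 'regex_sum_42.txt' is only a hypothetical sample name, not part of the graded solution:

import re

# Hypothetical sample file; substitute your own assignment file.
text = open('regex_sum_42.txt').read()
nums = [int(n) for n in re.findall('[0-9]+', text)]
print 'There are', len(nums), 'values with a sum =', sum(nums)

Note that '[0-9]+' matches unsigned integers only, so minus signs and decimal points are ignored, which is what this assignment expects.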
--------------------------------------------------------------------------------
/Week-3/Understanding the RequestResponse Cycle.py:
--------------------------------------------------------------------------------
'''
You are to retrieve the following document using the HTTP protocol
in a way that you can examine the HTTP Response headers.

http://data.pr4e.org/intro-short.txt
'''
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))  # host name only; the port is the second tuple element
mysock.send('GET http://data.pr4e.org/intro-short.txt HTTP/1.0\n\n')

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print data

mysock.close()
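The request above works because this particular server is forgiving. Strictly speaking, HTTP header lines end with \r\n, and an HTTP/1.0 GET normally names only the path, with the host in a Host: header. A sketch of the more spec-correct request, same socket flow as above:

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# Path-only request line plus an explicit Host header, CRLF line endings.
mysock.send('GET /intro-short.txt HTTP/1.0\r\nHost: data.pr4e.org\r\n\r\n')

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print data

mysock.close()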
--------------------------------------------------------------------------------
/Week-4/Following Links in HTML Using BeautifulSoup.py:
--------------------------------------------------------------------------------
'''
Following Links in Python

In this assignment you will write a Python program that expands on http://www.pythonlearn.com/code/urllinks.py. The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.

We provide two files for this assignment. One is a sample file where we give you the name for your testing and the other is the actual data you need to process for the assignment.

Sample problem: Start at http://python-data.dr-chuck.net/known_by_Fikret.html
Find the link at position 3 (the first name is 1). Follow that link. Repeat this process 4 times. The answer is the last name that you retrieve.
Sequence of names: Fikret Montgomery Mhairade Butchi Anayah
Last name in sequence: Anayah

Actual problem: Start at: http://python-data.dr-chuck.net/known_by_Blanka.html
Find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.
Hint: The first character of the name of the last page that you will load is: L
'''
import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter URL: ')
count = int(raw_input('Enter count: '))
position = int(raw_input('Enter position: '))

for i in range(count):
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly

    # Collect every anchor's href and link text, in document order.
    tags = soup('a')
    s = []
    t = []
    for tag in tags:
        s.append(tag.get('href', None))
        t.append(tag.text)

    # position is 1-based, so index with position-1, then follow that link.
    print s[position - 1]
    print t[position - 1]
    url = s[position - 1]
--------------------------------------------------------------------------------
/Week-4/Scraping HTML Data with BeautifulSoup.py:
--------------------------------------------------------------------------------
'''
Scraping Numbers from HTML using BeautifulSoup

In this assignment you will write a Python program
similar to http://www.pythonlearn.com/code/urllink2.py.
The program will use urllib to read the HTML from the data files below,
parse the data, extract the numbers, and compute the
sum of the numbers in the file.

We provide two files for this assignment.
One is a sample file where we give you the sum for your testing and
the other is the actual data you need to process for the assignment.

Sample data: http://python-data.dr-chuck.net/comments_42.html (Sum=2553)
Actual data: http://python-data.dr-chuck.net/comments_353539.html (Sum ends with 63)
You do not need to save these files to your folder since your program
will read the data directly from the URL. Note: Each student will have a
distinct data URL for the assignment - so only use your own data URL for analysis.
'''
import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')

html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('span')

count = 0
sum = 0
for tag in tags:
    # In the course data, every <span> holds one integer comment count.
    x = int(tag.text)
    count += 1
    sum = sum + x

print count
print sum
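In the span-scraping program just above, int(tag.text) raises ValueError if a page ever contains a <span> that is not a plain number. The course pages do not need this, but a hedged, purely defensive sketch of the same sum that skips non-numeric spans:

import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')
soup = BeautifulSoup(urllib.urlopen(url).read(), 'html.parser')

nums = []
for tag in soup('span'):
    try:
        nums.append(int(tag.text))
    except ValueError:
        continue  # ignore spans whose text is not an integer

print len(nums), sum(nums)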
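One caveat for the link-following program further above: the course pages use absolute URLs, but on real sites href values are often relative. A sketch of the same loop made robust with urlparse.urljoin, using the sample start URL and the sample parameters (count 4, position 3) from the docstring:

import urllib
import urlparse
from bs4 import BeautifulSoup

url = 'http://python-data.dr-chuck.net/known_by_Fikret.html'  # sample start page
count, position = 4, 3  # sample problem parameters

for i in range(count):
    soup = BeautifulSoup(urllib.urlopen(url).read(), 'html.parser')
    anchors = soup('a')
    href = anchors[position - 1].get('href', None)
    # Resolve a possibly relative link against the page we just fetched.
    url = urlparse.urljoin(url, href)
    print anchors[position - 1].text, url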
--------------------------------------------------------------------------------
/Week-5/Extracting Data from XML.py:
--------------------------------------------------------------------------------
'''
In this assignment you will write a Python program somewhat similar to http://www.pythonlearn.com/code/geoxml.py. The program will prompt for a URL, read the XML data from that URL using urllib, parse it, extract the comment counts from the XML data, and compute the sum of those numbers.

We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

Sample data: http://python-data.dr-chuck.net/comments_42.xml (Sum=2553)
Actual data: http://python-data.dr-chuck.net/comments_353536.xml (Sum ends with 90)
You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data URL for the assignment - so only use your own data URL for analysis.
'''
import urllib
import xml.etree.ElementTree as ET

url = raw_input('Enter - ')
uh = urllib.urlopen(url)
data = uh.read()

tree = ET.fromstring(data)
# In the course data, each <comment> element holds a <count> child.
results = tree.findall('comments/comment')

count = 0
sum = 0
for item in results:
    x = int(item.find('count').text)
    count = count + 1
    sum = sum + x

print 'Count:', count
print 'Sum:', sum
--------------------------------------------------------------------------------
/Week-6/Extracting Data from JSON.py:
--------------------------------------------------------------------------------
'''
In this assignment you will write a Python program somewhat similar to http://www.pythonlearn.com/code/json2.py. The program will prompt for a URL, read the JSON data from that URL using urllib, parse it, extract the comment counts from the JSON data, compute the sum of those numbers, and enter the sum below:

We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

Sample data: http://python-data.dr-chuck.net/comments_42.json (Sum=2553)
Actual data: http://python-data.dr-chuck.net/comments_353540.json (Sum ends with 71)
You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data URL for the assignment - so only use your own data URL for analysis.
'''
import json
import urllib

count = 0
sum = 0
url = raw_input('Enter URL - ')

data = urllib.urlopen(url).read()
print data

info = json.loads(str(data))

# info['comments'] is a list of dictionaries, each with a 'count' key.
for item in info['comments']:
    count = count + 1
    sum = sum + item['count']

print 'Sum:', sum
print 'Count:', count
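Once json.loads() has turned the response into plain Python lists and dictionaries, the JSON program above collapses to a couple of lines. A minimal sketch of the same extraction:

import json
import urllib

url = raw_input('Enter URL - ')
info = json.loads(urllib.urlopen(url).read())

# Pull every comment count out of the parsed structure in one pass.
counts = [item['count'] for item in info['comments']]
print 'Count:', len(counts)
print 'Sum:', sum(counts)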
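The Week-5 XML program can be condensed the same way: ElementTree can locate the <count> nodes directly with the './/count' path, which finds them at any depth and skips the loop over <comment> elements. This sketch assumes every <count> in the document is a comment count, which holds for the course files:

import urllib
import xml.etree.ElementTree as ET

url = raw_input('Enter - ')
tree = ET.fromstring(urllib.urlopen(url).read())

# './/count' matches every <count> element anywhere under the root.
counts = [int(node.text) for node in tree.findall('.//count')]
print 'Count:', len(counts)
print 'Sum:', sum(counts)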
--------------------------------------------------------------------------------
/Week-6/Using the GeoJSON API.py:
--------------------------------------------------------------------------------
'''
Calling a JSON API

In this assignment you will write a Python program somewhat similar to http://www.pythonlearn.com/code/geojson.py. The program will prompt for a location, contact a web service, retrieve the JSON the service returns, parse that data, and retrieve the first place_id from the JSON. A place ID is a textual identifier that uniquely identifies a place within Google Maps.

API End Points

To complete this assignment, you should use this API endpoint that has a static subset of the Google Data:

http://python-data.dr-chuck.net/geojson
This API uses the same parameters (sensor and address) as the Google API. This API also has no rate limit, so you can test as often as you like. If you visit the URL with no parameters, you get a list of all of the address values which can be used with this API.
To call the API, you need to provide a sensor=false parameter and the address that you are requesting as the address= parameter, properly URL encoded using the urllib.urlencode() function as shown in http://www.pythonlearn.com/code/geojson.py
'''
import urllib
import json

serviceurl = 'http://python-data.dr-chuck.net/geojson?'

while True:
    address = raw_input('Enter location: ')
    if len(address) < 1:
        break

    url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
    print 'Retrieving', url

    uh = urllib.urlopen(url)
    data = uh.read()
    print 'Retrieved', len(data), 'characters'

    try:
        js = json.loads(str(data))
    except:
        js = None

    # Test js first: if json.loads() failed, 'in' on None would raise TypeError.
    if not js or 'status' not in js or js['status'] != 'OK':
        print '==== Failure To Retrieve ===='
        print data
        continue

    placeid = js['results'][0]['place_id']
    print 'Place id', placeid
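One detail worth seeing in isolation: urllib.urlencode() percent-encodes each value and joins the pairs with '&'. A minimal sketch; a list of tuples is used instead of a dict so the parameter order is deterministic, and the address is just an example value:

import urllib

params = [('sensor', 'false'), ('address', 'Ann Arbor, MI')]
print urllib.urlencode(params)
# prints: sensor=false&address=Ann+Arbor%2C+MI
# (the space becomes '+' and the comma becomes '%2C')
--------------------------------------------------------------------------------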