├── .DS_Store ├── 1-Programming-for-Everybody-Getting-Started-with-Python ├── .DS_Store ├── Assignment │ ├── Assignment_2.1.txt │ ├── Assignment_2.2.txt │ ├── Assignment_2.3.txt │ ├── Assignment_3.1.txt │ ├── Assignment_3.3.txt │ ├── Assignment_4.6.txt │ └── Assignment_5.2.txt └── Quiz │ ├── .DS_Store │ ├── Week 3 Chapter 1.txt │ ├── Week 4 Chapter 2.txt │ ├── Week 5 Chapter 3.txt │ ├── Week 6 Chapter 4.txt │ └── Week 7 Chapter 5.txt ├── 2-Python-Data-Structure ├── .DS_Store ├── Assignment │ ├── Assignment 10.2.txt │ ├── Assignment 6.5.txt │ ├── Assignment 7.1.txt │ ├── Assignment 7.2.txt │ ├── Assignment 8.4.txt │ ├── Assignment 8.5.txt │ └── Assignment 9.4.txt └── Quiz │ ├── .DS_Store │ ├── Week 1 Chapter 6.txt │ ├── Week 3 Chapter 7.txt │ ├── Week 4 Chapter 8.txt │ ├── Week 5 Chapter 9.txt │ └── Week 6 Chapter 10.txt ├── 3-Using-Python-To-Access_Web-Data ├── .DS_Store ├── Assignment │ ├── .DS_Store │ ├── Assignment 2 Extracting Data With Regular Expressions.txt │ ├── Assignment 3 Understanding the Request : Response Cycle.txt │ ├── Assignment 4.1 Scraping HTML Data with BeautifulSoup.txt │ ├── Assignment 4.2 Following Links in HTML Using BeautifulSoup.txt │ ├── Assignment 5 Extracting Data from XML.txt │ ├── Assignment 6.1 Extracting Data from JSON.txt │ └── Assignment 6.2 Using the GeoJSON API.txt └── Quiz │ ├── .DS_Store │ ├── Week 2 Regular Expressions.txt │ ├── Week 3 Networks and Sockets.txt │ ├── Week 4 Reading Web Data From Python.txt │ ├── Week 5 eXtensible Markup Language.txt │ └── Week 6 Rest, Json, and APIs.txt ├── 4-Using-Database-With_Python ├── .DS_Store ├── Assignment │ ├── .DS_Store │ ├── Week 2 Assignment 2.1 Our First Database.py │ ├── Week 2 Assignment 2.2 (Counting Email In database) │ │ ├── .mbox.txt.icloud │ │ ├── Week 2 Assignment 2.2 (Counting Email In database).py │ │ └── emaildb.sqlite │ ├── Week 3 Assignment Multi-Table Database - Tracks │ │ ├── Week 3 Multi-Table Database - Tracks.py │ │ ├── tracks.sqlite │ │ └── tracks │ │ │ ├── 
Library.xml │ │ │ ├── README.txt │ │ │ └── tracks.py │ ├── Week 4 Assignment Many Students in Many Courses │ │ ├── .DS_Store │ │ ├── Week 4 Assignment Many Students in Many Courses.py │ │ └── roster_data.json │ └── Week 5 Assignment Databases and Visualization (peer-graded) │ │ ├── .DS_Store │ │ ├── A.1.1. - Geoload running.PNG │ │ ├── A.1.2. - Geodump running.PNG │ │ ├── A.1.3. - My location.PNG │ │ └── geodata │ │ ├── README.txt │ │ ├── geodata.sqlite │ │ ├── geodump.py │ │ ├── geoload.py │ │ ├── where.data │ │ ├── where.html │ │ └── where.js └── Quiz │ ├── .DS_Store │ ├── Week 1.1 Using Encoded Data in Python 3.txt │ ├── Week 1.2 Object Oriented Programming-72.txt │ ├── Week 2 Single-Table SQL.txt │ ├── Week 3 Multi-Table Relational SQL.txt │ └── Week 4 Many-to-Many Relationships and Python.txt └── 5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python ├── .DS_Store ├── Assignment ├── .DS_Store ├── Assignment 1 pagerank │ ├── .DS_Store │ ├── .idea │ │ ├── inspectionProfiles │ │ │ └── Project_Default.xml │ │ ├── misc.xml │ │ ├── modules.xml │ │ ├── pagerank.iml │ │ └── workspace.xml │ ├── BeautifulSoup.py │ ├── BeautifulSoup.pyc │ ├── LICENSE │ ├── README.txt │ ├── d3.v2.js │ ├── force.css │ ├── force.html │ ├── force.js │ ├── outputImages │ │ ├── .DS_Store │ │ ├── force.png │ │ ├── force_oth.png │ │ ├── gmainC.png │ │ └── spdump.png │ ├── pageRank.txt │ ├── spdump.py │ ├── spider.js │ ├── spider.py │ ├── spider.sqlite │ ├── spjson.py │ ├── sprank.py │ └── spreset.py ├── Assignment 2 Spidering and Modeling Email Data │ ├── .DS_Store │ ├── OutputImages │ │ ├── gbasic.png │ │ ├── gline.png │ │ ├── gmain.png │ │ ├── gmodel.png │ │ └── gword.png │ ├── README.txt │ ├── content.sqlite │ ├── d3.layout.cloud.js │ ├── d3.v2.js │ ├── email .txt │ ├── gbasic.py │ ├── gline.htm │ ├── gline.js │ ├── gline.py │ ├── gline2.htm │ ├── gmane.py │ ├── gmodel.py │ ├── gword.htm │ ├── gword.js │ ├── gword.py │ ├── gyear.py │ ├── index.sqlite │ └── mapping.sqlite └── 
Assignment 3 │ ├── .DS_Store │ └── outputImages │ ├── .DS_Store │ ├── blob_serve (1).png │ ├── blob_serve (2).png │ ├── blob_serve (4).png │ └── blob_serve.png └── Quiz └── Week 1 Using Encoded Data in Python 3.txt /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/.DS_Store -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/1-Programming-for-Everybody-Getting-Started-with-Python/.DS_Store -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_2.1.txt: -------------------------------------------------------------------------------- 1 | """You can write any code you like in the window below. 
There are three files loaded and ready for you to open if you want to do file processing: "mbox-short.txt", "romeo.txt", and "words.txt".""" 2 | 3 | fh = open("words.txt", "r") 4 | 5 | count = 0 6 | for line in fh: 7 | print(line.strip()) 8 | count = count + 1 9 | 10 | print(count,"Lines") 11 | 12 | gh = open("romeo.txt", "r") 13 | 14 | count = 0 15 | for line in gh: 16 | print(line.strip()) 17 | count = count + 1 18 | 19 | print(count,"Lines") 20 | 21 | kh = open("mbox-short.txt", "r") 22 | 23 | count = 0 24 | for line in kh: 25 | print(line.strip()) 26 | count = count + 1 27 | 28 | print(count,"Lines") -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_2.2.txt: -------------------------------------------------------------------------------- 1 | #"""2.2 Write a program that uses input to prompt a user for their name and then welcomes them. Note that input will pop up a dialog box. Enter Sarah in the pop-up box when you are prompted so your output will match the desired output.""" 2 | 3 | name = input("Enter Your Name: ") 4 | print("Hello "+name) -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_2.3.txt: -------------------------------------------------------------------------------- 1 | #"""2.3 Write a program to prompt the user for hours and rate per hour using input to compute gross pay. Use 35 hours and a rate of 2.75 per hour to test the program (the pay should be 96.25). You should use input to read a string and float() to convert the string to a number. 
Do not worry about error checking or bad user data.""" 2 | 3 | hrs = input("Enter Hours: ") 4 | rate = input("Enter Rate: ") 5 | pay = float(hrs) * float(rate) 6 | print(pay) -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_3.1.txt: -------------------------------------------------------------------------------- 1 | #""" 3.1 Write a program to prompt the user for hours and rate per hour using input to compute gross pay. Pay the hourly rate for the hours up to 40 and 1.5 times the hourly rate for all hours worked above 40 hours. Use 45 hours and a rate of 10.50 per hour to test the program (the pay should be 498.75). You should use input to read a string and float() to convert the string to a number. Do not worry about error checking the user input - assume the user types numbers properly. 2 | Grade updated on server. """ 3 | 4 | 5 | def computepay(h,r): 6 | if h < 0 or r < 0: 7 | return None 8 | elif h > 40: 9 | return (40*r+(h-40)*1.5*r) 10 | else: 11 | return (h*r) 12 | 13 | try: 14 | hrs = input("Enter Hours:") 15 | hour = float(hrs) 16 | r = input("Enter Rate:") 17 | rate = float(r) 18 | p = computepay(hour,rate) 19 | print(p) 20 | except: 21 | print("Please enter a number") -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_3.3.txt: -------------------------------------------------------------------------------- 1 | #"""3.3 Write a program to prompt for a score between 0.0 and 1.0. If the score is out of range, print an error. 
If the score is between 0.0 and 1.0, print a grade using the following table: 2 | Score Grade 3 | >= 0.9 A 4 | >= 0.8 B 5 | >= 0.7 C 6 | >= 0.6 D 7 | < 0.6 F 8 | If the user enters a value out of range, print a suitable error message and exit. For the test, enter a score of 0.85.""" 9 | 10 | 11 | try: 12 | s = input("Enter score: ") 13 | score = float(s) 14 | if score > 1.0 or score < 0.0: 15 | print("value out of range") 16 | elif score >= 0.9: 17 | print("A") 18 | elif score >= 0.8: 19 | print("B") 20 | elif score >= 0.7: 21 | print("C") 22 | elif score >= 0.6: 23 | print("D") 24 | else: 25 | print("F") 26 | except: 27 | print("Error, please enter a numeric score") -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_4.6.txt: -------------------------------------------------------------------------------- 1 | #"""4.6 Write a program to prompt the user for hours and rate per hour using input to compute gross pay. Award time-and-a-half for the hourly rate for all hours worked above 40 hours. Put the logic to do the computation of time-and-a-half in a function called computepay() and use the function to do the computation. The function should return a value. Use 45 hours and a rate of 10.50 per hour to test the program (the pay should be 498.75). You should use input to read a string and float() to convert the string to a number. Do not worry about error checking the user input unless you want to - you can assume the user types numbers properly. 
Do not name your variable sum or use the sum() function.""" 2 | 3 | def computepay(h,r): 4 | if h < 0 or r < 0: 5 | return None 6 | elif h > 40: 7 | return (40*r+(h-40)*1.5*r) 8 | else: 9 | return (h*r) 10 | 11 | try: 12 | hrs = input("Enter Hours:") 13 | hour = float(hrs) 14 | r = input("please input your rate:") 15 | rate = float(r) 16 | p = computepay(hour,rate) 17 | print(p) 18 | except: 19 | print("Please,input your numberic") -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Assignment/Assignment_5.2.txt: -------------------------------------------------------------------------------- 1 | #"""5.2 Write a program that repeatedly prompts a user for integer numbers until the user enters 'done'. Once 'done' is entered, print out the largest and smallest of the numbers. If the user enters anything other than a valid number catch it with a try/except and put out an appropriate message and ignore the number. 
Enter 7, 2, bob, 10, and 4 and match the output below.""" 2 | 3 | largest = None 4 | smallest = None 5 | 6 | while True: 7 | inp = input("Enter a number: ") 8 | if inp == "done" : break 9 | try: 10 | num = float(inp) 11 | except: 12 | print("Invalid input") 13 | continue 14 | if smallest is None or num < smallest : 15 | smallest = num 16 | if largest is None or num > largest : 17 | largest = num 18 | 19 | def done(largest,smallest): 20 | print("Maximum is", int(largest)) 21 | print("Minimum is", int(smallest)) 22 | 23 | done(largest,smallest) -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/.DS_Store -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/Week 3 Chapter 1.txt: -------------------------------------------------------------------------------- 1 | 1.When Python is running in the interactive mode and displaying the chevron prompt (>>>) - what question is Python asking you? 2 | ==> What Python statement would you like me to run? 3 | 4 | 2.What will the following program print out: 5 | >>> x = 15 6 | >>> x = x + 5 7 | >>> print(x) 8 | ==> 20 9 | 10 | 3.Python scripts (files) have names that end with: 11 | ==>.py 12 | 13 | 4.Which of these words are reserved words in Python ? 14 | ==> 15 | — break 16 | — if 17 | 18 | 5.What is the proper way to say “good-bye” to Python? 19 | ==>quit() 20 | 21 | 6.Which of the parts of a computer actually executes the program instructions? 22 | ==> Central Processing Unit 23 | 24 | 7.What is "code" in the context of this course? 
25 | ==> A sequence of instructions in a programming language 26 | 27 | 8.A USB memory stick is an example of which of the following components of computer architecture? 28 | ==> Secondary Memory 29 | 30 | 9.What is the best way to think about a "Syntax Error" while programming? 31 | ==> The computer did not understand the statement that you entered 32 | 33 | 10.Which of the following is not one of the programming patterns covered in Chapter 1? 34 | ==> Random steps -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/Week 4 Chapter 2.txt: -------------------------------------------------------------------------------- 1 | 1.Which of the following is a comment in Python? 2 | ==> # This is a test 3 | 4 | 2.What does the following code print out? 5 | print("123" + "abc") 6 | ==> 123abc 7 | 8 | 3.Which of the following variables is the "most mnemonic"? 9 | ==> hours 10 | 11 | 4.Which of the following is not a Python reserved word? 12 | ==> spam 13 | 14 | 5.Assume the variable x has been initialized to an integer value (e.g., x = 3). What does the following statement do? 15 | x = x + 2 16 | ==> Retrieve the current value for x, add two to it, and put the sum back into x 17 | 18 | 6.Which of the following elements of a mathematical expression in Python is evaluated first? 19 | ==> Parentheses ( ) 20 | 21 | 7.What is the value of the following expression 22 | 42 % 10 23 | ==> 2 24 | 25 | 8.What will be the value of x after the following statement executes: 26 | x = 1 + 2 * 3 - 8 / 4 27 | ==> 5.0 28 | 29 | 9.What will be the value of x when the following statement is executed: 30 | x = int(98.6) 31 | ==> 98 32 | 33 | 10.What does the Python input() function do? 34 | ==> Pause the program and read data from the user 35 | 36 | 11.In the following code, print(98.6) What is “98.6”? 37 | ==> A constant 38 | 39 | 12.Which of the following is a bad Python variable name? 
40 | ==> spam.23 -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/Week 5 Chapter 3.txt: -------------------------------------------------------------------------------- 1 | 1.What do we do to a Python statement that is immediately after an if statement to indicate that the statement is to be executed only when the if statement is true? 2 | ==> Indent the line below the if statement 3 | 4 | 2.Which of these operators is not a comparison / logical operator? 5 | ==> = 6 | 7 | 3.What is true about the following code segment: 8 | if x == 5 : 9 | print('Is 5') 10 | print('Is Still 5') 11 | print('Third 5') 12 | ==> Depending on the value of x, either all three of the print statements will execute or none of the statements will execute 13 | 14 | 4.When you have multiple lines in an if block, how do you indicate the end of the if block? 15 | ==> You de-indent the next line past the if block to the same level of indent as the original if statement 16 | 17 | 5.You look at the following text: 18 | if x == 6 : 19 | print('Is 6') 20 | print('Is Still 6') 21 | print('Third 6') 22 | It looks perfect but Python is giving you an 'Indentation Error' on the second print statement. What is the most likely reason? 23 | ==> You have mixed tabs and spaces in the file 24 | 25 | 6.What is the Python reserved word that we use in two-way if tests to indicate the block of code that is to be executed if the logical test is false? 26 | ==>else 27 | 28 | 7.What will the following code print out? 29 | x = 0 30 | if x < 2 : 31 | print('Small') 32 | elif x < 10 : 33 | print('Medium') 34 | else : 35 | print('LARGE') 36 | print('All done') 37 | ==> Small 38 | All done 39 | 40 | 8.For the following code, 41 | if x < 2 : 42 | print('Below 2') 43 | elif x >= 2 : 44 | print('Two or more') 45 | else : 46 | print('Something else') 47 | What value of 'x' will cause 'Something else' to print out? 
48 | ==>This code will never print 'Something else' regardless of the value for 'x' 49 | 50 | 9.In the following code (numbers added) - which will be the last line to execute successfully? 51 | (1) astr = 'Hello Bob' 52 | (2) istr = int(astr) 53 | (3) print('First', istr) 54 | (4) astr = '123' 55 | (5) istr = int(astr) 56 | (6) print('Second', istr) 57 | ==> 1 58 | 59 | 10.For the following code: 60 | astr = 'Hello Bob' 61 | istr = 0 62 | try: 63 | istr = int(astr) 64 | except: 65 | istr = -1 66 | What will the value be for istr after this code executes? 67 | ==>-1 -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/Week 6 Chapter 4.txt: -------------------------------------------------------------------------------- 1 | 1.Which Python keyword indicates the start of a function definition? 2 | ==> def 3 | 4 | 2.In Python, how do you indicate the end of the block of code that makes up the function? 5 | ==> You de-indent a line of code to the same indent level as the def keyword 6 | 7 | 3.In Python what is the input() feature best described as? 8 | ==> A built-in function 9 | 10 | 4.What does the following code print out? 11 | def thing(): 12 | print('Hello') 13 | 14 | print('There') 15 | ==> There 16 | 17 | 5.In the following Python code, which of the following is an "argument" to a function? 18 | x = 'banana' 19 | y = max(x) 20 | print(y) 21 | ==> x 22 | 23 | 6.What will the following Python code print out? 24 | def func(x) : 25 | print(x) 26 | 27 | func(10) 28 | func(20) 29 | ==> 10 30 | 20 31 | 32 | 7.Which line of the following Python program will never execute? 33 | def stuff(): 34 | print('Hello') 35 | return 36 | print('World') 37 | 38 | stuff() 39 | ==> print ('World') 40 | 41 | 8.What will the following Python program print out? 
42 | def greet(lang): 43 | if lang == 'es': 44 | return 'Hola' 45 | elif lang == 'fr': 46 | return 'Bonjour' 47 | else: 48 | return 'Hello' 49 | 50 | print(greet('fr'),'Michael') 51 | ==>Bonjour Michael 52 | 53 | 9.What does the following Python code print out? (Note that this is a bit of a trick question and the code has what many would consider to be a flaw/bug - so read carefully). 54 | def addtwo(a, b): 55 | added = a + b 56 | return a 57 | 58 | x = addtwo(2, 7) 59 | print(x) 60 | ==>2 61 | 62 | 10.What is the most important benefit of writing your own functions? 63 | ==>Avoiding writing the same non-trivial code more than once in your program -------------------------------------------------------------------------------- /1-Programming-for-Everybody-Getting-Started-with-Python/Quiz/Week 7 Chapter 5.txt: -------------------------------------------------------------------------------- 1 | 1.What is wrong with this Python loop: 2 | n = 5 3 | while n > 0 : 4 | print(n) 5 | print('All done') 6 | ==> This loop will run forever 7 | 8 | 2.What does the break statement do? 9 | ==> Exits the currently executing loop 10 | 11 | 3.What does the continue statement do? 12 | ==> Jumps to the "top" of the loop and starts the next iteration 13 | 14 | 4.What does the following Python program print out? 15 | tot = 0 16 | for i in [5, 4, 3, 2, 1] : 17 | tot = tot + 1 18 | print(tot) 19 | ==> 5 20 | 21 | 5.What is the iteration variable in the following Python code: 22 | friends = ['Joseph', 'Glenn', 'Sally'] 23 | for friend in friends : 24 | print('Happy New Year:', friend) 25 | print('Done!') 26 | ==> friend 27 | 28 | 6.What is a good description of the following bit of Python code? 29 | zork = 0 30 | for thing in [9, 41, 12, 3, 74, 15] : 31 | zork = zork + thing 32 | print('After', zork) 33 | ==> Sum all the elements of a list 34 | 35 | 7.What will the following code print out? 
36 | smallest_so_far = -1 37 | for the_num in [9, 41, 12, 3, 74, 15] : 38 | if the_num < smallest_so_far : 39 | smallest_so_far = the_num 40 | print(smallest_so_far) 41 | ==> -1 42 | 43 | 8.What is a good statement to describe the is operator as used in the following if statement: 44 | if smallest is None : 45 | smallest = value 46 | ==> matches both type and value 47 | 48 | 9.Which reserved word indicates the start of an "indefinite" loop in Python? 49 | ==> while 50 | 51 | 10.How many times will the body of the following loop be executed? 52 | n = 0 53 | while n > 0 : 54 | print('Lather') 55 | print('Rinse') 56 | print('Dry off!') 57 | ==> 0 -------------------------------------------------------------------------------- /2-Python-Data-Structure/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/2-Python-Data-Structure/.DS_Store -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 10.2.txt: -------------------------------------------------------------------------------- 1 | #"""10.2 Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon. 
2 | From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 3 | Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.""" 4 | 5 | 6 | #Use mbox-short.txt File name 7 | 8 | name = input("Enter file:") 9 | f = open(name) 10 | dic = {} 11 | for i in f: 12 | if i.startswith("From") and len(i.split()) > 2: 13 | line = i.split() 14 | if line[5][:2] not in dic: 15 | dic[line[5][:2]] = 1 16 | else: 17 | dic[line[5][:2]] += 1 18 | 19 | key = sorted(dic) 20 | for i in key: 21 | print(i, dic[i]) -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 6.5.txt: -------------------------------------------------------------------------------- 1 | #"""6.5 Write code using find() and string slicing (see section 6.10) to extract the number at the end of the line below. Convert the extracted value to a floating point number and print it out.""" 2 | 3 | 4 | 5 | text = "X-DSPAM-Confidence: 0.8475" 6 | 7 | spacePos = text.find(" ") 8 | number = text[spacePos:] 9 | #not really necessary but since we are just learning and playing 10 | strippedNumber = number.lstrip() 11 | result = float(strippedNumber) 12 | 13 | def reprint(printed): 14 | print(printed) 15 | 16 | reprint(result) 17 | 18 | -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 7.1.txt: -------------------------------------------------------------------------------- 1 | #"""7.1 Write a program that prompts for a file name, then opens that file and reads through the file, and print the contents of the file in upper case. Use the file words.txt to produce the output below. 
2 | You can download the sample data at http://www.py4e.com/code3/words.txt""" 3 | 4 | 5 | # Use words.txt as the file name
 6 | fname = input("Enter file name: ") 7 | fh = open(fname) 8 | inp = fh.read() 9 | print(inp.rstrip().upper()) -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 7.2.txt: -------------------------------------------------------------------------------- 1 | #"""7.2 Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form: 2 | X-DSPAM-Confidence: 0.8475 3 | Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution. 4 | You can download the sample data at http://www.py4e.com/code3/mbox-short.txt when you are testing below enter mbox-short.txt as the file name.""" 5 | 6 | 7 | 8 | # Use the file name mbox-short.txt as the file name 9 | fname = input("Enter file name: ") 10 | fh = open(fname) 11 | count = 0 12 | s = 0 13 | for line in fh: 14 | if not line.startswith("X-DSPAM-Confidence:") : 15 | continue 16 | pos = line.find('0') 17 | s += float(line[pos:pos+6]) 18 | count += 1 19 | average = s / count 20 | print("Average spam confidence:", average) 21 | 22 | -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 8.4.txt: -------------------------------------------------------------------------------- 1 | #"""8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. 
When the program completes, sort and print the resulting words in alphabetical order. 2 | You can download the sample data at http://www.py4e.com/code3/romeo.txt""" 3 | 4 | 5 | # File name is "romeo.txt" 6 | fajl = input("Enter file name: ") 7 | fajlOpen = open(fajl) 8 | listica = [] 9 | linije = [line.split() for line in fajlOpen] 10 | for i in linije: 11 | for j in i: 12 | if j not in listica: 13 | listica.append(j) 14 | listica.sort() 15 | print(listica) -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 8.5.txt: -------------------------------------------------------------------------------- 1 | #"""8.5 Open the file mbox-short.txt and read it line by line. When you find a line that starts with 'From ' like the following line: 2 | From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 3 | You will parse the From line using split() and print out the second word in the line (i.e. the entire address of the person who sent the message). Then print out a count at the end. 4 | Hint: make sure not to include the lines that start with 'From:'. 5 | 6 | You can download the sample data at http://www.py4e.com/code3/mbox-short.txt""" 7 | 8 | 9 | 10 | #Use mbox-short.txt as File Name 11 | 12 | fname = input("Enter file name: ") 13 | 14 | tekst = open(fname) 15 | count = 0 16 | for linija in tekst: 17 | if linija.startswith("From "): 18 | rijeci = linija.rstrip().split() 19 | email = rijeci[1] 20 | print(email) 21 | count +=1 22 | else: 23 | continue 24 | 25 | print("There were", count, "lines in the file with From as the first word") -------------------------------------------------------------------------------- /2-Python-Data-Structure/Assignment/Assignment 9.4.txt: -------------------------------------------------------------------------------- 1 | #"""9.4 Write a program to read through the mbox-short.txt and figure out who has sent the greatest number of mail messages. 
The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file. After the dictionary is produced, the program reads through the dictionary using a maximum loop to find the most prolific committer.""" 2 | 3 | 4 | 5 | #Use mbox-short.txt as File name 6 | name = input("Enter file:") 7 | tekst = open(name) 8 | dic = {} 9 | 10 | for lines in tekst: 11 | if lines.startswith("From "): 12 | words = lines.split() 13 | email = words[1] 14 | dic[email] = dic.get(email, 0)+1 15 | 16 | i = None 17 | j = None 18 | 19 | for k, v in dic.items(): 20 | if j is None or j < v: 21 | j = v 22 | i = k 23 | print(i, j) -------------------------------------------------------------------------------- /2-Python-Data-Structure/Quiz/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/2-Python-Data-Structure/Quiz/.DS_Store -------------------------------------------------------------------------------- /2-Python-Data-Structure/Quiz/Week 1 Chapter 6.txt: -------------------------------------------------------------------------------- 1 | 1.What does the following Python Program print out? 2 | str1 = "Hello" 3 | str2 = 'there' 4 | bob = str1 + str2 5 | print(bob) 6 | ==>Hellothere 7 | 8 | 2.What does the following Python program print out? 9 | x = '40' 10 | y = int(x) + 2 11 | print(y) 12 | ==>42 13 | 14 | 3.How would you use the index operator [] to print out the letter q from the following string? 15 | x = 'From marquard@uct.ac.za' 16 | ==>print(x[8]) 17 | 18 | 4.How would you use string slicing [:] to print out 'uct' from the following string? 
19 | x = 'From marquard@uct.ac.za' 20 | ==>print(x[14:17]) 21 | 22 | 5.What is the iteration variable in the following Python code? 23 | for letter in 'banana' : 24 | print(letter) 25 | ==>letter 26 | 27 | 6.What does the following Python code print out? 28 | print(len('banana')*7) 29 | ==>42 30 | 31 | 7.How would you print out the following variable in all upper case in Python? 32 | greet = 'Hello Bob' 33 | ==>print(greet.upper()) 34 | 35 | 8.Which of the following is not a valid string method in Python? 36 | ==>boldface() 37 | 38 | 9.What will the following Python code print out? 39 | data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008' 40 | pos = data.find('.') 41 | print(data[pos:pos+3]) 42 | ==>.ma 43 | 44 | 10.Which of the following string methods removes whitespace from both the beginning and end of a string? 45 | ==>strip() -------------------------------------------------------------------------------- /2-Python-Data-Structure/Quiz/Week 3 Chapter 7.txt: -------------------------------------------------------------------------------- 1 | 1.Given the architecture and terminology we introduced in Chapter 1, where are files stored? 2 | ==>Secondary memory 3 | 4 | 2.What is stored in a "file handle" that is returned from a successful open() call? 5 | ==> The handle is a connection to the file's data 6 | 7 | 3.What do we use the second parameter of the open() call to indicate? 8 | ==> Whether we want to read data from the file or write data to the file 9 | 10 | 4.What Python function would you use if you wanted to prompt the user for a file name to open? 11 | ==>input() 12 | 13 | 5.What is the purpose of the newline character in text files? 14 | ==>It indicates the end of one line of text and the beginning of another line of text 15 | 16 | 6.If we open a file as follows: xfile = open('mbox.txt'). What statement would we use to read the file one line at a time? 
17 | ==>for line in xfile: 18 | 19 | 7.What is the purpose of the following Python code? fhand = open('mbox.txt'); x = 0; for line in fhand: x = x + 1; print x 20 | ==> Count the lines in the file 'mbox.txt' 21 | 22 | 8.If you write a Python program to read a text file and you see extra blank lines in the output that are not present in the file input as shown below, what Python string function will likely solve the problem?. 23 | From: stephen.marquard@uct.ac.za; 24 | From: louis@media.berkeley.edu; 25 | From: zqian@umich.edu; 26 | From: rjlowe@iupui.edu ... 27 | ==> rstrip() 28 | 29 | 9.The following code sequence fails with a traceback when the user enters a file that does not exist. How would you avoid the traceback and make it so you could print out your own error message when a bad file name was entered? 30 | fname = raw_input('Enter the file name: '); 31 | fhand = open(fname) 32 | ==> try / except 33 | 34 | 10.What does the following Python code do? 35 | fhand = open('mbox-short.txt'); 36 | inp = fhand.read() 37 | ==>Reads the entire file into the variable inp as a string 38 | -------------------------------------------------------------------------------- /2-Python-Data-Structure/Quiz/Week 4 Chapter 8.txt: -------------------------------------------------------------------------------- 1 | 1.How are "collection" variables different from normal variables? 2 | ==> Collection variables can store multiple values in a single variable 3 | 4 | 2.What are the Python keywords used to construct a loop to iterate through a list? 5 | ==> for/in 6 | 3.For the following list, how would you print out 'Sally'? 7 | friends = [ 'Joseph', 'Glenn', 'Sally'] 8 | ==> print(friends[2]) 9 | 4. fruit = 'Banana' 10 | fruit[0] = 'b'; 11 | print fruit 12 | ==> Nothing would print the program fails with a traceback 13 | 14 | 5.Which of the following Python statements would print out the length of a list stored in the variable data? 
15 | ==> print(len(data)) 16 | 17 | 6.What type of data is produced when you call the range() function? 18 | x = range(5) 19 | ==> A list of integers 20 | 21 | 7.What does the following Python code print out? 22 | a = [1, 2, 3]; 23 | b = [4, 5, 6]; 24 | c = a + b; 25 | print(len(c)) 26 | ==> 6 27 | 28 | 8.Which of the following slicing operations will produce the list [12, 3]? 29 | t = [9, 41, 12, 3, 74, 15] 30 | ==> t[2:4] 31 | 32 | 9.What list method adds a new item to the end of an existing list? 33 | ==> append() 34 | 35 | 10.What will the following Python code print out? 36 | friends = [ 'Joseph', 'Glenn', 'Sally' ]; 37 | friends.sort(); 38 | print(friends[0]) 39 | ==> Glenn -------------------------------------------------------------------------------- /2-Python-Data-Structure/Quiz/Week 5 Chapter 9.txt: -------------------------------------------------------------------------------- 1 | 1.How are Python dictionaries different from Python lists? 2 | ==> Python lists are indexed using integers and dictionaries can use strings as indexes 3 | 4 | 2.What is a term commonly used to describe the Python dictionary feature in other programming languages? 5 | ==> Associative arrays 6 | 7 | 3.What would the following Python code print out? 8 | stuff = dict(); 9 | print(stuff['candy']) 10 | ==> The program would fail with a traceback 11 | 12 | 4.What would the following Python code print out? stuff = dict(); print stuff.get('candy',-1) 13 | ==> -1 14 | 15 | 5.(T/F)When you add items to a dictionary they remain in the order in which you added them. 16 | ==> False 17 | 18 | 6.What is a common use of Python dictionaries in a program? 19 | ==> Building a histogram counting the occurrences of various strings in a file 20 | 21 | 7.Which of the following lines of Python is equivalent to the following sequence of statements assuming that counts is a dictionary? 
22 | if key in counts: 23 | counts[key] = counts[key] + 1 24 | else: 25 | counts[key] = 1 26 | ==> counts[key] = counts.get(key,0) + 1 27 | 28 | 8.In the following Python, what does the for loop iterate through? 29 | x = dict() ... 30 | for y in x : ... 31 | ==> It loops through the keys in the dictionary 32 | 33 | 9.Which method in a dictionary object gives you a list of the values in the dictionary? 34 | ==> values() 35 | 36 | 10.What is the purpose of the second parameter of the get() method for Python dictionaries? 37 | ==> To provide a default value if the key is not found -------------------------------------------------------------------------------- /2-Python-Data-Structure/Quiz/Week 6 Chapter 10.txt: -------------------------------------------------------------------------------- 1 | 1.What is the difference between a Python tuple and Python list? 2 | ==> Lists are mutable and tuples are not mutable 3 | 4 | 2.Which of the following methods work both in Python lists and Python tuples? 5 | ==> index() 6 | 7 | 3.What will end up in the variable y after this code is executed? 8 | x , y = 3, 4 9 | ==> 4 10 | 11 | 4.In the following Python code, what will end up in the variable y? 12 | x = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}; 13 | y = x.items() 14 | ==> A list of tuples 15 | 16 | 5.Which of the following tuples is greater than x in the following Python sequence? 17 | x = (5, 1, 3); 18 | if ??? > x : 19 | ... 20 | ==> (6, 0, 0) 21 | 22 | 6.What does the following Python code accomplish, assuming the c is a non-empty dictionary? 23 | tmp = list(); 24 | for k, v in c.items(): 25 | tmp.append( (v, k)) 26 | ==> It creates a list of tuples where each tuple is a value, key pair 27 | 28 | 7.If the variable data is a Python list, how do we sort it in reverse order? 29 | ==> data.sort(reverse=True) 30 | 31 | 8.Using the following tuple, how would you print 'Wed'? 
32 | days = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun') 33 | ==> print(days[2]) 34 | 35 | 9.In the following Python loop, why are there two iteration variables (k and v)? 36 | c = {'a':10, 'b':1, 'c':22}; 37 | for k, v in c.items() : 38 | ... 39 | ==> Because the items() method in dictionaries returns a list of tuples 40 | 41 | 10.Given that Python lists and Python tuples are quite similar - when might you prefer to use a tuple over a list? 42 | ==> For a temporary variable that you will use and discard without modifying -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/3-Using-Python-To-Access_Web-Data/.DS_Store -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Assignment/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/3-Using-Python-To-Access_Web-Data/Assignment/.DS_Store -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Assignment/Assignment 2 Extracting Data With Regular Expressions.txt: -------------------------------------------------------------------------------- 1 | #"""Finding Numbers in a Haystack 2 | 3 | In this assignment you will read through and parse a file with text and numbers. You will extract all the numbers in the file and compute the sum of the numbers. 4 | 5 | Data Files 6 | We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment. 
7 | 8 | Sample data: http://py4e-data.dr-chuck.net/regex_sum_42.txt (There are 90 values with a sum=445833) 9 | Actual data: http://py4e-data.dr-chuck.net/regex_sum_97406.txt (There are 67 values and the sum ends with 785) 10 | These links open in a new window. Make sure to save the file into the same folder as you will be writing your Python program. Note: Each student will have a distinct data file for the assignment - so only use your own data file for analysis.""" 11 | 12 | 13 | 14 | #Answer of this Question is 15 | #305785 16 | #copy all content from "http://py4e-data.dr-chuck.net/regex_sum_97406.txt" into a text file named regex_sum_97406.txt, then run the following code 17 | 18 | 19 | 20 | import re 21 | 22 | sum = 0 23 | 24 | file = open('regex_sum_97406.txt', 'r') 25 | for line in file: 26 | numbers = re.findall('[0-9]+', line) 27 | if not numbers: 28 | continue 29 | else: 30 | for number in numbers: 31 | sum += int(number) 32 | 33 | print(sum) 34 | 35 | 36 | -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Assignment/Assignment 3 Understanding the Request : Response Cycle.txt: -------------------------------------------------------------------------------- 1 | #"""Exploring the HyperText Transport Protocol 2 | 3 | You are to retrieve the following document using the HTTP protocol in a way that you can examine the HTTP Response headers. 4 | 5 | http://data.pr4e.org/intro-short.txt 6 | There are three ways that you might retrieve this web page and look at the response headers: 7 | 8 | Preferred: Modify the socket1.py program to retrieve the above URL and print out the headers and data. Make sure to change the code to retrieve the above URL - the values are different for each URL. 9 | Open the URL in a web browser with a developer console or FireBug and manually examine the headers that are returned. 10 | Use the telnet program as shown in lecture to retrieve the headers and content.
11 | Enter the header values in each of the fields below and press "Submit".""" 12 | 13 | 14 | #Server: Apache/2.4.18 (Ubuntu) 15 | #Last-Modified: Sat, 13 May 2017 11:22:22 GMT 16 | #ETag: "1d3-54f6609240717" 17 | #Content-Length: 467 18 | #Cache-Control: max-age=0, no-cache, no-store, must-revalidate 19 | #Content-Type: text/plain 20 | 21 | 22 | import socket 23 | 24 | mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 25 | mysock.connect(('data.pr4e.org', 80)) 26 | # cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode() 27 | 28 | cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode() 29 | 30 | mysock.send(cmd) 31 | 32 | while True: 33 | data = mysock.recv(512) 34 | if (len(data) < 1): 35 | break 36 | print(data.decode(),end='') 37 | 38 | mysock.close() 39 | -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Assignment/Assignment 4.1 Scraping HTML Data with BeautifulSoup.txt: -------------------------------------------------------------------------------- 1 | #"""Scraping Numbers from HTML using BeautifulSoup In this assignment you will write a Python program similar to http://www.py4e.com/code3/urllink2.py. The program will use urllib to read the HTML from the data files below, and parse the data, extracting numbers and compute the sum of the numbers in the file. 2 | 3 | We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment. 4 | 5 | Sample data: http://py4e-data.dr-chuck.net/comments_42.html (Sum=2553) 6 | Actual data: http://py4e-data.dr-chuck.net/comments_97408.html (Sum ends with 93) 7 | You do not need to save these files to your folder since your program will read the data directly from the URL. 
Note: Each student will have a distinct data url for the assignment - so only use your own data url for analysis.""" 8 | 9 | 10 | #Enter the url to scrape - http://py4e-data.dr-chuck.net/comments_97408.html 11 | #Count 50 12 | #Sum 2893 13 | 14 | 15 | import urllib.request as ur 16 | from bs4 import BeautifulSoup 17 | 18 | url = input('Enter the url to scrape - ') 19 | 20 | html = ur.urlopen(url).read() 21 | soup = BeautifulSoup(html, 'html.parser') 22 | 23 | count_of_spans = 0 24 | sum = 0 25 | 26 | spans = soup('span') 27 | for span in spans: 28 | sum += int(span.contents[0]) 29 | count_of_spans += 1 30 | 31 | print('Count ', count_of_spans) 32 | print('Sum ', sum) 33 | -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Assignment/Assignment 4.2 Following Links in HTML Using BeautifulSoup.txt: -------------------------------------------------------------------------------- 1 | #"""Following Links in Python 2 | 3 | In this assignment you will write a Python program that expands on http://www.py4e.com/code3/urllinks.py. The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link and repeat the process a number of times and report the last name you find. 4 | 5 | We provide two files for this assignment. One is a sample file where we give you the name for your testing and the other is the actual data you need to process for the assignment. 6 | 7 | Sample problem: Start at http://py4e-data.dr-chuck.net/known_by_Fikret.html 8 | Find the link at position 3 (the first name is 1). Follow that link. Repeat this process 4 times. The answer is the last name that you retrieve.
9 | Sequence of names: Fikret Montgomery Mhairade Butchi Anayah 10 | Last name in sequence: Anayah 11 | Actual problem: Start at: http://py4e-data.dr-chuck.net/known_by_Annick.html 12 | Find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve. 13 | Hint: The first character of the name of the last page that you will load is: M""" 14 | 15 | 16 | # Enter URL: http://py4e-data.dr-chuck.net/known_by_Annick.html 17 | # Enter count: 7 18 | # Enter position: 18 19 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Annick.html 20 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Nicki.html 21 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Peebles.html 22 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Chantelle.html 23 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Kamila.html 24 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Domenico.html 25 | # Retrieving: http://py4e-data.dr-chuck.net/known_by_Nassir.html 26 | # Last Url: http://py4e-data.dr-chuck.net/known_by_Mhea.html 27 | 28 | ###########Final Answer is: 29 | #Name: Mhea 30 | 31 | import urllib.request as ur 32 | from bs4 import * 33 | 34 | current_repeat_count = 0 35 | url = input('Enter URL: ') 36 | repeat_count = int(input('Enter count: ')) 37 | position = int(input('Enter position: ')) 38 | 39 | 40 | def parse_html(url): 41 | html = ur.urlopen(url).read() 42 | soup = BeautifulSoup(html, 'html.parser') 43 | tags = soup('a') 44 | return tags 45 | 46 | while current_repeat_count < repeat_count: 47 | print('Retrieving: ', url) 48 | tags = parse_html(url) 49 | for index, item in enumerate(tags): 50 | if index == position - 1: 51 | url = item.get('href', None) 52 | name = item.contents[0] 53 | break 54 | else: 55 | continue 56 | current_repeat_count += 1 57 | print('Last Url: ', url) -------------------------------------------------------------------------------- 
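The enumerate loop in the solution above simply selects the list element at position - 1; since soup('a') returns an ordinary list, the same selection logic can be exercised offline. A minimal sketch using only the standard library's html.parser on an inline HTML sample (the LinkCollector class and example.com URLs are hypothetical, invented for illustration, not part of the assignment):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from anchor tags, much like soup('a') does."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.links.append(href)

html = ('<a href="http://example.com/known_by_A.html">A</a>'
        '<a href="http://example.com/known_by_B.html">B</a>'
        '<a href="http://example.com/known_by_C.html">C</a>')

parser = LinkCollector()
parser.feed(html)

position = 2  # 1-based, as in the assignment
print(parser.links[position - 1])  # http://example.com/known_by_B.html
```

In the real assignment the selected href would then be fetched with urllib and the selection repeated count times.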
/3-Using-Python-To-Access_Web-Data/Assignment/Assignment 5 Extracting Data from XML.txt: -------------------------------------------------------------------------------- 1 | #"""Extracting Data from XML 2 | 3 | In this assignment you will write a Python program somewhat similar to http://www.py4e.com/code3/geoxml.py. The program will prompt for a URL, read the XML data from that URL using urllib and then parse and extract the comment counts from the XML data, compute the sum of the numbers in the file. 4 | 5 | We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment. 6 | 7 | Sample data: http://py4e-data.dr-chuck.net/comments_42.xml (Sum=2553) 8 | Actual data: http://py4e-data.dr-chuck.net/comments_97410.xml (Sum ends with 59) 9 | You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data url for the assignment - so only use your own data url for analysis.""" 10 | 11 | 12 | 13 | 14 | #Enter location: http://py4e-data.dr-chuck.net/comments_97410.xml 15 | #Retrieving http://py4e-data.dr-chuck.net/comments_97410.xml 16 | #Retrieved 4220 characters 17 | #Count: 50 18 | #Sum: 2259 19 | 20 | 21 | import urllib.request as ur 22 | import xml.etree.ElementTree as et 23 | 24 | url = input('Enter location: ') 25 | # 'http://python-data.dr-chuck.net/comments_42.xml' 26 | 27 | total_number = 0 28 | sum = 0 29 | 30 | print('Retrieving', url) 31 | xml = ur.urlopen(url).read() 32 | print('Retrieved', len(xml), 'characters') 33 | 34 | tree = et.fromstring(xml) 35 | counts = tree.findall('.//count') 36 | for count in counts: 37 | sum += int(count.text) 38 | total_number += 1 39 | 40 | print('Count:', total_number) 41 | print('Sum:', sum) -------------------------------------------------------------------------------- 
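The findall('.//count') extraction in the XML solution above can be tried offline on an inline sample. A short sketch assuming the feed's commentinfo/comments/comment/count structure, with made-up names and counts:

```python
import xml.etree.ElementTree as et

# Tiny inline sample shaped like the assignment's XML feed (assumed structure).
xml_data = '''<commentinfo>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Laurie</name><count>61</count></comment>
  </comments>
</commentinfo>'''

tree = et.fromstring(xml_data)
# './/count' matches every <count> element at any depth, as in the solution above.
counts = [int(c.text) for c in tree.findall('.//count')]
print('Count:', len(counts))  # Count: 2
print('Sum:', sum(counts))    # Sum: 158
```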
/3-Using-Python-To-Access_Web-Data/Assignment/Assignment 6.1 Extracting Data from JSON.txt: -------------------------------------------------------------------------------- 1 | #"""Extracting Data from JSON 2 | 3 | In this assignment you will write a Python program somewhat similar to http://www.py4e.com/code3/json2.py. The program will prompt for a URL, read the JSON data from that URL using urllib and then parse and extract the comment counts from the JSON data, compute the sum of the numbers in the file and enter the sum below: 4 | We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment. 5 | 6 | Sample data: http://py4e-data.dr-chuck.net/comments_42.json (Sum=2553) 7 | Actual data: http://py4e-data.dr-chuck.net/comments_97411.json (Sum ends with 65) 8 | You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data url for the assignment - so only use your own data url for analysis. 
9 | """ 10 | 11 | 12 | #Enter location: http://py4e-data.dr-chuck.net/comments_97411.json 13 | #Retrieving http://py4e-data.dr-chuck.net/comments_97411.json 14 | #Retrieved 2711 characters 15 | #Count: 50 16 | #Sum: 2365 17 | 18 | 19 | 20 | import urllib.request as ur 21 | import json 22 | 23 | # json_url = 'http://python-data.dr-chuck.net/comments_42.json' 24 | 25 | json_url = input("Enter location: ") 26 | print("Retrieving ", json_url) 27 | data = ur.urlopen(json_url).read().decode('utf-8') 28 | print('Retrieved', len(data), 'characters') 29 | json_obj = json.loads(data) 30 | 31 | sum = 0 32 | total_number = 0 33 | 34 | for comment in json_obj["comments"]: 35 | sum += int(comment["count"]) 36 | total_number += 1 37 | 38 | print('Count:', total_number) 39 | print('Sum:', sum) -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Assignment/Assignment 6.2 Using the GeoJSON API.txt: -------------------------------------------------------------------------------- 1 | #"""Calling a JSON API 2 | 3 | In this assignment you will write a Python program somewhat similar to http://www.py4e.com/code3/geojson.py. The program will prompt for a location, contact a web service and retrieve JSON for the web service and parse that data, and retrieve the first place_id from the JSON. A place ID is a textual identifier that uniquely identifies a place as within Google Maps. 4 | API End Points 5 | 6 | To complete this assignment, you should use this API endpoint that has a static subset of the Google Data: 7 | 8 | http://py4e-data.dr-chuck.net/geojson? 9 | This API uses the same parameter (address) as the Google API. This API also has no rate limit so you can test as often as you like. If you visit the URL with no parameters, you get a list of all of the address values which can be used with this API. 
10 | To call the API, you need to provide the address that you are requesting as the address= parameter that is properly URL encoded using the urllib.parse.urlencode() function as shown in http://www.py4e.com/code3/geojson.py""" 11 | 12 | 13 | 14 | #Enter location: University of Twente 15 | #Retrieving http://python-data.dr-chuck.net/geojson?sensor=false&address=University+of+Twente 16 | #Retrieved 2124 characters 17 | #Place id ChIJPZ9qp0tvv4cRb5oLVI9wra8 18 | 19 | 20 | 21 | import urllib.request as ur 22 | import urllib.parse as up 23 | import json 24 | 25 | serviceurl = "http://python-data.dr-chuck.net/geojson?" 26 | 27 | address_input = input("Enter location: ") 28 | params = {"sensor": "false", "address": address_input} 29 | url = serviceurl + up.urlencode(params) 30 | print("Retrieving ", url) 31 | data = ur.urlopen(url).read().decode('utf-8') 32 | print('Retrieved', len(data), 'characters') 33 | json_obj = json.loads(data) 34 | 35 | place_id = json_obj["results"][0]["place_id"] 36 | print("Place id", place_id) -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Quiz/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/3-Using-Python-To-Access_Web-Data/Quiz/.DS_Store -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Quiz/Week 2 Regular Expressions.txt: -------------------------------------------------------------------------------- 1 | 1. Which of the following regular expressions would extract 'uct.ac.za' from this string using re.findall? 2 | From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 3 | ==> @(\S+) 4 | 5 | 2.Which of the following is the way we match the "start of a line" in a regular expression? 6 | ==> ^ 7 | 8 | 3.What would the following mean in a regular expression?
[a-z0-9] 9 | ==> Match a lowercase letter or a digit 10 | 11 | 4.What is the type of the return value of the re.findall() method? 12 | ==> A list of strings 13 | 14 | 5.What is the "wild card" character in a regular expression (i.e., the character that matches any character)? 15 | ==> . 16 | 17 | 6.What is the difference between the "+" and "*" character in regular expressions? 18 | ==> The "+" matches at least one character and the "*" matches zero or more characters 19 | 20 | 7.What does the "[0-9]+" match in a regular expression? 21 | ==> One or more digits 22 | 23 | 8.What does the following Python sequence print out? 24 | x = 'From: Using the : character' 25 | y = re.findall('^F.+:', x) 26 | print(y) 27 | ==> ['From: Using the :'] 28 | 29 | 9.What character do you add to the "+" or "*" to indicate that the match is to be done in a non-greedy manner? 30 | ==> ? 31 | 32 | 10.Given the following line of text: 33 | From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 34 | What would the regular expression '\S+?@\S+' match? 35 | ==> stephen.marquard@uct.ac.za 36 | 37 | 11.Which of the following best describes "Regular Expressions"? 38 | ==> A small programming language unto itself 39 | 40 | 12.What will the '\$' regular expression match? 41 | ==> A dollar sign (the backslash escapes the $, so it matches a literal dollar character) 42 | -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Quiz/Week 3 Networks and Sockets.txt: -------------------------------------------------------------------------------- 1 | 1.What do we call it when a browser uses the HTTP protocol to load a file or page from a server and display it in the browser? 2 | ==> The Request/Response Cycle 3 | 4 | 2.Which of the following is most similar to a TCP port number? 5 | ==> A telephone extension 6 | 7 | 3.What must you do in Python before opening a socket?
8 | ==> import socket 9 | 10 | 4.Which of the following TCP sockets is most commonly used for the web protocol (HTTP)? 11 | ==> 80 12 | 13 | 5.Which of the following is most like an open socket in an application? 14 | ==> An "in-progress" phone conversation 15 | 16 | 6.What does the "H" of HTTP stand for? 17 | ==> HyperText 18 | 19 | 7.What is an important aspect of an Application Layer protocol like HTTP? 20 | ==> Which application talks first? The client or server? 21 | 22 | 8.What are the three parts of this URL (Uniform Resource Locator)? 23 | http://www.dr-chuck.com/page1.htm 24 | ==> Protocol, host, and document 25 | 26 | 9.When you click on an anchor tag in a web page like below, what HTTP request is sent to the server? 27 |
<a href="http://www.dr-chuck.com/page1.htm"> Please click here. </a>
28 | ==> GET 29 | 30 | 10.Which organization publishes Internet Protocol Standards? 31 | ==> IETF 32 | 33 | 11. In a client-server application on the web using sockets, which must come up first? 34 | ==> server 35 | 36 | 12.What do we call it when a browser uses the HTTP protocol to load a file or page from a server and display it in the browser? 37 | ==>The Request/Response Cycle -------------------------------------------------------------------------------- /3-Using-Python-To-Access_Web-Data/Quiz/Week 4 Reading Web Data From Python.txt: -------------------------------------------------------------------------------- 1 | 1.Which of the following Python data structures is most similar to the value returned in this line of Python: 2 | x = urllib.request.urlopen('http://data.pr4e.org/romeo.txt') 3 | ==>file handle 4 | 5 | 2.In this Python code, which line actually reads the data? 6 | import socket 7 | 8 | mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 9 | mysock.connect(('data.pr4e.org', 80)) 10 | cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode() 11 | mysock.send(cmd) 12 | 13 | while True: 14 | data = mysock.recv(512) 15 | if (len(data) < 1): 16 | break 17 | print(data.decode()) 18 | mysock.close() 19 | 20 | ==>mysock.recv() 21 | 22 | 3.Which of the following regular expressions would extract the URL from this line of HTML: 23 | <p>Please click <a href="http://www.dr-chuck.com/page1.htm">here</a></p>
24 | ==> href="(.+)" 25 | 26 | 4.In this Python code, which line is most like the open() call to read a file: 27 | import socket 28 | 29 | mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 30 | mysock.connect(('data.pr4e.org', 80)) 31 | cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode() 32 | mysock.send(cmd) 33 | 34 | while True: 35 | data = mysock.recv(512) 36 | if (len(data) < 1): 37 | break 38 | print(data.decode()) 39 | mysock.close() 40 | 41 | ==>mysock.connect() 42 | 43 | 5.Which HTTP header tells the browser the kind of document that is being returned? 44 | ==>Content-Type: 45 | 46 | 6.What should you check before scraping a web site? 47 | ==> That the web site allows scraping 48 | 49 | 7.What is the purpose of the BeautifulSoup Python library? 50 | ==>It repairs and parses HTML to make it easier for a program to understand 51 | 52 | 8.What ends up in the "x" variable in the following code: 53 | html = urllib.request.urlopen(url).read() 54 | soup = BeautifulSoup(html, 'html.parser') 55 | x = soup('a') 56 | ==> A list of all the anchor tags (About this Map
47 |48 | This is a cool map from 49 | www.pythonlearn.com. 50 |
51 | 52 | 53 | -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Assignment/Week 5 Assignment Databases and Visualization (peer-graded)/geodata/where.js: -------------------------------------------------------------------------------- 1 | myData = [ 2 | [42.340082,-71.0894884, 'Northeastern, Boston, MA 02115, USA'], 3 | [40.7399972,-74.1775311, 'Bradley Hall, 110 Warren St, Newark, NJ 07102, USA'], 4 | [32.778949,35.019648, 'Technion/ Sports Building, Haifa'], 5 | [42.4036848,-71.120482, 'South Hall Tufts University, 30 Lower Campus Rd, Somerville, MA 02144, USA'], 6 | [-38.1518106,145.1345412, 'Monash University, Frankston VIC 3199, Australia'], 7 | [53.2948229,69.4047872, 'Kokshetau 020000, Kazakhstan'], 8 | [40.7127837,-74.0059413, 'New York, NY, USA'], 9 | [52.2869741,104.3050183, 'Irkutsk, Irkutsk Oblast, Russia'], 10 | [8.481302,4.611479, 'University Rd, Ilorin, Nigeria'], 11 | [-25.7688448,28.199104, 'Unisa Observatory Building, Preller St, Pretoria, 0027, South Africa'], 12 | [47.80949,13.05501, 'Salzburg, Austria'], 13 | [61.4977524,23.7609535, 'Tampere, Finland'], 14 | [27.7518284,-82.6267345, 'St. Petersburg, FL, USA'], 15 | [54.7903112,32.0503663, 'Smolensk, Smolensk Oblast, Russia'], 16 | [24.8614622,67.0099388, 'Karachi, Pakistan'], 17 | [40.506934,-3.3458886, 'Ctra. 
Universidad Complutense, 28805 Alcalá de Henares, Madrid, Spain'], 18 | [51.5266171,-0.1260773, 'University Of London, 1-11 Cartwright Gardens, Kings Cross, London WC1H 9EB, UK'], 19 | [39.5069974,-84.745231, 'Oxford, OH 45056, USA'], 20 | [58.3733281,26.7265098, 'Tartu Ülikooli Füüsikahoone, 50103 Tartu, Estonia'], 21 | [33.6778327,-117.8151285, 'Padua, Irvine, CA 92614, USA'], 22 | [18.5544976,73.8257325, 'Pune University, Ganeshkhind, Pune, Maharashtra, India'], 23 | [37.8805941,-122.2447958, 'Space Sciences Laboratory at University of California, 7 Gauss Way, Berkeley, CA 94720, USA'], 24 | [43.0765915,-89.4052247, 'William H. Sewell Social Sciences Building, 1180 Observatory Dr, Madison, WI 53706, USA'], 25 | [39.9622267,116.3659223, 'Bei Jing Shi Fan Da Xue, BeiTaiPingZhuang, Haidian Qu, Beijing Shi, China, 100875'], 26 | [33.9519347,-83.357567, 'Athens, GA, USA'], 27 | [10.7295115,79.0196067, 'Sastra University Road, Tirumalaisamudram, Tamil Nadu 613401, India'], 28 | [41.9197689,-91.649501, 'Duke St SW, Cedar Rapids, IA 52404, USA'], 29 | [-23.5505199,-46.6333094, 'São Paulo, State of São Paulo, Brazil'], 30 | [30.2850284,-97.7335226, 'University of Texas at Austin, Austin, TX, USA'], 31 | [61.6887271,27.2721457, 'Mikkeli, Finland'], 32 | [32.4204729,-85.0323718, 'H. 
Curtis Pitts Hall, 3413 S Seale Rd, Phenix City, AL 36869, USA'], 33 | [41.557583,-8.397568, 'Universidade do Minho, 4710 Braga, Portugal'], 34 | [51.892316,-8.4951998, 'National Food Biotechnology Centre, Food Science and Technology Building, University College Cork, College Rd, University College, Cork, Ireland'], 35 | [-33.0444219,-71.6066334, 'Pontificia Universidad Catolica De Valparaiso - Gimpert, Valparaíso, Región de Valparaíso, Chile'], 36 | [40.6331249,-89.3985283, 'Illinois, USA'], 37 | [30.0180285,31.5032758, 'AUC Sports Center, Cairo Governorate, Egypt'], 38 | [55.1170375,36.5970818, 'Obninsk, Kaluga Oblast, Russia'], 39 | [31.767879,-106.440736, 'Washington, El Paso, TX 79905, USA'], 40 | [49.9935,36.230383, 'Kharkiv, Kharkiv Oblast, Ukraine'], 41 | [43.8562586,18.4130763, 'Sarajevo, Bosnia and Herzegovina'], 42 | [3.4321247,-76.5461709, 'Parqueadero Universidad Del Valle, Cali, Valle del Cauca, Colombia'], 43 | [40.0082221,-105.2591119, 'Colorado Ave & University Heights, Boulder, CO 80302, USA'], 44 | [53.4129429,59.0016233, 'Magnitogorsk, Chelyabinsk Oblast, Russia'], 45 | [27.5695246,-99.4350626, 'Senator Judith Zaffirini Student Success Center, Laredo, TX 78041, USA'], 46 | [52.124815,-106.589195, 'Simon Fraser Crescent, Saskatoon, SK S7H, Canada'], 47 | [40.807722,-73.96411, '116 St - Columbia University, New York, NY 10027, USA'], 48 | [34.1036186,-117.2914463, 'American Heritage University of Southern California, 255 N D St, San Bernardino, CA 92401, USA'], 49 | [43.1827984,-77.5993071, 'Warsaw St, Rochester, NY 14621, USA'], 50 | [52.2296756,21.0122287, 'Warsaw, Poland'], 51 | [-40.900557,174.885971, 'New Zealand'], 52 | [-40.3850866,175.6140639, 'Massey University, Palmerston North, New Zealand'], 53 | [35.1924456,-97.4432884, 'University of Oklahoma, Norman, OK 73072, USA'], 54 | [45.1847248,9.1582069, '27100 Pavia PV, Italy'], 55 | [38.6598662,-90.3123536, 'Columbia Ave, University City, MO 63130, USA'], 56 | [50.0755381,14.4378005, 
'Prague, Czech Republic'], 57 | [41.8313852,-87.6272216, 'Iit Tower, 10 W 35th St, Chicago, IL 60616, USA'], 58 | [40.7933949,-77.8600012, 'State College, PA, USA'], 59 | [40.7609264,-111.8270486, 'University, Salt Lake City, UT, USA'], 60 | [39.4813156,-0.3505, 'Universitat Politècnica, 46022 Valencia, Spain'], 61 | [33.6140008,-117.8440006, 'Vienna, Newport Beach, CA 92660, USA'], 62 | [44.4267674,26.1025384, 'Bucharest, Romania'], 63 | [33.7063317,-117.7733121, 'New Haven, Irvine, CA 92620, USA'], 64 | [47.761605,-122.19303, 'UW Bothell & Cascadia College, Bothell, WA 98011, USA'], 65 | [38.6679152,-90.3322259, 'Drexel Dr, University City, MO 63130, USA'], 66 | [42.320138,-83.230993, 'University of Michigan, Dearborn, MI 48128, USA'], 67 | [40.4432289,-79.9441368, 'Carnegie Mellon University, Pausch Bridge, Pittsburgh, PA 15213, USA'], 68 | [55.8304307,49.0660806, 'Kazan, Tatarstan, Russia'], 69 | [12.0263438,79.8492812, 'Pondicherry University, Kalapet, Puducherry 605014, India'], 70 | [30.7897514,120.7760636, 'Jia Xing Nan Yang Zhi Ye Ji Shu Xue Yuan, Xiuzhou Qu, Jiaxing Shi, Zhejiang Sheng, China, 314000'], 71 | [35.712815,135.9711705, 'Nyu, Mihama, Mikata District, Fukui Prefecture 919-1201, Japan'], 72 | [-23.5431786,-46.6291845, 'State of São Paulo, Brazil'], 73 | [47.5584793,21.620443, 'Debrecen, Debrecen University-Botanical Garden, 4032 Hungary'], 74 | [34.0705324,-117.2957813, 'San Bernardino Fwy, San Bernardino, CA 92408, USA'], 75 | [50.4501,30.5234, 'Kiev, Ukraine, 02000'], 76 | [46.4618977,-80.9664534, 'University Laurentian, Copper Cliff, ON P0M 1N0, Canada'], 77 | [55.755826,37.6173, 'Moscow, Russia'], 78 | [52.2016671,0.1177882, 'University Of Cambridge, Cambridge CB2, UK'], 79 | [35.246756,33.0307541, 'ODTÜ Misafirhane, Kalkanlı'], 80 | [46.5189865,6.5676007, 'EPFL, 1015 Lausanne, Switzerland'], 81 | [45.2671352,19.8335496, 'Novi Sad, Serbia'], 82 | [57.6954209,11.9853213, 'Göteborgs universitetsbibliotek, Renströmsgatan 4, 412 55 Göteborg, 
Sweden'], 83 | [22.4828735,88.394867, 'Jadavpur University Lake, Sahid Smirity Colony, Pancha Sayar, Kolkata, West Bengal 700094'], 84 | [26.1529683,91.6639235, 'Gauhati University, Jalukbari, Guwahati, Assam, India'], 85 | [-34.5101473,-58.6864035, 'Universidad de Buenos Aires, Villa de Mayo, Buenos Aires, Argentina'], 86 | [44.4046049,8.9311653, 'Centro servizi bibliotecari di architettura Nino Carboneri dellUniversità degli studi di Genova, Stradone di SantAgostino, 37, 16123 Genova, Italy'], 87 | [4.8602595,-74.0333032, 'Universidad De La Sabana, Chía, Cundinamarca, Colombia'], 88 | [43.4553461,-76.5104973, 'Oswego, NY, USA'], 89 | [16.9785466,82.2406733, 'Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh 533003, India'], 90 | [50.503887,4.469936, 'Belgium'], 91 | [51.4925846,-0.1852592, 'Boston University, 43 Harrington Gardens, Kensington, London SW7 4JU, UK'], 92 | [64.9078809,-147.7117155, 'Manchester Loop, Fairbanks, AK 99712, USA'], 93 | [51.1877226,6.7938734, 'Fachhochschule Düsseldorf, 40225 Düsseldorf, Germany'], 94 | [39.18625,-86.5345967, 'Indiana 45 46 Bypass & N College Ave, Bloomington, IN 47408, USA'], 95 | [18.9331831,72.8341894, 'KP Shethi Building, Janmabhoomi Marg, Kala Ghoda, Fort, Mumbai, Maharashtra 400001, India'], 96 | [45.4248599,-75.6828, 'University of Ottawa Press, 542 King Edward Ave, Ottawa, ON K1N 6N5, Canada'], 97 | [28.3580163,75.5887989, 'BITS, Pilani, Rajasthan 333031, India'], 98 | [38.0517783,-84.4923513, 'Lucille C. Little Theater, Lexington, KY 40508, USA'], 99 | [25.25968,82.989115, 'IIT Gymkhana, RR 11, Banaras Hindu University Campus, Varanasi, Uttar Pradesh 221001, India'], 100 | [50.862282,-2.4998561, 'E M Mitchell & Sons, Hermitage, Dorchester DT2 7BB, UK'], 101 | [10.1464162,-64.6955802, 'Universidad Central de Venezuela EUS Educación Barcelona, Av Centurión, Barcelona, Anzoátegui, Venezuela'], 102 | [-9.9541653,-67.8384015, 'Tv. 
Paraíba - Geraldo Fleming, Rio Branco - AC, Brazil'], 103 | [47.497912,19.040235, 'Budapest, Hungary'], 104 | [55.755826,37.6173, 'Moscow, Russia'], 105 | [27.7518284,-82.6267345, 'St. Petersburg, FL, USA'], 106 | [41.7508391,-88.1535352, 'Naperville, IL, USA'], 107 | [37.424106,-122.1660756, 'Stanford, CA, USA'], 108 | [29.1891714,-81.0469168, 'Lehman Engineering & Technology Center, 600 S Clyde Morris Blvd, Daytona Beach, FL 32114, USA'], 109 | [-35.417,149.1, 'Monash ACT 2904, Australia'], 110 | [19.3188895,-99.1843676, 'National Autonomous University of Mexico, Mexico City, Mexico'], 111 | [35.7058075,51.4020909, 'Tehran University, Tehran, Iran'], 112 | [36.8838957,-76.3040214, 'Old Dominion University, 5115 Hampton Blvd, Norfolk, VA 23508, USA'], 113 | [50.4501,30.5234, 'Kiev, Ukraine, 02000'], 114 | [40.0997009,-88.2209362, 'Babcock Hall, 906 W College Ct, Urbana, IL 61801, USA'], 115 | [40.0024922,-83.0524629, 'Essex Rd, Columbus, OH 43221, USA'], 116 | [49.9935,36.230383, 'Kharkiv, Kharkiv Oblast, Ukraine'], 117 | [27.6027172,-99.4687146, 'Buenos Aires Dr, Laredo, TX 78045, USA'], 118 | [42.5030209,-89.0295642, 'College St, Beloit, WI 53511, USA'], 119 | [40.5382913,-78.3528584, 'Ucla Ln, Altoona, PA 16602, USA'], 120 | [41.7857416,-87.5903039, 'The University of Chicago Press, 1427 E 60th St, Chicago, IL 60637, USA'], 121 | [30.5848529,31.4843221, 'Rd inside Zagazig University, Shaibet an Nakareyah, Markaz El-Zakazik, Ash Sharqia Governorate, Egypt'], 122 | [53.4943212,-113.5490268, 'University of Alberta Farm, Edmonton, AB T6H, Canada'], 123 | [28.0735403,-82.4373589, 'University, FL, USA'], 124 | [8.5053554,76.9484624, 'University of Kerala Senate House Campus, Palayam, Thiruvananthapuram, Kerala, India'], 125 | [45.4723514,9.1964401, 'Via del Vecchio Politecnico, 20121 Milano, Italy'], 126 | [54.6871555,25.2796514, 'Vilnius, Lithuania'], 127 | [20.593684,78.96288, 'India'], 128 | [-33.8812733,18.6264694, 'Stellenbosch University, Cape Town, 7530, South 
Africa'], 129 | [28.6777345,77.4504666, 'IMT Rd, Block 14, Sector 10, Raj Nagar, Ghaziabad, Uttar Pradesh 201002, India'], 130 | [41.2033216,-77.1945247, 'Pennsylvania, USA'], 131 | [31.3260152,75.5761829, 'Jalandhar, Punjab 144001, India'], 132 | [36.8743583,-76.1745441, 'Virginia Tech Trail, Virginia Beach, VA 23455, USA'], 133 | [33.4205343,-111.9339825, 'Old Main at Arizona State University, 400 E Tyler Mall, Tempe, AZ 85281, USA'], 134 | [22.2567635,-97.8345654, 'Guatemala, Cd Madero, Tamps., Mexico'], 135 | [54.6871555,25.2796514, 'Vilnius, Lithuania'], 136 | [1.2246216,19.7878159, 'Basankusu Airport (BSU), N22, Basankusu, Democratic Republic of the Congo'], 137 | [51.165691,10.451526, 'Germany'], 138 | [27.7518284,-82.6267345, 'St. Petersburg, FL, USA'], 139 | [33.952602,-84.5499327, 'Marietta, GA, USA'], 140 | [42.9097484,-85.7630885, 'Grandville, MI, USA'], 141 | [34.3020001,48.8145943, 'Malayer, Hamadan, Iran'], 142 | [39.4813156,-0.3505, 'Universitat Politècnica, 46022 Valencia, Spain'] 143 | ]; 144 | -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Quiz/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/4-Using-Database-With_Python/Quiz/.DS_Store -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Quiz/Week 1.1 Using Encoded Data in Python 3.txt: -------------------------------------------------------------------------------- 1 | 1.What is the most common Unicode encoding when moving data between systems? 2 | ==> UTF-8 3 | 4 | 2.What is the decimal (Base-10) numeric value for the upper case letter "G" in the ASCII character set? 
5 | ==> 71 6 | 7 | 3.What word does the following sequence of numbers represent in ASCII: 8 | 108, 105, 115, 116 9 | ==> list 10 | 11 | 4.How are strings stored internally in Python 3? 12 | ==> Unicode 13 | 14 | 5.When reading data across the network (i.e. from a URL) in Python 3, what method must be used to convert it to the internal format used by strings? 15 | ==> decode() -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Quiz/Week 1.2 Object Oriented Programming-72.txt: -------------------------------------------------------------------------------- 1 | 1.Which came first, the instance or the class? 2 | ==> class 3 | 4 | 2.In Object Oriented Programming, what is another name for the "attributes" of an object? 5 | ==>fields 6 | 7 | 3.At the moment of creation of a new object, Python looks at the _________ definition to define the structure and capabilities of the newly created object. 8 | ==>class 9 | 10 | 4.Which of the following is NOT a good synonym for "class" in Python? 11 | ==>direction 12 | 13 | 5.What does this Python statement do if PartyAnimal is a class? 14 | zap = PartyAnimal() 15 | ==>Use the PartyAnimal template to make a new object and assign it to zap 16 | 17 | 6.What is the syntax to look up the fullname attribute in an object stored in the variable colleen? 18 | ==> colleen.fullname 19 | 20 | 7.Which of these statements is used to indicate that class A will inherit all the features of class B? 21 | ==>class A(B) : 22 | 23 | 8.What keyword is used to indicate the start of a method in a Python class? 24 | ==>def 25 | 26 | 9.What is "self" typically used for in a Python method within a class? 27 | ==>To refer to the instance in which the method is being called 28 | 29 | 10.What does the Python dir() function show when we pass an object into it as a parameter? 
30 | ==> It shows the methods and attributes of the object 31 | 32 | 11.Which of the following is rarely used in Object Oriented Programming? 33 | ==>Destructor -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Quiz/Week 2 Single-Table SQL.txt: -------------------------------------------------------------------------------- 1 | 1.Structured Query Language (SQL) is used to (check all that apply) 2 | ==> 3 | - Create a table 4 | - Delete data 5 | - Insert data 6 | 7 | 2.Which of these is the right syntax to make a new table? 8 | ==>CREATE TABLE people; 9 | 10 | 3.Which SQL command is used to insert a new row into a table? 11 | ==> INSERT INTO 12 | 13 | 4.Which command is used to retrieve all records from a table? 14 | ==>SELECT * FROM Users 15 | 16 | 5.Which keyword will cause the results of the query to be displayed in sorted order? 17 | ==>ORDER BY 18 | 19 | 6.In database terminology, another word for table is 20 | ==>relation 21 | 22 | 7.In a typical online production environment, who has direct access to the production database? 23 | ==>Database Administrator 24 | 25 | 8.Which of the following is the database software used in this class? 26 | ==>SQLite 27 | 28 | 9.What happens if a DELETE command is run on a table without a WHERE clause? 29 | ==>All the rows in the table are deleted 30 | 31 | 10.Which of the following commands would update a column named "name" in a table named "Users"? 32 | ==>UPDATE Users SET name='new name' WHERE ... 33 | 34 | 11.What does this SQL command do? 
35 | SELECT COUNT(*) FROM Users 36 | Hint: This is not from the lecture 37 | ==>It counts the rows in the table Users -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Quiz/Week 3 Multi-Table Relational SQL.txt: -------------------------------------------------------------------------------- 1 | 1.What is the primary added value of relational databases over flat files? 2 | ==>Ability to scan large amounts of data quickly 3 | 4 | 2.What is the purpose of a primary key? 5 | ==>To look up a particular row in a table very quickly 6 | 7 | 3.Which of the following is NOT a good rule to follow when developing a database model? 8 | ==>Use a person's email address as their primary key 9 | 10 | 4.If our user interface (i.e., like iTunes) has repeated strings on one column of the user interface, how should we model this properly in a database? 11 | ==>Make a table that maps the strings in the column to numbers and then use those numbers in the column 12 | 13 | 5.Which of the following is the label we give a column that the "outside world" uses to look up a particular row? 14 | ==>Logical key 15 | 16 | 6.What is the label we give to a column that is an integer and used to point to a row in a different table? 17 | ==>Foreign key 18 | 19 | 7.What SQLite keyword is added to primary keys in a CREATE TABLE statement to indicate that the database is to provide a value for the column when records are inserted? 20 | ==>AUTOINCREMENT 21 | 22 | 8.What is the SQL keyword that reconnects rows that have foreign keys with the corresponding data in the table that the foreign key points to? 23 | ==>JOIN 24 | 25 | 9.What happens when you JOIN two tables together without an ON clause? 
26 | ==>The number of rows you get is the number of rows in the first table times the number of rows in the second table 27 | 28 | 10.When you are doing a SELECT with a JOIN across multiple tables with identical column names, how do you distinguish the column names? 29 | ==>tablename.columnname -------------------------------------------------------------------------------- /4-Using-Database-With_Python/Quiz/Week 4 Many-to-Many Relationships and Python.txt: -------------------------------------------------------------------------------- 1 | 1.How do we model a many-to-many relationship between two database tables? 2 | ==>We add a table with two foreign keys 3 | 4 | 2.In Python, what is a database "cursor" most like? 5 | ==>A file handle 6 | 7 | 3.What method do you call in an SQLite cursor object in Python to run an SQL command? 8 | ==>execute() 9 | 10 | 4.In the following SQL, 11 | cur.execute('SELECT count FROM Counts WHERE org = ? ', (org, )) 12 | what is the purpose of the "?"? 13 | ==>It is a placeholder for the contents of the "org" variable 14 | 15 | 5.In the following Python code sequence (assuming cur is a SQLite cursor object), 16 | cur.execute('SELECT count FROM Counts WHERE org = ? ', (org, )) 17 | row = cur.fetchone() 18 | what is the value in row if no rows match the WHERE clause? 19 | ==>None 20 | 21 | 6.What does the LIMIT clause in the following SQL accomplish? 22 | SELECT org, count FROM Counts 23 | ORDER BY count DESC LIMIT 10 24 | ==>It only retrieves the first 10 rows from the table 25 | 26 | 7.What does the executescript() method in the Python SQLite cursor object do that the normal execute() method does not do? 27 | ==>It allows multiple SQL statements separated by semicolons 28 | 29 | 8.What is the purpose of "OR IGNORE" in the following SQL: 30 | INSERT OR IGNORE INTO Course (title) VALUES ( ?
) 31 | ==>It makes sure that if a particular title is already in the table, there are no duplicate rows inserted 32 | 33 | 9.For the following Python code to work, what must be added to the title column in the CREATE TABLE statement for the Course table: 34 | cur.execute('''INSERT OR IGNORE INTO Course (title) 35 | VALUES ( ? )''', ( title, ) ) 36 | cur.execute('SELECT id FROM Course WHERE title = ? ', 37 | (title, )) 38 | course_id = cur.fetchone()[0] 39 | ==>A UNIQUE constraint -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/.DS_Store -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/.DS_Store -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/.DS_Store -------------------------------------------------------------------------------- 
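The Week 4 answers above (a junction table with two foreign keys, `?` placeholders, OR IGNORE against a UNIQUE column, and executescript() for multi-statement scripts) can be exercised end to end in a few lines of Python 3. The Student/Course/Member names below are illustrative, not the graded assignment's actual schema:

```python
import sqlite3

# A junction table holding two foreign keys models the many-to-many
# relationship between students and courses.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# executescript() allows multiple semicolon-separated SQL statements
cur.executescript('''
CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Course  (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE Member  (student_id INTEGER, course_id INTEGER,
                      PRIMARY KEY (student_id, course_id));
''')

enrollments = [('Chuck', 'si106'), ('Colleen', 'si106'), ('Chuck', 'si110')]
for name, title in enrollments:
    # OR IGNORE plus the UNIQUE constraint keeps duplicate rows out
    cur.execute('INSERT OR IGNORE INTO Student (name) VALUES ( ? )', (name,))
    cur.execute('SELECT id FROM Student WHERE name = ? ', (name,))
    student_id = cur.fetchone()[0]
    cur.execute('INSERT OR IGNORE INTO Course (title) VALUES ( ? )', (title,))
    cur.execute('SELECT id FROM Course WHERE title = ? ', (title,))
    course_id = cur.fetchone()[0]
    cur.execute('INSERT OR IGNORE INTO Member (student_id, course_id) VALUES ( ?, ? )',
                (student_id, course_id))
conn.commit()

student_count = cur.execute('SELECT COUNT(*) FROM Student').fetchone()[0]
member_count = cur.execute('SELECT COUNT(*) FROM Member').fetchone()[0]
print(student_count, member_count)   # 2 distinct students, 3 enrollments
```

Running the loop a second time would change nothing: every INSERT is OR IGNORE against a UNIQUE column or the composite primary key, so no duplicate rows can appear.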
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/.idea/inspectionProfiles/Project_Default.xml: -------------------------------------------------------------------------------- 1 |If you don't see a chart above, check the JavaScript console. You may 16 | need to use a different browser.
17 | 18 | 19 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/force.js: -------------------------------------------------------------------------------- 1 | var width = 600, 2 | height = 600; 3 | 4 | var color = d3.scale.category20(); 5 | 6 | var dist = (width + height) / 4; 7 | 8 | var force = d3.layout.force() 9 | .charge(-120) 10 | .linkDistance(dist) 11 | .size([width, height]); 12 | 13 | function getrank(rval) { 14 | return (rval/2.0) + 3; 15 | } 16 | 17 | function getcolor(rval) { 18 | return color(rval); 19 | } 20 | 21 | var svg = d3.select("#chart").append("svg") 22 | .attr("width", width) 23 | .attr("height", height); 24 | 25 | function loadData(json) { 26 | force 27 | .nodes(json.nodes) 28 | .links(json.links); 29 | 30 | var k = Math.sqrt(json.nodes.length / (width * height)); 31 | 32 | force 33 | .charge(-10 / k) 34 | .gravity(100 * k) 35 | .start(); 36 | 37 | var link = svg.selectAll("line.link") 38 | .data(json.links) 39 | .enter().append("line") 40 | .attr("class", "link") 41 | .style("stroke-width", function(d) { return Math.sqrt(d.value); }); 42 | 43 | var node = svg.selectAll("circle.node") 44 | .data(json.nodes) 45 | .enter().append("circle") 46 | .attr("class", "node") 47 | .attr("r", function(d) { return getrank(d.rank); } ) 48 | .style("fill", function(d) { return getcolor(d.rank); }) 49 | .on("dblclick",function(d) { 50 | if ( confirm('Do you want to open '+d.url) ) 51 | window.open(d.url,'_new',''); 52 | d3.event.stopPropagation(); 53 | }) 54 | .call(force.drag); 55 | 56 | node.append("title") 57 | .text(function(d) { return d.url; }); 58 | 59 | force.on("tick", function() { 60 | link.attr("x1", function(d) { return d.source.x; }) 61 | .attr("y1", function(d) { return d.source.y; }) 62 | .attr("x2", function(d) { return d.target.x; }) 63 | .attr("y2", function(d) { return d.target.y; }); 64 | 65 | 
node.attr("cx", function(d) { return d.x; }) 66 | .attr("cy", function(d) { return d.y; }); 67 | }); 68 | 69 | } 70 | loadData(spiderJson); 71 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/.DS_Store -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/force.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/force.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/force_oth.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/force_oth.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/gmainC.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/gmainC.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/spdump.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/outputImages/spdump.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/pageRank.txt: -------------------------------------------------------------------------------- 1 | Simple Python Search Spider, Page Ranker, and Visualizer 2 | 3 | This is a set of programs that emulate some of the functions of a 4 | search engine. They store their data in a SQLITE3 database named 5 | 'spider.sqlite'. This file can be removed at any time to restart the 6 | process. 7 | 8 | You should install the SQLite browser to view and modify 9 | the databases from: 10 | 11 | http://sqlitebrowser.org/ 12 | 13 | This program crawls a web site and pulls a series of pages into the 14 | database, recording the links between pages. 
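The crawl data lives in two tables: Pages, one row per fetched URL, and Links, one row per from/to edge (spider.py later in this repository creates the full schema). A minimal Python 3 sketch that creates the same core tables and records a single link, using an in-memory database as a stand-in for spider.sqlite:

```python
import sqlite3

# Core schema used by spider.py: Pages keyed by id with a UNIQUE url,
# Links recording (from_id, to_id) edges between pages.
# ':memory:' is a stand-in here; the real tools open 'spider.sqlite'.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute('''CREATE TABLE IF NOT EXISTS Pages
    (id INTEGER PRIMARY KEY, url TEXT UNIQUE, html TEXT,
     error INTEGER, old_rank REAL, new_rank REAL)''')
cur.execute('CREATE TABLE IF NOT EXISTS Links (from_id INTEGER, to_id INTEGER)')

# Record that the start page links to the blog page; new pages enter
# with a rank of 1.0, just as the crawler inserts them.
for url in ('http://www.dr-chuck.com', 'http://www.dr-chuck.com/csev-blog'):
    cur.execute('INSERT OR IGNORE INTO Pages (url, html, new_rank) VALUES ( ?, NULL, 1.0 )',
                (url,))
from_id = cur.execute('SELECT id FROM Pages WHERE url = ?',
                      ('http://www.dr-chuck.com',)).fetchone()[0]
to_id = cur.execute('SELECT id FROM Pages WHERE url = ?',
                    ('http://www.dr-chuck.com/csev-blog',)).fetchone()[0]
cur.execute('INSERT OR IGNORE INTO Links (from_id, to_id) VALUES ( ?, ? )',
            (from_id, to_id))
conn.commit()
print(from_id, to_id)   # ids assigned in insertion order: 1 2
```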
15 | 16 | Mac: rm spider.sqlite 17 | Mac: python spider.py 18 | 19 | Win: del spider.sqlite 20 | Win: spider.py 21 | 22 | Enter web url or enter: http://www.dr-chuck.com/ 23 | ['http://www.dr-chuck.com'] 24 | How many pages:2 25 | 1 http://www.dr-chuck.com/ 12 26 | 2 http://www.dr-chuck.com/csev-blog/ 57 27 | How many pages: 28 | 29 | In this sample run, we told it to crawl a website and retrieve two 30 | pages. If you restart the program again and tell it to crawl more 31 | pages, it will not re-crawl any pages already in the database. Upon 32 | restart it goes to a random non-crawled page and starts there. So 33 | each successive run of spider.py is additive. 34 | 35 | Mac: python spider.py 36 | Win: spider.py 37 | 38 | Enter web url or enter: http://www.dr-chuck.com/ 39 | ['http://www.dr-chuck.com'] 40 | How many pages:3 41 | 3 http://www.dr-chuck.com/csev-blog 57 42 | 4 http://www.dr-chuck.com/dr-chuck/resume/speaking.htm 1 43 | 5 http://www.dr-chuck.com/dr-chuck/resume/index.htm 13 44 | How many pages: 45 | 46 | You can have multiple starting points in the same database - 47 | within the program these are called "webs". The spider 48 | chooses randomly amongst all non-visited links across all 49 | the webs. 50 | 51 | If your code fails complaining about certificate problems, 52 | there is some code (SSL) that can be un-commented to work 53 | around certificate problems. 54 | 55 | If you want to dump the contents of the spider.sqlite file, you can 56 | run spdump.py as follows: 57 | 58 | Mac: python spdump.py 59 | Win: spdump.py 60 | 61 | (5, None, 1.0, 3, u'http://www.dr-chuck.com/csev-blog') 62 | (3, None, 1.0, 4, u'http://www.dr-chuck.com/dr-chuck/resume/speaking.htm') 63 | (1, None, 1.0, 2, u'http://www.dr-chuck.com/csev-blog/') 64 | (1, None, 1.0, 5, u'http://www.dr-chuck.com/dr-chuck/resume/index.htm') 65 | 4 rows. 66 | 67 | This shows the number of incoming links, the old page rank, the new page 68 | rank, the id of the page, and the url of the page.
The spdump.py program 69 | only shows pages that have at least one incoming link to them. 70 | 71 | Once you have a few pages in the database, you can run Page Rank on the 72 | pages using the sprank.py program. You simply tell it how many Page 73 | Rank iterations to run. 74 | 75 | Mac: python sprank.py 76 | Win: sprank.py 77 | 78 | How many iterations:2 79 | 1 0.546848992536 80 | 2 0.226714939664 81 | [(1, 0.559), (2, 0.659), (3, 0.985), (4, 2.135), (5, 0.659)] 82 | 83 | You can dump the database again to see that page rank has been updated: 84 | 85 | Mac: python spdump.py 86 | Win: spdump.py 87 | 88 | (5, 1.0, 0.985, 3, u'http://www.dr-chuck.com/csev-blog') 89 | (3, 1.0, 2.135, 4, u'http://www.dr-chuck.com/dr-chuck/resume/speaking.htm') 90 | (1, 1.0, 0.659, 2, u'http://www.dr-chuck.com/csev-blog/') 91 | (1, 1.0, 0.659, 5, u'http://www.dr-chuck.com/dr-chuck/resume/index.htm') 92 | 4 rows. 93 | 94 | You can run sprank.py as many times as you like and it will simply refine 95 | the page rank the more times you run it. You can even run sprank.py a few times 96 | and then go spider a few more pages with spider.py and then run sprank.py 97 | to converge the page ranks. 98 | 99 | If you want to restart the Page Rank calculations without re-spidering the 100 | web pages, you can use spreset.py 101 | 102 | Mac: python spreset.py 103 | Win: spreset.py 104 | 105 | All pages set to a rank of 1.0 106 | 107 | Mac: python sprank.py 108 | Win: sprank.py 109 | 110 | How many iterations:50 111 | 1 0.546848992536 112 | 2 0.226714939664 113 | 3 0.0659516187242 114 | 4 0.0244199333 115 | 5 0.0102096489546 116 | 6 0.00610244329379 117 | ...
118 | 42 0.000109076928206 119 | 43 9.91987599002e-05 120 | 44 9.02151706798e-05 121 | 45 8.20451504471e-05 122 | 46 7.46150183837e-05 123 | 47 6.7857770908e-05 124 | 48 6.17124694224e-05 125 | 49 5.61236959327e-05 126 | 50 5.10410499467e-05 127 | [(512, 0.02963718031139026), (1, 12.790786721866658), (2, 28.939418898678284), (3, 6.808468390725946), (4, 13.469889092397006)] 128 | 129 | For each iteration of the page rank algorithm it prints the average 130 | change per page of the page rank. The network initially is quite 131 | unbalanced and so the individual page ranks are changing wildly. 132 | But in a few short iterations, the page rank converges. You 133 | should run sprank.py long enough that the page ranks converge. 134 | 135 | If you want to visualize the current top pages in terms of page rank, 136 | run spjson.py to write the pages out in JSON format to be viewed in a 137 | web browser. 138 | 139 | Mac: python spjson.py 140 | Win: spjson.py 141 | 142 | Creating JSON output on spider.js... 143 | How many nodes? 30 144 | Open force.html in a browser to view the visualization 145 | 146 | You can view this data by opening the file force.html in your web browser. 147 | This shows an automatic layout of the nodes and links. You can click and 148 | drag any node and you can also double click on a node to find the URL 149 | that is represented by the node. 150 | 151 | This visualization is provided using the force layout from: 152 | 153 | http://mbostock.github.com/d3/ 154 | 155 | If you rerun the other utilities and then re-run spjson.py - you merely 156 | have to press refresh in the browser to get the new data from spider.js.
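sprank.py itself is not included in this dump, but the "average change per page" numbers in the transcripts above are what the standard PageRank update produces. A self-contained Python 3 sketch of that iteration; the three-node graph and the 0.85 damping factor are illustrative assumptions, not values taken from sprank.py:

```python
# Simplified PageRank pass over an in-memory link graph.
# The node ids and damping factor below are illustrative assumptions.
links = {1: [2, 3], 2: [3], 3: [1]}      # from_id -> list of to_ids
ranks = {node: 1.0 for node in links}    # every page starts at rank 1.0

def pagerank_step(links, ranks, d=0.85):
    """One iteration; returns (new ranks, average change per page)."""
    # Each page keeps a (1 - d) baseline and receives d * rank/outdegree
    # from every page that links to it.
    new_ranks = {node: (1.0 - d) for node in ranks}
    for from_id, to_ids in links.items():
        share = d * ranks[from_id] / len(to_ids)
        for to_id in to_ids:
            new_ranks[to_id] += share
    avg_change = sum(abs(new_ranks[n] - ranks[n]) for n in ranks) / len(ranks)
    return new_ranks, avg_change

for _ in range(50):
    ranks, avg_change = pagerank_step(links, ranks)
print(round(sum(ranks.values()), 2))   # total rank is conserved: 3.0
```

As the README says, the changes are large on the first iterations and shrink steadily as the ranks converge.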
-------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spdump.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | 3 | conn = sqlite3.connect('spider.sqlite') 4 | cur = conn.cursor() 5 | 6 | cur.execute('''SELECT COUNT(from_id) AS inbound, old_rank, new_rank, id, url 7 | FROM Pages JOIN Links ON Pages.id = Links.to_id 8 | WHERE html IS NOT NULL 9 | GROUP BY id ORDER BY inbound DESC''') 10 | 11 | count = 0 12 | for row in cur : 13 | if count < 50 : print row 14 | count = count + 1 15 | print count, 'rows.' 16 | cur.close() 17 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spider.js: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spider.js -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spider.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | import urllib 3 | import ssl 4 | from urlparse import urljoin 5 | from urlparse import urlparse 6 | from BeautifulSoup import * 7 | 8 | # Deal with SSL certificate anomalies Python > 2.7 9 | # scontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1) 10 | scontext = None 11 | 12 | conn = sqlite3.connect('spider.sqlite') 13 | cur = conn.cursor() 14 | 15 | cur.execute('''CREATE TABLE IF NOT EXISTS Pages 16 | (id INTEGER PRIMARY KEY, url TEXT UNIQUE, html TEXT, 17 | error INTEGER, old_rank REAL, 
new_rank REAL)''') 18 | 19 | cur.execute('''CREATE TABLE IF NOT EXISTS Links 20 | (from_id INTEGER, to_id INTEGER)''') 21 | 22 | cur.execute('''CREATE TABLE IF NOT EXISTS Webs (url TEXT UNIQUE)''') 23 | 24 | # Check to see if we are already in progress... 25 | cur.execute('SELECT id,url FROM Pages WHERE html is NULL and error is NULL ORDER BY RANDOM() LIMIT 1') 26 | row = cur.fetchone() 27 | if row is not None: 28 | print "Restarting existing crawl. Remove spider.sqlite to start a fresh crawl." 29 | else : 30 | starturl = raw_input('Enter web url or enter: ') 31 | if ( len(starturl) < 1 ) : starturl = 'http://python-data.dr-chuck.net/' 32 | if ( starturl.endswith('/') ) : starturl = starturl[:-1] 33 | web = starturl 34 | if ( starturl.endswith('.htm') or starturl.endswith('.html') ) : 35 | pos = starturl.rfind('/') 36 | web = starturl[:pos] 37 | 38 | if ( len(web) > 1 ) : 39 | cur.execute('INSERT OR IGNORE INTO Webs (url) VALUES ( ? )', ( web, ) ) 40 | cur.execute('INSERT OR IGNORE INTO Pages (url, html, new_rank) VALUES ( ?, NULL, 1.0 )', ( starturl, ) ) 41 | conn.commit() 42 | # http://www.dr-chuck.com/ 43 | # Get the current webs 44 | cur.execute('''SELECT url FROM Webs''') 45 | webs = list() 46 | for row in cur: 47 | webs.append(str(row[0])) 48 | 49 | print webs 50 | 51 | many = 0 52 | while True: 53 | if ( many < 1 ) : 54 | sval = raw_input('How many pages:') 55 | if ( len(sval) < 1 ) : break 56 | many = int(sval) 57 | many = many - 1 58 | 59 | cur.execute('SELECT id,url FROM Pages WHERE html is NULL and error is NULL ORDER BY RANDOM() LIMIT 1') 60 | try: 61 | row = cur.fetchone() 62 | # print row 63 | fromid = row[0] 64 | url = row[1] 65 | except: 66 | print 'No unretrieved HTML pages found' 67 | many = 0 68 | break 69 | 70 | print fromid, url, 71 | 72 | # If we are retrieving this page, there should be no links from it 73 | cur.execute('DELETE from Links WHERE from_id=?', (fromid, ) ) 74 | try: 75 | # Deal with SSL certificate anomalies Python > 2.7 76 | # 
scontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1) 77 | # document = urllib.urlopen(url, context=scontext) 78 | 79 | # Normal Unless you encounter certificate problems 80 | document = urllib.urlopen(url) 81 | 82 | html = document.read() 83 | if document.getcode() != 200 : 84 | print "Error on page: ",document.getcode() 85 | cur.execute('UPDATE Pages SET error=? WHERE url=?', (document.getcode(), url) ) 86 | 87 | if 'text/html' != document.info().gettype() : 88 | print "Ignore non text/html page" 89 | cur.execute('UPDATE Pages SET error=-1 WHERE url=?', (url, ) ) 90 | conn.commit() 91 | continue 92 | 93 | print '('+str(len(html))+')', 94 | 95 | soup = BeautifulSoup(html) 96 | except KeyboardInterrupt: 97 | print '' 98 | print 'Program interrupted by user...' 99 | break 100 | except: 101 | print "Unable to retrieve or parse page" 102 | cur.execute('UPDATE Pages SET error=-1 WHERE url=?', (url, ) ) 103 | conn.commit() 104 | continue 105 | 106 | cur.execute('INSERT OR IGNORE INTO Pages (url, html, new_rank) VALUES ( ?, NULL, 1.0 )', ( url, ) ) 107 | cur.execute('UPDATE Pages SET html=? 
WHERE url=?', (buffer(html), url ) ) 108 | conn.commit() 109 | 110 | # Retrieve all of the anchor tags 111 | tags = soup('a') 112 | count = 0 113 | for tag in tags: 114 | href = tag.get('href', None) 115 | if ( href is None ) : continue 116 | # Resolve relative references like href="/contact" 117 | up = urlparse(href) 118 | if ( len(up.scheme) < 1 ) : 119 | href = urljoin(url, href) 120 | ipos = href.find('#') 121 | if ( ipos > 1 ) : href = href[:ipos] 122 | if ( href.endswith('.png') or href.endswith('.jpg') or href.endswith('.gif') ) : continue 123 | if ( href.endswith('/') ) : href = href[:-1] 124 | # print href 125 | if ( len(href) < 1 ) : continue 126 | 127 | # Check if the URL is in any of the webs 128 | found = False 129 | for web in webs: 130 | if ( href.startswith(web) ) : 131 | found = True 132 | break 133 | if not found : continue 134 | 135 | cur.execute('INSERT OR IGNORE INTO Pages (url, html, new_rank) VALUES ( ?, NULL, 1.0 )', ( href, ) ) 136 | count = count + 1 137 | conn.commit() 138 | 139 | cur.execute('SELECT id FROM Pages WHERE url=? LIMIT 1', ( href, )) 140 | try: 141 | row = cur.fetchone() 142 | toid = row[0] 143 | except: 144 | print 'Could not retrieve id' 145 | continue 146 | # print fromid, toid 147 | cur.execute('INSERT OR IGNORE INTO Links (from_id, to_id) VALUES ( ?, ? 
)', ( fromid, toid ) ) 148 | 149 | 150 | print count 151 | 152 | cur.close() 153 | 154 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spider.sqlite: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spider.sqlite -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spjson.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | 3 | conn = sqlite3.connect('spider.sqlite') 4 | cur = conn.cursor() 5 | 6 | print "Creating JSON output on spider.js..." 7 | howmany = int(raw_input("How many nodes? 
")) 8 | 9 | cur.execute('''SELECT COUNT(from_id) AS inbound, old_rank, new_rank, id, url 10 | FROM Pages JOIN Links ON Pages.id = Links.to_id 11 | WHERE html IS NOT NULL AND ERROR IS NULL 12 | GROUP BY id ORDER BY id,inbound''') 13 | 14 | fhand = open('spider.js','w') 15 | nodes = list() 16 | maxrank = None 17 | minrank = None 18 | for row in cur : 19 | nodes.append(row) 20 | rank = row[2] 21 | if maxrank < rank or maxrank is None : maxrank = rank 22 | if minrank > rank or minrank is None : minrank = rank 23 | if len(nodes) > howmany : break 24 | 25 | if maxrank == minrank or maxrank is None or minrank is None: 26 | print "Error - please run sprank.py to compute page rank" 27 | quit() 28 | 29 | fhand.write('spiderJson = {"nodes":[\n') 30 | count = 0 31 | map = dict() 32 | ranks = dict() 33 | for row in nodes : 34 | if count > 0 : fhand.write(',\n') 35 | # print row 36 | rank = row[2] 37 | rank = 19 * ( (rank - minrank) / (maxrank - minrank) ) 38 | fhand.write('{'+'"weight":'+str(row[0])+',"rank":'+str(rank)+',') 39 | fhand.write(' "id":'+str(row[3])+', "url":"'+row[4]+'"}') 40 | map[row[3]] = count 41 | ranks[row[3]] = rank 42 | count = count + 1 43 | fhand.write('],\n') 44 | 45 | cur.execute('''SELECT DISTINCT from_id, to_id FROM Links''') 46 | fhand.write('"links":[\n') 47 | 48 | count = 0 49 | for row in cur : 50 | # print row 51 | if row[0] not in map or row[1] not in map : continue 52 | if count > 0 : fhand.write(',\n') 53 | rank = ranks[row[0]] 54 | srank = 19 * ( (rank - minrank) / (maxrank - minrank) ) 55 | fhand.write('{"source":'+str(map[row[0]])+',"target":'+str(map[row[1]])+',"value":3}') 56 | count = count + 1 57 | fhand.write(']};') 58 | fhand.close() 59 | cur.close() 60 | 61 | print "Open force.html in a browser to view the visualization" 62 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/sprank.py: 
-------------------------------------------------------------------------------- 1 | import sqlite3 2 | 3 | conn = sqlite3.connect('spider.sqlite') 4 | cur = conn.cursor() 5 | 6 | # Find the ids that send out page rank - we only are interested 7 | # in pages in the SCC that have in and out links 8 | cur.execute('''SELECT DISTINCT from_id FROM Links''') 9 | from_ids = list() 10 | for row in cur: 11 | from_ids.append(row[0]) 12 | 13 | # Find the ids that receive page rank 14 | to_ids = list() 15 | links = list() 16 | cur.execute('''SELECT DISTINCT from_id, to_id FROM Links''') 17 | for row in cur: 18 | from_id = row[0] 19 | to_id = row[1] 20 | if from_id == to_id : continue 21 | if from_id not in from_ids : continue 22 | if to_id not in from_ids : continue 23 | links.append(row) 24 | if to_id not in to_ids : to_ids.append(to_id) 25 | 26 | # Get latest page ranks for strongly connected component 27 | prev_ranks = dict() 28 | for node in from_ids: 29 | cur.execute('''SELECT new_rank FROM Pages WHERE id = ?''', (node, )) 30 | row = cur.fetchone() 31 | prev_ranks[node] = row[0] 32 | 33 | sval = raw_input('How many iterations:') 34 | many = 1 35 | if ( len(sval) > 0 ) : many = int(sval) 36 | 37 | # Sanity check 38 | if len(prev_ranks) < 1 : 39 | print "Nothing to page rank. Check data." 
40 | quit() 41 | 42 | # Let's do Page Rank in memory so it is really fast 43 | for i in range(many): 44 | # print prev_ranks.items()[:5] 45 | next_ranks = dict() 46 | total = 0.0 47 | for (node, old_rank) in prev_ranks.items(): 48 | total = total + old_rank 49 | next_ranks[node] = 0.0 50 | # print total 51 | 52 | # Find the number of outbound links and send the page rank down each 53 | for (node, old_rank) in prev_ranks.items(): 54 | # print node, old_rank 55 | give_ids = list() 56 | for (from_id, to_id) in links: 57 | if from_id != node : continue 58 | # print ' ',from_id,to_id 59 | 60 | if to_id not in to_ids: continue 61 | give_ids.append(to_id) 62 | if ( len(give_ids) < 1 ) : continue 63 | amount = old_rank / len(give_ids) 64 | # print node, old_rank,amount, give_ids 65 | 66 | for id in give_ids: 67 | next_ranks[id] = next_ranks[id] + amount 68 | 69 | newtot = 0 70 | for (node, next_rank) in next_ranks.items(): 71 | newtot = newtot + next_rank 72 | evap = (total - newtot) / len(next_ranks) 73 | 74 | # print newtot, evap 75 | for node in next_ranks: 76 | next_ranks[node] = next_ranks[node] + evap 77 | 78 | newtot = 0 79 | for (node, next_rank) in next_ranks.items(): 80 | newtot = newtot + next_rank 81 | 82 | # Compute the per-page average change from old rank to new rank 83 | # as an indication of convergence of the algorithm 84 | totdiff = 0 85 | for (node, old_rank) in prev_ranks.items(): 86 | new_rank = next_ranks[node] 87 | diff = abs(old_rank-new_rank) 88 | totdiff = totdiff + diff 89 | 90 | avediff = totdiff / len(prev_ranks) 91 | print i+1, avediff 92 | 93 | # rotate 94 | prev_ranks = next_ranks 95 | 96 | # Put the final ranks back into the database 97 | print next_ranks.items()[:5] 98 | cur.execute('''UPDATE Pages SET old_rank=new_rank''') 99 | for (id, new_rank) in next_ranks.items() : 100 | cur.execute('''UPDATE Pages SET new_rank=?
WHERE id=?''', (new_rank, id)) 101 | conn.commit() 102 | cur.close() 103 | 104 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 1 pagerank/spreset.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | 3 | conn = sqlite3.connect('spider.sqlite') 4 | cur = conn.cursor() 5 | 6 | cur.execute('''UPDATE Pages SET new_rank=1.0, old_rank=0.0''') 7 | conn.commit() 8 | 9 | cur.close() 10 | 11 | print "All pages set to a rank of 1.0" 12 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/.DS_Store -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gbasic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gbasic.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gline.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gline.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gmain.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gmain.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gmodel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gmodel.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gword.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/OutputImages/gword.png -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/README.txt: -------------------------------------------------------------------------------- 1 | Analyzing an EMAIL Archive and visualizing the data using the 2 | D3 JavaScript library 3 | 4 | Here is a copy of the Sakai Developer Mailing list from 2006-2014. 5 | 6 | http://mbox.dr-chuck.net/ 7 | 8 | You should install the SQLite browser to view and modify the databases from: 9 | 10 | http://sqlitebrowser.org/ 11 | 12 | The base URL is hard-coded in gmane.py. Make sure to delete the 13 | content.sqlite file if you switch the base URL. The gmane.py file 14 | operates as a spider in that it runs slowly and retrieves one mail 15 | message per second so as to avoid getting throttled. It stores all of 16 | its data in a database and can be interrupted and re-started 17 | as often as needed. It may take many hours to pull all the data 18 | down, so you may need to restart several times. 19 | 20 | To give you a head start, I have put up 600MB of pre-spidered Sakai 21 | email here: 22 | 23 | https://online.dr-chuck.com/files/sakai/email/content.sqlite.zip 24 | 25 | If you download and unzip this, you can "catch up with the 26 | latest" by running gmane.py.
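The restart behaviour described above — gmane.py scans content.sqlite from message 1 upward and resumes spidering at the first id it has not yet retrieved — can be sketched in a few lines. This is a Python 3 sketch under assumed names (a `Messages` table with an integer `id` column), not the actual gmane.py schema:

```python
import sqlite3

def resume_point(conn):
    """Return the first message id not yet spidered, scanning from 1 upward.

    Mirrors the restart behaviour described in the README: walk the stored
    ids in order until the first gap, then resume spidering there.
    (Table and column names here are illustrative, not the gmane.py schema.)
    """
    cur = conn.cursor()
    cur.execute('SELECT id FROM Messages ORDER BY id')
    expected = 1
    for (mid,) in cur:
        if mid != expected:
            break  # found a gap - this is where spidering resumes
        expected = expected + 1
    return expected

# Tiny in-memory demo: ids 1-3 are spidered, 4 is missing
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Messages (id INTEGER PRIMARY KEY)')
conn.executemany('INSERT INTO Messages (id) VALUES (?)', [(1,), (2,), (3,), (5,)])
print(resume_point(conn))  # 4
```

This also shows why a missing message "sticks" the spider: the scan stops at the first gap, which is why the README suggests inserting an empty row for a missing id.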
27 | 28 | Navigate to the folder where you extracted the gmane.zip 29 | 30 | Here is a run of gmane.py getting the last five messages of the 31 | sakai developer list: 32 | 33 | Mac: python gmane.py 34 | Win: gmane.py 35 | 36 | How many messages:10 37 | http://mbox.dr-chuck.net/sakai.devel/5/6 9443 38 | john@caret.cam.ac.uk 2005-12-09T13:32:29+00:00 re: lms/vle rants/comments 39 | http://mbox.dr-chuck.net/sakai.devel/6/7 3586 40 | s-githens@northwestern.edu 2005-12-09T13:32:31-06:00 re: sakaiportallogin and presense 41 | http://mbox.dr-chuck.net/sakai.devel/7/8 10600 42 | john@caret.cam.ac.uk 2005-12-09T13:42:24+00:00 re: lms/vle rants/comments 43 | 44 | The program scans content.sqlite from 1 up to the first message number not 45 | already spidered and starts spidering at that message. It continues spidering 46 | until it has spidered the desired number of messages or it reaches a page 47 | that does not appear to be a properly formatted message. 48 | 49 | Sometimes a message is missing. Perhaps administrators can delete messages 50 | or perhaps they get lost - I don't know. If your spider stops, and it seems it has hit 51 | a missing message, go into the SQLite Manager and add a row with the missing id - leave 52 | all the other fields blank - and then restart gmane.py. This will unstick the 53 | spidering process and allow it to continue. These empty messages will be ignored in the next 54 | phase of the process. 55 | 56 | One nice thing is that once you have spidered all of the messages and have them in 57 | content.sqlite, you can run gmane.py again to get new messages as they get sent to the 58 | list. gmane.py will quickly scan to the end of the already-spidered pages and check 59 | if there are new messages and then quickly retrieve those messages and add them 60 | to content.sqlite. 61 | 62 | The content.sqlite data is pretty raw, with an inefficient data model, and not compressed.
63 | This is intentional as it allows you to look at content.sqlite to debug the process. 64 | It would be a bad idea to run any queries against this database as they would be 65 | slow. 66 | 67 | The second process is running the program gmodel.py. gmodel.py reads the rough/raw 68 | data from content.sqlite and produces a cleaned-up and well-modeled version of the 69 | data in the file index.sqlite. The file index.sqlite will be much smaller (often 10X 70 | smaller) than content.sqlite because it also compresses the header and body text. 71 | 72 | Each time gmodel.py runs, it completely wipes out and re-builds index.sqlite, allowing 73 | you to adjust its parameters and edit the mapping tables in content.sqlite to tweak the 74 | data cleaning process. 75 | 76 | Running gmodel.py works as follows: 77 | 78 | Mac: python gmodel.py 79 | Win: gmodel.py 80 | 81 | Loaded allsenders 1588 and mapping 28 dns mapping 1 82 | 1 2005-12-08T23:34:30-06:00 ggolden22@mac.com 83 | 251 2005-12-22T10:03:20-08:00 tpamsler@ucdavis.edu 84 | 501 2006-01-12T11:17:34-05:00 lance@indiana.edu 85 | 751 2006-01-24T11:13:28-08:00 vrajgopalan@ucmerced.edu 86 | ... 87 | 88 | The gmodel.py program does a number of data cleaning steps: 89 | 90 | Domain names are truncated to two levels for .com, .org, .edu, and .net; 91 | other domain names are truncated to three levels. So si.umich.edu becomes 92 | umich.edu and caret.cam.ac.uk becomes cam.ac.uk. Also mail addresses are 93 | forced to lower case, and some of the @gmane.org addresses like the following 94 | 95 | arwhyte-63aXycvo3TyHXe+LvDLADg@public.gmane.org 96 | 97 | are converted to the real address whenever there is a matching real email 98 | address elsewhere in the message corpus. 99 | 100 | If you look in the content.sqlite database there are two tables that allow 101 | you to map both domain names and individual email addresses that change over 102 | the lifetime of the email list.
For example, Steve Githens used the following 103 | email addresses over the life of the Sakai developer list: 104 | 105 | s-githens@northwestern.edu 106 | sgithens@cam.ac.uk 107 | swgithen@mtu.edu 108 | 109 | We can add two entries to the Mapping table: 110 | 111 | s-githens@northwestern.edu -> swgithen@mtu.edu 112 | sgithens@cam.ac.uk -> swgithen@mtu.edu 113 | 114 | And so all the mail messages will be collected under one sender even if 115 | they used several email addresses over the lifetime of the mailing list. 116 | 117 | You can also make similar entries in the DNSMapping table if there are multiple 118 | DNS names you want mapped to a single DNS name. In the Sakai data I add the following 119 | mapping: 120 | 121 | iupui.edu -> indiana.edu 122 | 123 | So all the folks from the various Indiana University campuses are tracked together. 124 | 125 | You can re-run gmodel.py over and over as you look at the data, and add mappings 126 | to make the data cleaner and cleaner. When you are done, you will have a nicely 127 | indexed version of the email in index.sqlite. This is the file to use to do data 128 | analysis. With this file, data analysis will be really quick. 129 | 130 | The first, simplest data analysis is to do a "who does the most" and "which 131 | organization does the most"? This is done using gbasic.py: 132 | 133 | Mac: python gbasic.py 134 | Win: gbasic.py 135 | 136 | How many to dump?
5 137 | Loaded messages= 51330 subjects= 25033 senders= 1584 138 | 139 | Top 5 Email list participants 140 | steve.swinsburg@gmail.com 2657 141 | azeckoski@unicon.net 1742 142 | ieb@tfd.co.uk 1591 143 | csev@umich.edu 1304 144 | david.horwitz@uct.ac.za 1184 145 | 146 | Top 5 Email list organizations 147 | gmail.com 7339 148 | umich.edu 6243 149 | uct.ac.za 2451 150 | indiana.edu 2258 151 | unicon.net 2055 152 | 153 | You can look at the data in index.sqlite and if you find a problem, you 154 | can update the Mapping table and DNSMapping table in content.sqlite and 155 | re-run gmodel.py. 156 | 157 | There is a simple visualization of the word frequency in the subject lines 158 | in the file gword.py: 159 | 160 | Mac: python gword.py 161 | Win: gword.py 162 | 163 | Range of counts: 33229 129 164 | Output written to gword.js 165 | 166 | This produces the file gword.js which you can visualize using the file 167 | gword.htm. 168 | 169 | A second visualization is in gline.py. It visualizes email participation by 170 | organizations over time. 171 | 172 | Mac: python gline.py 173 | Win: gline.py 174 | 175 | Loaded messages= 51330 subjects= 25033 senders= 1584 176 | Top 10 Oranizations 177 | ['gmail.com', 'umich.edu', 'uct.ac.za', 'indiana.edu', 'unicon.net', 'tfd.co.uk', 'berkeley.edu', 'longsight.com', 'stanford.edu', 'ox.ac.uk'] 178 | Output written to gline.js 179 | 180 | Its output is written to gline.js which is visualized using gline.htm.
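The two tallies gbasic.py prints — messages per sender and messages per organization — boil down to counting. Here is a Python 3 sketch using collections.Counter, with made-up sample addresses rather than the real index.sqlite data (the actual program reads its senders out of the database):

```python
from collections import Counter

def top_counts(senders, howmany):
    """Count messages per sender and per organization (the domain part
    of the address), the same two tallies gbasic.py prints."""
    sender_counts = Counter(senders)
    org_counts = Counter(addr.split('@')[1] for addr in senders)
    return sender_counts.most_common(howmany), org_counts.most_common(howmany)

# Hypothetical sample data, not taken from the Sakai archive
senders = ['a@umich.edu', 'b@gmail.com', 'a@umich.edu', 'c@umich.edu']
top_senders, top_orgs = top_counts(senders, 2)
print(top_senders)  # [('a@umich.edu', 2), ('b@gmail.com', 1)]
print(top_orgs)     # [('umich.edu', 3), ('gmail.com', 1)]
```

The organization tally is why the Mapping and DNSMapping tables matter: without them, one person posting from three addresses shows up as three senders.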
181 | 182 | Some URLs for visualization ideas: 183 | 184 | https://developers.google.com/chart/ 185 | 186 | https://developers.google.com/chart/interactive/docs/gallery/motionchart 187 | 188 | https://code.google.com/apis/ajax/playground/?type=visualization#motion_chart_time_formats 189 | 190 | https://developers.google.com/chart/interactive/docs/gallery/annotatedtimeline 191 | 192 | http://bost.ocks.org/mike/uberdata/ 193 | 194 | http://mbostock.github.io/d3/talk/20111018/calendar.html 195 | 196 | http://nltk.org/install.html 197 | 198 | As always - comments welcome. 199 | 200 | -- Dr. Chuck 201 | Sun Sep 29 00:11:01 EDT 2013 202 | 203 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/content.sqlite: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/content.sqlite -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/d3.layout.cloud.js: -------------------------------------------------------------------------------- 1 | // Word cloud layout by Jason Davies, http://www.jasondavies.com/word-cloud/ 2 | // Algorithm due to Jonathan Feinberg, http://static.mrfeinberg.com/bv_ch03.pdf 3 | (function(exports) { 4 | function cloud() { 5 | var size = [256, 256], 6 | text = cloudText, 7 | font = cloudFont, 8 | fontSize = cloudFontSize, 9 | fontStyle = cloudFontNormal, 10 | fontWeight = cloudFontNormal, 11 | rotate = cloudRotate, 12 | padding = cloudPadding, 13 | spiral = archimedeanSpiral, 14 | words = [], 
15 | timeInterval = Infinity, 16 | event = d3.dispatch("word", "end"), 17 | timer = null, 18 | cloud = {}; 19 | 20 | cloud.start = function() { 21 | var board = zeroArray((size[0] >> 5) * size[1]), 22 | bounds = null, 23 | n = words.length, 24 | i = -1, 25 | tags = [], 26 | data = words.map(function(d, i) { 27 | d.text = text.call(this, d, i); 28 | d.font = font.call(this, d, i); 29 | d.style = fontStyle.call(this, d, i); 30 | d.weight = fontWeight.call(this, d, i); 31 | d.rotate = rotate.call(this, d, i); 32 | d.size = ~~fontSize.call(this, d, i); 33 | d.padding = cloudPadding.call(this, d, i); 34 | return d; 35 | }).sort(function(a, b) { return b.size - a.size; }); 36 | 37 | if (timer) clearInterval(timer); 38 | timer = setInterval(step, 0); 39 | step(); 40 | 41 | return cloud; 42 | 43 | function step() { 44 | var start = +new Date, 45 | d; 46 | while (+new Date - start < timeInterval && ++i < n && timer) { 47 | d = data[i]; 48 | d.x = (size[0] * (Math.random() + .5)) >> 1; 49 | d.y = (size[1] * (Math.random() + .5)) >> 1; 50 | cloudSprite(d, data, i); 51 | if (place(board, d, bounds)) { 52 | tags.push(d); 53 | event.word(d); 54 | if (bounds) cloudBounds(bounds, d); 55 | else bounds = [{x: d.x + d.x0, y: d.y + d.y0}, {x: d.x + d.x1, y: d.y + d.y1}]; 56 | // Temporary hack 57 | d.x -= size[0] >> 1; 58 | d.y -= size[1] >> 1; 59 | } 60 | } 61 | if (i >= n) { 62 | cloud.stop(); 63 | event.end(tags, bounds); 64 | } 65 | } 66 | } 67 | 68 | cloud.stop = function() { 69 | if (timer) { 70 | clearInterval(timer); 71 | timer = null; 72 | } 73 | return cloud; 74 | }; 75 | 76 | cloud.timeInterval = function(x) { 77 | if (!arguments.length) return timeInterval; 78 | timeInterval = x == null ? 
Infinity : x; 79 | return cloud; 80 | }; 81 | 82 | function place(board, tag, bounds) { 83 | var perimeter = [{x: 0, y: 0}, {x: size[0], y: size[1]}], 84 | startX = tag.x, 85 | startY = tag.y, 86 | maxDelta = Math.sqrt(size[0] * size[0] + size[1] * size[1]), 87 | s = spiral(size), 88 | dt = Math.random() < .5 ? 1 : -1, 89 | t = -dt, 90 | dxdy, 91 | dx, 92 | dy; 93 | 94 | while (dxdy = s(t += dt)) { 95 | dx = ~~dxdy[0]; 96 | dy = ~~dxdy[1]; 97 | 98 | if (Math.min(dx, dy) > maxDelta) break; 99 | 100 | tag.x = startX + dx; 101 | tag.y = startY + dy; 102 | 103 | if (tag.x + tag.x0 < 0 || tag.y + tag.y0 < 0 || 104 | tag.x + tag.x1 > size[0] || tag.y + tag.y1 > size[1]) continue; 105 | // TODO only check for collisions within current bounds. 106 | if (!bounds || !cloudCollide(tag, board, size[0])) { 107 | if (!bounds || collideRects(tag, bounds)) { 108 | var sprite = tag.sprite, 109 | w = tag.width >> 5, 110 | sw = size[0] >> 5, 111 | lx = tag.x - (w << 4), 112 | sx = lx & 0x7f, 113 | msx = 32 - sx, 114 | h = tag.y1 - tag.y0, 115 | x = (tag.y + tag.y0) * sw + (lx >> 5), 116 | last; 117 | for (var j = 0; j < h; j++) { 118 | last = 0; 119 | for (var i = 0; i <= w; i++) { 120 | board[x + i] |= (last << msx) | (i < w ? 
(last = sprite[j * w + i]) >>> sx : 0); 121 | } 122 | x += sw; 123 | } 124 | delete tag.sprite; 125 | return true; 126 | } 127 | } 128 | } 129 | return false; 130 | } 131 | 132 | cloud.words = function(x) { 133 | if (!arguments.length) return words; 134 | words = x; 135 | return cloud; 136 | }; 137 | 138 | cloud.size = function(x) { 139 | if (!arguments.length) return size; 140 | size = [+x[0], +x[1]]; 141 | return cloud; 142 | }; 143 | 144 | cloud.font = function(x) { 145 | if (!arguments.length) return font; 146 | font = d3.functor(x); 147 | return cloud; 148 | }; 149 | 150 | cloud.fontStyle = function(x) { 151 | if (!arguments.length) return fontStyle; 152 | fontStyle = d3.functor(x); 153 | return cloud; 154 | }; 155 | 156 | cloud.fontWeight = function(x) { 157 | if (!arguments.length) return fontWeight; 158 | fontWeight = d3.functor(x); 159 | return cloud; 160 | }; 161 | 162 | cloud.rotate = function(x) { 163 | if (!arguments.length) return rotate; 164 | rotate = d3.functor(x); 165 | return cloud; 166 | }; 167 | 168 | cloud.text = function(x) { 169 | if (!arguments.length) return text; 170 | text = d3.functor(x); 171 | return cloud; 172 | }; 173 | 174 | cloud.spiral = function(x) { 175 | if (!arguments.length) return spiral; 176 | spiral = spirals[x + ""] || x; 177 | return cloud; 178 | }; 179 | 180 | cloud.fontSize = function(x) { 181 | if (!arguments.length) return fontSize; 182 | fontSize = d3.functor(x); 183 | return cloud; 184 | }; 185 | 186 | cloud.padding = function(x) { 187 | if (!arguments.length) return padding; 188 | padding = d3.functor(x); 189 | return cloud; 190 | }; 191 | 192 | return d3.rebind(cloud, event, "on"); 193 | } 194 | 195 | function cloudText(d) { 196 | return d.text; 197 | } 198 | 199 | function cloudFont() { 200 | return "serif"; 201 | } 202 | 203 | function cloudFontNormal() { 204 | return "normal"; 205 | } 206 | 207 | function cloudFontSize(d) { 208 | return Math.sqrt(d.value); 209 | } 210 | 211 | function cloudRotate() { 212 | 
return (~~(Math.random() * 6) - 3) * 30; 213 | } 214 | 215 | function cloudPadding() { 216 | return 1; 217 | } 218 | 219 | // Fetches a monochrome sprite bitmap for the specified text. 220 | // Load in batches for speed. 221 | function cloudSprite(d, data, di) { 222 | if (d.sprite) return; 223 | c.clearRect(0, 0, (cw << 5) / ratio, ch / ratio); 224 | var x = 0, 225 | y = 0, 226 | maxh = 0, 227 | n = data.length; 228 | di--; 229 | while (++di < n) { 230 | d = data[di]; 231 | c.save(); 232 | c.font = d.style + " " + d.weight + " " + ~~((d.size + 1) / ratio) + "px " + d.font; 233 | var w = c.measureText(d.text + "m").width * ratio, 234 | h = d.size << 1; 235 | if (d.rotate) { 236 | var sr = Math.sin(d.rotate * cloudRadians), 237 | cr = Math.cos(d.rotate * cloudRadians), 238 | wcr = w * cr, 239 | wsr = w * sr, 240 | hcr = h * cr, 241 | hsr = h * sr; 242 | w = (Math.max(Math.abs(wcr + hsr), Math.abs(wcr - hsr)) + 0x1f) >> 5 << 5; 243 | h = ~~Math.max(Math.abs(wsr + hcr), Math.abs(wsr - hcr)); 244 | } else { 245 | w = (w + 0x1f) >> 5 << 5; 246 | } 247 | if (h > maxh) maxh = h; 248 | if (x + w >= (cw << 5)) { 249 | x = 0; 250 | y += maxh; 251 | maxh = 0; 252 | } 253 | if (y + h >= ch) break; 254 | c.translate((x + (w >> 1)) / ratio, (y + (h >> 1)) / ratio); 255 | if (d.rotate) c.rotate(d.rotate * cloudRadians); 256 | c.fillText(d.text, 0, 0); 257 | c.restore(); 258 | d.width = w; 259 | d.height = h; 260 | d.xoff = x; 261 | d.yoff = y; 262 | d.x1 = w >> 1; 263 | d.y1 = h >> 1; 264 | d.x0 = -d.x1; 265 | d.y0 = -d.y1; 266 | x += w; 267 | } 268 | var pixels = c.getImageData(0, 0, (cw << 5) / ratio, ch / ratio).data, 269 | sprite = []; 270 | while (--di >= 0) { 271 | d = data[di]; 272 | var w = d.width, 273 | w32 = w >> 5, 274 | h = d.y1 - d.y0, 275 | p = d.padding; 276 | // Zero the buffer 277 | for (var i = 0; i < h * w32; i++) sprite[i] = 0; 278 | x = d.xoff; 279 | if (x == null) return; 280 | y = d.yoff; 281 | var seen = 0, 282 | seenRow = -1; 283 | for (var j = 0; j < h; 
j++) { 284 | for (var i = 0; i < w; i++) { 285 | var k = w32 * j + (i >> 5), 286 | m = pixels[((y + j) * (cw << 5) + (x + i)) << 2] ? 1 << (31 - (i % 32)) : 0; 287 | if (p) { 288 | if (j) sprite[k - w32] |= m; 289 | if (j < w - 1) sprite[k + w32] |= m; 290 | m |= (m << 1) | (m >> 1); 291 | } 292 | sprite[k] |= m; 293 | seen |= m; 294 | } 295 | if (seen) seenRow = j; 296 | else { 297 | d.y0++; 298 | h--; 299 | j--; 300 | y++; 301 | } 302 | } 303 | d.y1 = d.y0 + seenRow; 304 | d.sprite = sprite.slice(0, (d.y1 - d.y0) * w32); 305 | } 306 | } 307 | 308 | // Use mask-based collision detection. 309 | function cloudCollide(tag, board, sw) { 310 | sw >>= 5; 311 | var sprite = tag.sprite, 312 | w = tag.width >> 5, 313 | lx = tag.x - (w << 4), 314 | sx = lx & 0x7f, 315 | msx = 32 - sx, 316 | h = tag.y1 - tag.y0, 317 | x = (tag.y + tag.y0) * sw + (lx >> 5), 318 | last; 319 | for (var j = 0; j < h; j++) { 320 | last = 0; 321 | for (var i = 0; i <= w; i++) { 322 | if (((last << msx) | (i < w ? (last = sprite[j * w + i]) >>> sx : 0)) 323 | & board[x + i]) return true; 324 | } 325 | x += sw; 326 | } 327 | return false; 328 | } 329 | 330 | function cloudBounds(bounds, d) { 331 | var b0 = bounds[0], 332 | b1 = bounds[1]; 333 | if (d.x + d.x0 < b0.x) b0.x = d.x + d.x0; 334 | if (d.y + d.y0 < b0.y) b0.y = d.y + d.y0; 335 | if (d.x + d.x1 > b1.x) b1.x = d.x + d.x1; 336 | if (d.y + d.y1 > b1.y) b1.y = d.y + d.y1; 337 | } 338 | 339 | function collideRects(a, b) { 340 | return a.x + a.x1 > b[0].x && a.x + a.x0 < b[1].x && a.y + a.y1 > b[0].y && a.y + a.y0 < b[1].y; 341 | } 342 | 343 | function archimedeanSpiral(size) { 344 | var e = size[0] / size[1]; 345 | return function(t) { 346 | return [e * (t *= .1) * Math.cos(t), t * Math.sin(t)]; 347 | }; 348 | } 349 | 350 | function rectangularSpiral(size) { 351 | var dy = 4, 352 | dx = dy * size[0] / size[1], 353 | x = 0, 354 | y = 0; 355 | return function(t) { 356 | var sign = t < 0 ? 
-1 : 1; 357 | // See triangular numbers: T_n = n * (n + 1) / 2. 358 | switch ((Math.sqrt(1 + 4 * sign * t) - sign) & 3) { 359 | case 0: x += dx; break; 360 | case 1: y += dy; break; 361 | case 2: x -= dx; break; 362 | default: y -= dy; break; 363 | } 364 | return [x, y]; 365 | }; 366 | } 367 | 368 | // TODO reuse arrays? 369 | function zeroArray(n) { 370 | var a = [], 371 | i = -1; 372 | while (++i < n) a[i] = 0; 373 | return a; 374 | } 375 | 376 | var cloudRadians = Math.PI / 180, 377 | cw = 1 << 11 >> 5, 378 | ch = 1 << 11, 379 | canvas, 380 | ratio = 1; 381 | 382 | if (typeof document !== "undefined") { 383 | canvas = document.createElement("canvas"); 384 | canvas.width = 1; 385 | canvas.height = 1; 386 | ratio = Math.sqrt(canvas.getContext("2d").getImageData(0, 0, 1, 1).data.length >> 2); 387 | canvas.width = (cw << 5) / ratio; 388 | canvas.height = ch / ratio; 389 | } else { 390 | // node-canvas support 391 | var Canvas = require("canvas"); 392 | canvas = new Canvas(cw << 5, ch); 393 | } 394 | 395 | var c = canvas.getContext("2d"), 396 | spirals = { 397 | archimedean: archimedeanSpiral, 398 | rectangular: rectangularSpiral 399 | }; 400 | c.fillStyle = "red"; 401 | c.textAlign = "center"; 402 | 403 | exports.cloud = cloud; 404 | })(typeof exports === "undefined" ? d3.layout || (d3.layout = {}) : exports); 405 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/email .txt: -------------------------------------------------------------------------------- 1 | Analyzing an EMAIL Archive vizualizing the data using the 2 | D3 JavaScript library 3 | 4 | Here is a copy of the Sakai Developer Mailing list from 2006-2014. 
5 | 6 | http://mbox.dr-chuck.net/ 7 | 8 | You should install the SQLite browser to view and modify the databases from: 9 | 10 | http://sqlitebrowser.org/ 11 | 12 | The base URL is hard-coded in gmane.py. Make sure to delete the 13 | content.sqlite file if you switch the base URL. The gmane.py file 14 | operates as a spider in that it runs slowly and retrieves one mail 15 | message per second so as to avoid getting throttled. It stores all of 16 | its data in a database and can be interrupted and re-started 17 | as often as needed. It may take many hours to pull all the data 18 | down, so you may need to restart several times. 19 | 20 | To give you a head start, I have put up 600MB of pre-spidered Sakai 21 | email here: 22 | 23 | https://online.dr-chuck.com/files/sakai/email/content.sqlite.zip 24 | 25 | If you download and unzip this, you can "catch up with the 26 | latest" by running gmane.py. 27 | 28 | Navigate to the folder where you extracted gmane.zip. 29 | 30 | Here is a run of gmane.py getting the last few messages of the 31 | sakai developer list: 32 | 33 | Mac: python gmane.py 34 | Win: gmane.py 35 | 36 | How many messages:10 37 | http://mbox.dr-chuck.net/sakai.devel/5/6 9443 38 | john@caret.cam.ac.uk 2005-12-09T13:32:29+00:00 re: lms/vle rants/comments 39 | http://mbox.dr-chuck.net/sakai.devel/6/7 3586 40 | s-githens@northwestern.edu 2005-12-09T13:32:31-06:00 re: sakaiportallogin and presense 41 | http://mbox.dr-chuck.net/sakai.devel/7/8 10600 42 | john@caret.cam.ac.uk 2005-12-09T13:42:24+00:00 re: lms/vle rants/comments 43 | 44 | The program scans content.sqlite from 1 up to the first message number not 45 | already spidered and starts spidering at that message. It continues spidering 46 | until it has spidered the desired number of messages or it reaches a page 47 | that does not appear to be a properly formatted message. 48 | 49 | Sometimes a message is missing.
Perhaps administrators can delete messages 50 | or perhaps they get lost - I don't know. If your spider stops and seems to have hit 51 | a missing message, go into the SQLite browser and add a row with the missing id - leave 52 | all the other fields blank - and then restart gmane.py. This will unstick the 53 | spidering process and allow it to continue. These empty messages will be ignored in the next 54 | phase of the process. 55 | 56 | One nice thing is that once you have spidered all of the messages and have them in 57 | content.sqlite, you can run gmane.py again to get new messages as they get sent to the 58 | list. gmane.py will quickly scan to the end of the already-spidered pages, check 59 | for new messages, retrieve them, and add them 60 | to content.sqlite. 61 | 62 | The content.sqlite data is pretty raw, with an inefficient data model, and not compressed. 63 | This is intentional as it allows you to look at content.sqlite to debug the process. 64 | It would be a bad idea to run any queries against this database as they would be 65 | slow. 66 | 67 | The second process is running the program gmodel.py. gmodel.py reads the rough/raw 68 | data from content.sqlite and produces a cleaned-up and well-modeled version of the 69 | data in the file index.sqlite. The file index.sqlite will be much smaller (often 10X 70 | smaller) than content.sqlite because it also compresses the header and body text. 71 | 72 | Each time gmodel.py runs, it completely wipes out and re-builds index.sqlite, allowing 73 | you to adjust its parameters and edit the mapping tables in mapping.sqlite to tweak the 74 | data cleaning process.
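The size difference comes mostly from compression: gmodel.py stores each message's header and body as zlib-compressed BLOBs in index.sqlite. A minimal Python 3 round-trip sketch (the repo's scripts are Python 2; the sample header text here is made up):

```python
import zlib

# Hypothetical header text standing in for one message's headers.
hdr = "From: someone@example.edu\nSubject: re: lms/vle rants/comments\n" * 40

blob = zlib.compress(hdr.encode("utf-8"))         # what gets stored as a BLOB
restored = zlib.decompress(blob).decode("utf-8")  # what analysis code reads back

assert restored == hdr
print(len(hdr.encode("utf-8")), "bytes ->", len(blob), "bytes compressed")
```

Repetitive mail headers compress very well, which is a big part of why index.sqlite can end up roughly 10X smaller than content.sqlite.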
75 | 76 | Running gmodel.py works as follows: 77 | 78 | Mac: python gmodel.py 79 | Win: gmodel.py 80 | 81 | Loaded allsenders 1588 and mapping 28 dns mapping 1 82 | 1 2005-12-08T23:34:30-06:00 ggolden22@mac.com 83 | 251 2005-12-22T10:03:20-08:00 tpamsler@ucdavis.edu 84 | 501 2006-01-12T11:17:34-05:00 lance@indiana.edu 85 | 751 2006-01-24T11:13:28-08:00 vrajgopalan@ucmerced.edu 86 | ... 87 | 88 | The gmodel.py program does a number of data cleaning steps: 89 | 90 | Domain names are truncated to two levels for .com, .org, .edu, and .net; 91 | other domain names are truncated to three levels. So si.umich.edu becomes 92 | umich.edu and caret.cam.ac.uk becomes cam.ac.uk. Also, mail addresses are 93 | forced to lower case, and some @gmane.org addresses like the following 94 | 95 | arwhyte-63aXycvo3TyHXe+LvDLADg@public.gmane.org 96 | 97 | are converted to the real address whenever there is a matching real email 98 | address elsewhere in the message corpus. 99 | 100 | If you look in the mapping.sqlite database there are two tables that allow 101 | you to map both domain names and individual email addresses that change over 102 | the lifetime of the email list. For example, Steve Githens used the following 103 | email addresses over the life of the Sakai developer list: 104 | 105 | s-githens@northwestern.edu 106 | sgithens@cam.ac.uk 107 | swgithen@mtu.edu 108 | 109 | We can add two entries to the Mapping table: 110 | 111 | s-githens@northwestern.edu -> swgithen@mtu.edu 112 | sgithens@cam.ac.uk -> swgithen@mtu.edu 113 | 114 | That way, all the mail messages will be collected under one sender even if 115 | they used several email addresses over the lifetime of the mailing list. 116 | 117 | You can also make similar entries in the DNSMapping table if there are multiple 118 | DNS names you want mapped to a single DNS name.
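The domain truncation rule described above is small enough to sketch directly (Python 3 here; gmodel.py applies the same rule inside its fixsender function in Python 2):

```python
def truncate_domain(dns):
    """Keep two labels for .com/.org/.edu/.net domains, three for everything else."""
    pieces = dns.lower().split(".")
    if dns.endswith((".com", ".org", ".edu", ".net")):
        return ".".join(pieces[-2:])
    return ".".join(pieces[-3:])

print(truncate_domain("si.umich.edu"))     # umich.edu
print(truncate_domain("caret.cam.ac.uk"))  # cam.ac.uk
```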
In the Sakai data I add the following 119 | mapping: 120 | 121 | iupui.edu -> indiana.edu 122 | 123 | So all the folks from the various Indiana University campuses are tracked together. 124 | 125 | You can re-run gmodel.py over and over as you look at the data, and add mappings 126 | to make the data cleaner and cleaner. When you are done, you will have a nicely 127 | indexed version of the email in index.sqlite. This is the file to use for data 128 | analysis. With this file, data analysis will be really quick. 129 | 130 | The first, simplest data analysis is to ask "who does the most?" and "which 131 | organization does the most?". This is done using gbasic.py: 132 | 133 | Mac: python gbasic.py 134 | Win: gbasic.py 135 | 136 | How many to dump? 5 137 | Loaded messages= 51330 subjects= 25033 senders= 1584 138 | 139 | Top 5 Email list participants 140 | steve.swinsburg@gmail.com 2657 141 | azeckoski@unicon.net 1742 142 | ieb@tfd.co.uk 1591 143 | csev@umich.edu 1304 144 | david.horwitz@uct.ac.za 1184 145 | 146 | Top 5 Email list organizations 147 | gmail.com 7339 148 | umich.edu 6243 149 | uct.ac.za 2451 150 | indiana.edu 2258 151 | unicon.net 2055 152 | 153 | You can look at the data in index.sqlite and if you find a problem, you 154 | can update the Mapping table and DNSMapping table in mapping.sqlite and 155 | re-run gmodel.py. 156 | 157 | There is a simple visualization of the word frequency in the subject lines 158 | in the file gword.py: 159 | 160 | Mac: python gword.py 161 | Win: gword.py 162 | 163 | Range of counts: 33229 129 164 | Output written to gword.js 165 | 166 | This produces the file gword.js which you can visualize using the file 167 | gword.htm. 168 | 169 | A second visualization is in gline.py. It visualizes email participation by 170 | organizations over time.
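The ranking gbasic.py produces is plain dictionary counting over the sender of every message; a Python 3 sketch using a hypothetical sender list in place of the rows from the index.sqlite Messages/Senders join:

```python
# Hypothetical senders standing in for rows from the Messages/Senders join.
senders = [
    "steve.swinsburg@gmail.com", "csev@umich.edu",
    "steve.swinsburg@gmail.com", "david.horwitz@uct.ac.za",
]

sendcounts = {}
sendorgs = {}
for sender in senders:
    sendcounts[sender] = sendcounts.get(sender, 0) + 1
    pieces = sender.split("@")
    if len(pieces) != 2:
        continue                                   # skip malformed addresses
    sendorgs[pieces[1]] = sendorgs.get(pieces[1], 0) + 1

# Sort keys by their counts, highest first, as gbasic.py does.
top = sorted(sendcounts, key=sendcounts.get, reverse=True)
print(top[0], sendcounts[top[0]])  # steve.swinsburg@gmail.com 2
```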
171 | 172 | Mac: python gline.py 173 | Win: gline.py 174 | 175 | Loaded messages= 51330 subjects= 25033 senders= 1584 176 | Top 10 Organizations 177 | ['gmail.com', 'umich.edu', 'uct.ac.za', 'indiana.edu', 'unicon.net', 'tfd.co.uk', 'berkeley.edu', 'longsight.com', 'stanford.edu', 'ox.ac.uk'] 178 | Output written to gline.js 179 | 180 | Its output is written to gline.js, which is visualized using gline.htm. 181 | 182 | Some URLs for visualization ideas: 183 | 184 | https://developers.google.com/chart/ 185 | 186 | https://developers.google.com/chart/interactive/docs/gallery/motionchart 187 | 188 | https://code.google.com/apis/ajax/playground/?type=visualization#motion_chart_time_formats 189 | 190 | https://developers.google.com/chart/interactive/docs/gallery/annotatedtimeline 191 | 192 | http://bost.ocks.org/mike/uberdata/ 193 | 194 | http://mbostock.github.io/d3/talk/20111018/calendar.html 195 | 196 | http://nltk.org/install.html 197 | 198 | As always - comments welcome. 199 | 200 | -- Dr. Chuck 201 | Sun Sep 29 00:11:01 EDT 2013 -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gbasic.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | import time 3 | import urllib 4 | import zlib 5 | 6 | howmany = int(raw_input("How many to dump? 
")) 7 | 8 | conn = sqlite3.connect('index.sqlite') 9 | conn.text_factory = str 10 | cur = conn.cursor() 11 | 12 | cur.execute('''SELECT Messages.id, sender FROM Messages 13 | JOIN Senders ON Messages.sender_id = Senders.id''') 14 | 15 | sendcounts = dict() 16 | sendorgs = dict() 17 | for message in cur : 18 | sender = message[1] 19 | sendcounts[sender] = sendcounts.get(sender,0) + 1 20 | pieces = sender.split("@") 21 | if len(pieces) != 2 : continue 22 | dns = pieces[1] 23 | sendorgs[dns] = sendorgs.get(dns,0) + 1 24 | 25 | print '' 26 | print 'Top',howmany,'Email list participants' 27 | 28 | x = sorted(sendcounts, key=sendcounts.get, reverse=True) 29 | for k in x[:howmany]: 30 | print k, sendcounts[k] 31 | if sendcounts[k] < 10 : break 32 | 33 | print '' 34 | print 'Top',howmany,'Email list organizations' 35 | 36 | x = sorted(sendorgs, key=sendorgs.get, reverse=True) 37 | for k in x[:howmany]: 38 | print k, sendorgs[k] 39 | if sendorgs[k] < 10 : break 40 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gline.htm: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 19 | 20 | 21 | 22 | 23 | 24 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gline.js: -------------------------------------------------------------------------------- 1 | gline = [ ['Month','umich.edu','unl.edu','mac.com','columbia.edu','berkeley.edu','unicon.net','virginia.edu','hull.ac.uk','cam.ac.uk','weber.edu'], 2 | ['2005-12',25,12,10,7,6,6,6,6,5,5] 3 | ]; 4 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 
Spidering and Modeling Email Data/gline.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | import time 3 | import urllib 4 | import zlib 5 | 6 | conn = sqlite3.connect('index.sqlite') 7 | conn.text_factory = str 8 | cur = conn.cursor() 9 | 10 | # Determine the top ten organizations 11 | cur.execute('''SELECT Messages.id, sender FROM Messages 12 | JOIN Senders ON Messages.sender_id = Senders.id''') 13 | 14 | sendorgs = dict() 15 | for message_row in cur : 16 | sender = message_row[1] 17 | pieces = sender.split("@") 18 | if len(pieces) != 2 : continue 19 | dns = pieces[1] 20 | sendorgs[dns] = sendorgs.get(dns,0) + 1 21 | 22 | # pick the top schools 23 | orgs = sorted(sendorgs, key=sendorgs.get, reverse=True) 24 | orgs = orgs[:10] 25 | print "Top 10 Organizations" 26 | print orgs 27 | # orgs = ['total'] + orgs 28 | 29 | # Read through the messages 30 | counts = dict() 31 | months = list() 32 | 33 | cur.execute('''SELECT Messages.id, sender, sent_at FROM Messages 34 | JOIN Senders ON Messages.sender_id = Senders.id''') 35 | 36 | for message_row in cur : 37 | sender = message_row[1] 38 | pieces = sender.split("@") 39 | if len(pieces) != 2 : continue 40 | dns = pieces[1] 41 | if dns not in orgs : continue 42 | month = message_row[2][:7] 43 | if month not in months : months.append(month) 44 | key = (month, dns) 45 | counts[key] = counts.get(key,0) + 1 46 | tkey = (month, 'total') 47 | counts[tkey] = counts.get(tkey,0) + 1 48 | 49 | months.sort() 50 | print counts 51 | print months 52 | 53 | fhand = open('gline.js','w') 54 | fhand.write("gline = [ ['Month'") 55 | for org in orgs: 56 | fhand.write(",'"+org+"'") 57 | fhand.write("]") 58 | 59 | # for month in months[1:-1]: 60 | for month in months: 61 | fhand.write(",\n['"+month+"'") 62 | for org in orgs: 63 | key = (month, org) 64 | val = counts.get(key,0) 65 | fhand.write(","+str(val)) 66 | fhand.write("]"); 67 | 68 | fhand.write("\n];\n") 69 | 70 | print "Data 
written to gline.js" 71 | print "Open gline.htm in a browser to view" 72 | 73 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gline2.htm: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 6 | 21 | 22 | 23 | 24 | 25 | 26 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gmane.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import sqlite3 3 | import time 4 | import ssl 5 | import urllib 6 | from urlparse import urljoin 7 | from urlparse import urlparse 8 | import re 9 | from datetime import datetime, timedelta 10 | 11 | # Not all systems have this so conditionally define parser 12 | try: 13 | import dateutil.parser as parser 14 | except: 15 | pass 16 | 17 | def parsemaildate(md) : 18 | # See if we have dateutil 19 | try: 20 | pdate = parser.parse(tdate) 21 | test_at = pdate.isoformat() 22 | return test_at 23 | except: 24 | pass 25 | 26 | # Non-dateutil version - we try our best 27 | 28 | pieces = md.split() 29 | notz = " ".join(pieces[:4]).strip() 30 | 31 | # Try a bunch of format variations - strptime() is *lame* 32 | dnotz = None 33 | for form in [ '%d %b %Y %H:%M:%S', '%d %b %Y %H:%M:%S', 34 | '%d %b %Y %H:%M', '%d %b %Y %H:%M', '%d %b %y %H:%M:%S', 35 | '%d %b %y %H:%M:%S', '%d %b %y %H:%M', '%d %b %y %H:%M' ] : 36 | try: 37 | dnotz = datetime.strptime(notz, form) 38 | break 39 | except: 40 | continue 41 | 42 | if dnotz is None : 43 | # print 'Bad Date:',md 44 | return None 45 | 46 | iso = dnotz.isoformat() 47 | 48 | tz = "+0000" 49 | try: 50 | tz = pieces[4] 51 | ival = int(tz) # Only want numeric timezone values 52 | if tz == '-0000' : tz = '+0000' 
53 | tzh = tz[:3] 54 | tzm = tz[3:] 55 | tz = tzh+":"+tzm 56 | except: 57 | pass 58 | 59 | return iso+tz 60 | 61 | conn = sqlite3.connect('content.sqlite') 62 | cur = conn.cursor() 63 | conn.text_factory = str 64 | 65 | baseurl = "http://mbox.dr-chuck.net/sakai.devel/" 66 | 67 | cur.execute('''CREATE TABLE IF NOT EXISTS Messages 68 | (id INTEGER UNIQUE, email TEXT, sent_at TEXT, 69 | subject TEXT, headers TEXT, body TEXT)''') 70 | 71 | start = 0 72 | cur.execute('SELECT max(id) FROM Messages') 73 | try: 74 | row = cur.fetchone() 75 | if row[0] is not None: 76 | start = row[0] 77 | except: 78 | start = 0 79 | row = None 80 | 81 | print start 82 | 83 | many = 0 84 | 85 | # Skip up to five messages 86 | skip = 5 87 | while True: 88 | if ( many < 1 ) : 89 | sval = raw_input('How many messages:') 90 | if ( len(sval) < 1 ) : break 91 | many = int(sval) 92 | 93 | start = start + 1 94 | cur.execute('SELECT id FROM Messages WHERE id=?', (start,) ) 95 | try: 96 | row = cur.fetchone() 97 | if row is not None : continue 98 | except: 99 | row = None 100 | 101 | many = many - 1 102 | url = baseurl + str(start) + '/' + str(start + 1) 103 | 104 | try: 105 | # Deal with SSL certificate anomalies Python > 2.7 106 | # scontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1) 107 | # document = urllib.urlopen(url, context=scontext) 108 | 109 | document = urllib.urlopen(url) 110 | 111 | text = document.read() 112 | if document.getcode() != 200 : 113 | print "Error code=",document.getcode(), url 114 | break 115 | except KeyboardInterrupt: 116 | print '' 117 | print 'Program interrupted by user...' 118 | break 119 | except: 120 | print "Unable to retrieve or parse page",url 121 | print sys.exc_info()[0] 122 | break 123 | 124 | print url,len(text) 125 | 126 | if not text.startswith("From "): 127 | if skip < 1 : 128 | print text 129 | print "End of mail stream reached..." 
130 | quit () 131 | print " Skipping badly formed message" 132 | skip = skip-1 133 | continue 134 | 135 | pos = text.find("\n\n") 136 | if pos > 0 : 137 | hdr = text[:pos] 138 | body = text[pos+2:] 139 | else: 140 | print text 141 | print "Could not find break between headers and body" 142 | break 143 | 144 | skip = 5 # reset skip count 145 | 146 | email = None 147 | x = re.findall('\nFrom: .* <(\S+@\S+)>\n', hdr) 148 | if len(x) == 1 : 149 | email = x[0]; 150 | email = email.strip().lower() 151 | email = email.replace("<","") 152 | else: 153 | x = re.findall('\nFrom: (\S+@\S+)\n', hdr) 154 | if len(x) == 1 : 155 | email = x[0]; 156 | email = email.strip().lower() 157 | email = email.replace("<","") 158 | 159 | date = None 160 | y = re.findall('\Date: .*, (.*)\n', hdr) 161 | if len(y) == 1 : 162 | tdate = y[0] 163 | tdate = tdate[:26] 164 | try: 165 | sent_at = parsemaildate(tdate) 166 | except: 167 | print text 168 | print "Parse fail",tdate 169 | break 170 | 171 | subject = None 172 | z = re.findall('\Subject: (.*)\n', hdr) 173 | if len(z) == 1 : subject = z[0].strip().lower(); 174 | 175 | print " ",email,sent_at,subject 176 | cur.execute('''INSERT OR IGNORE INTO Messages (id, email, sent_at, subject, headers, body) 177 | VALUES ( ?, ?, ?, ?, ?, ? 
)''', ( start, email, sent_at, subject, hdr, body)) 178 | 179 | # Only commit every 50th record 180 | # if (many % 50) == 0 : conn.commit() 181 | time.sleep(1) 182 | 183 | conn.commit() 184 | cur.close() 185 | 186 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gmodel.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | import time 3 | import urllib 4 | import re 5 | import zlib 6 | from datetime import datetime, timedelta 7 | # Not all systems have this 8 | try: 9 | import dateutil.parser as parser 10 | except: 11 | pass 12 | 13 | dnsmapping = dict() 14 | mapping = dict() 15 | 16 | def fixsender(sender,allsenders=None) : 17 | global dnsmapping 18 | global mapping 19 | if sender is None : return None 20 | sender = sender.strip().lower() 21 | sender = sender.replace('<','').replace('>','') 22 | 23 | # Check if we have a hacked gmane.org from address 24 | if allsenders is not None and sender.endswith('gmane.org') : 25 | pieces = sender.split('-') 26 | realsender = None 27 | for s in allsenders: 28 | if s.startswith(pieces[0]) : 29 | realsender = sender 30 | sender = s 31 | # print realsender, sender 32 | break 33 | if realsender is None : 34 | for s in mapping: 35 | if s.startswith(pieces[0]) : 36 | realsender = sender 37 | sender = mapping[s] 38 | # print realsender, sender 39 | break 40 | if realsender is None : sender = pieces[0] 41 | 42 | mpieces = sender.split("@") 43 | if len(mpieces) != 2 : return sender 44 | dns = mpieces[1] 45 | x = dns 46 | pieces = dns.split(".") 47 | if dns.endswith(".edu") or dns.endswith(".com") or dns.endswith(".org") or dns.endswith(".net") : 48 | dns = ".".join(pieces[-2:]) 49 | else: 50 | dns = ".".join(pieces[-3:]) 51 | # if dns != x : print x,dns 52 | # if dns != dnsmapping.get(dns,dns) : print 
dns,dnsmapping.get(dns,dns) 53 | dns = dnsmapping.get(dns,dns) 54 | return mpieces[0] + '@' + dns 55 | 56 | def parsemaildate(md) : 57 | # See if we have dateutil 58 | try: 59 | pdate = parser.parse(tdate) 60 | test_at = pdate.isoformat() 61 | return test_at 62 | except: 63 | pass 64 | 65 | # Non-dateutil version - we try our best 66 | 67 | pieces = md.split() 68 | notz = " ".join(pieces[:4]).strip() 69 | 70 | # Try a bunch of format variations - strptime() is *lame* 71 | dnotz = None 72 | for form in [ '%d %b %Y %H:%M:%S', '%d %b %Y %H:%M:%S', 73 | '%d %b %Y %H:%M', '%d %b %Y %H:%M', '%d %b %y %H:%M:%S', 74 | '%d %b %y %H:%M:%S', '%d %b %y %H:%M', '%d %b %y %H:%M' ] : 75 | try: 76 | dnotz = datetime.strptime(notz, form) 77 | break 78 | except: 79 | continue 80 | 81 | if dnotz is None : 82 | # print 'Bad Date:',md 83 | return None 84 | 85 | iso = dnotz.isoformat() 86 | 87 | tz = "+0000" 88 | try: 89 | tz = pieces[4] 90 | ival = int(tz) # Only want numeric timezone values 91 | if tz == '-0000' : tz = '+0000' 92 | tzh = tz[:3] 93 | tzm = tz[3:] 94 | tz = tzh+":"+tzm 95 | except: 96 | pass 97 | 98 | return iso+tz 99 | 100 | # Parse out the info... 
101 | def parseheader(hdr, allsenders=None): 102 | if hdr is None or len(hdr) < 1 : return None 103 | sender = None 104 | x = re.findall('\nFrom: .* <(\S+@\S+)>\n', hdr) 105 | if len(x) >= 1 : 106 | sender = x[0] 107 | else: 108 | x = re.findall('\nFrom: (\S+@\S+)\n', hdr) 109 | if len(x) >= 1 : 110 | sender = x[0] 111 | 112 | # normalize the domain name of Email addresses 113 | sender = fixsender(sender, allsenders) 114 | 115 | date = None 116 | y = re.findall('\nDate: .*, (.*)\n', hdr) 117 | sent_at = None 118 | if len(y) >= 1 : 119 | tdate = y[0] 120 | tdate = tdate[:26] 121 | try: 122 | sent_at = parsemaildate(tdate) 123 | except Exception, e: 124 | # print 'Date ignored ',tdate, e 125 | return None 126 | 127 | subject = None 128 | z = re.findall('\nSubject: (.*)\n', hdr) 129 | if len(z) >= 1 : subject = z[0].strip().lower() 130 | 131 | guid = None 132 | z = re.findall('\nMessage-ID: (.*)\n', hdr) 133 | if len(z) >= 1 : guid = z[0].strip().lower() 134 | 135 | if sender is None or sent_at is None or subject is None or guid is None : 136 | return None 137 | return (guid, sender, subject, sent_at) 138 | 139 | # Open the output database and create empty tables 140 | conn = sqlite3.connect('index.sqlite') 141 | conn.text_factory = str 142 | cur = conn.cursor() 143 | 144 | cur.execute('''DROP TABLE IF EXISTS Messages ''') 145 | cur.execute('''DROP TABLE IF EXISTS Senders ''') 146 | cur.execute('''DROP TABLE IF EXISTS Subjects ''') 147 | cur.execute('''DROP TABLE IF EXISTS Replies ''') 148 | 149 | cur.execute('''CREATE TABLE IF NOT EXISTS Messages 150 | (id INTEGER PRIMARY KEY, guid TEXT UNIQUE, sent_at INTEGER, 151 | sender_id INTEGER, subject_id INTEGER, 152 | headers BLOB, body BLOB)''') 153 | cur.execute('''CREATE TABLE IF NOT EXISTS Senders 154 | (id INTEGER PRIMARY KEY, sender TEXT UNIQUE)''') 155 | cur.execute('''CREATE TABLE IF NOT EXISTS Subjects 156 | (id INTEGER PRIMARY KEY, subject TEXT UNIQUE)''') 157 | cur.execute('''CREATE TABLE IF NOT EXISTS Replies 
158 | (from_id INTEGER, to_id INTEGER)''') 159 | 160 | # Open the mapping information 161 | conn_1 = sqlite3.connect('mapping.sqlite') 162 | conn_1.text_factory = str 163 | cur_1 = conn_1.cursor() 164 | 165 | # Load up the mapping information into memory structures 166 | cur_1.execute('''SELECT old,new FROM DNSMapping''') 167 | for message_row in cur_1 : 168 | dnsmapping[message_row[0].strip().lower()] = message_row[1].strip().lower() 169 | 170 | mapping = dict() 171 | cur_1.execute('''SELECT old,new FROM Mapping''') 172 | for message_row in cur_1 : 173 | old = fixsender(message_row[0]) 174 | new = fixsender(message_row[1]) 175 | mapping[old] = fixsender(new) 176 | 177 | cur_1.close() 178 | 179 | # Open the raw data retrieved from the network 180 | conn_2 = sqlite3.connect('content.sqlite') 181 | conn_2.text_factory = str 182 | cur_2 = conn_2.cursor() 183 | 184 | allsenders = list() 185 | cur_2.execute('''SELECT email FROM Messages''') 186 | for message_row in cur_2 : 187 | sender = fixsender(message_row[0]) 188 | if sender is None : continue 189 | if 'gmane.org' in sender : continue 190 | if sender in allsenders: continue 191 | allsenders.append(sender) 192 | 193 | print "Loaded allsenders",len(allsenders),"and mapping",len(mapping),"dns mapping",len(dnsmapping) 194 | 195 | cur_2.execute('''SELECT headers, body, sent_at 196 | FROM Messages ORDER BY sent_at''') 197 | 198 | senders = dict() 199 | subjects = dict() 200 | guids = dict() 201 | 202 | count = 0 203 | 204 | for message_row in cur_2 : 205 | hdr = message_row[0] 206 | parsed = parseheader(hdr, allsenders) 207 | if parsed is None: continue 208 | (guid, sender, subject, sent_at) = parsed 209 | 210 | # Apply the sender mapping 211 | sender = mapping.get(sender,sender) 212 | 213 | count = count + 1 214 | if count % 250 == 1 : print count,sent_at, sender 215 | # print guid, sender, subject, sent_at 216 | 217 | if 'gmane.org' in sender: 218 | print "Error in sender ===", sender 219 | 220 | sender_id = 
senders.get(sender,None) 221 | subject_id = subjects.get(subject,None) 222 | guid_id = guids.get(guid,None) 223 | 224 | if sender_id is None : 225 | cur.execute('INSERT OR IGNORE INTO Senders (sender) VALUES ( ? )', ( sender, ) ) 226 | conn.commit() 227 | cur.execute('SELECT id FROM Senders WHERE sender=? LIMIT 1', ( sender, )) 228 | try: 229 | row = cur.fetchone() 230 | sender_id = row[0] 231 | senders[sender] = sender_id 232 | except: 233 | print 'Could not retrieve sender id',sender 234 | break 235 | if subject_id is None : 236 | cur.execute('INSERT OR IGNORE INTO Subjects (subject) VALUES ( ? )', ( subject, ) ) 237 | conn.commit() 238 | cur.execute('SELECT id FROM Subjects WHERE subject=? LIMIT 1', ( subject, )) 239 | try: 240 | row = cur.fetchone() 241 | subject_id = row[0] 242 | subjects[subject] = subject_id 243 | except: 244 | print 'Could not retrieve subject id',subject 245 | break 246 | # print sender_id, subject_id 247 | cur.execute('INSERT OR IGNORE INTO Messages (guid,sender_id,subject_id,sent_at,headers,body) VALUES ( ?,?,?,datetime(?),?,? )', 248 | ( guid, sender_id, subject_id, sent_at, zlib.compress(message_row[0]), zlib.compress(message_row[1])) ) 249 | conn.commit() 250 | cur.execute('SELECT id FROM Messages WHERE guid=? 
LIMIT 1', ( guid, )) 251 | try: 252 | row = cur.fetchone() 253 | message_id = row[0] 254 | guids[guid] = message_id 255 | except: 256 | print 'Could not retrieve guid id',guid 257 | break 258 | 259 | # Close the connections 260 | cur.close() 261 | cur_2.close() 262 | 263 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gword.htm: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 37 | -------------------------------------------------------------------------------- /5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gword.js: -------------------------------------------------------------------------------- 1 | gword = [{text: 'sakai', size: 100}, 2 | {text: 'with', size: 57}, 3 | {text: 'error', size: 51}, 4 | {text: 'password', size: 45}, 5 | {text: 'forgotten', size: 45}, 6 | {text: 'feature', size: 45}, 7 | {text: 'mysql', size: 38}, 8 | {text: 'apis', size: 31}, 9 | {text: 'section', size: 30}, 10 | {text: 'problem', size: 30}, 11 | {text: 'site', size: 30}, 12 | {text: 'collab', size: 30}, 13 | {text: 'webdav', size: 28}, 14 | {text: 'memory', size: 28}, 15 | {text: 'taxonomy', size: 27}, 16 | {text: 'worksite', size: 27}, 17 | {text: 'nosuchbeandefinitionexception', size: 25}, 18 | {text: 'resources', size: 25}, 19 | {text: 'sectionmanager', size: 25}, 20 | {text: 'creating', size: 25}, 21 | {text: 'maven', size: 25}, 22 | {text: 'austin', size: 25}, 23 | {text: 'tool', size: 25}, 24 | {text: 'manager', size: 24}, 25 | {text: 'provider', size: 24}, 26 | {text: 'regarding', size: 24}, 27 | {text: 'level', size: 24}, 28 | {text: 'schedule', size: 24}, 29 | {text: 'question', size: 24}, 30 | {text: 'sakaiportallogin', size: 24}, 31 | {text: 'related', size: 24}, 32 | {text: 
'high', size: 24}, 33 | {text: 'other', size: 24}, 34 | {text: 'presense', size: 24}, 35 | {text: 'displayed', size: 22}, 36 | {text: 'cannot', size: 22}, 37 | {text: 'document', size: 22}, 38 | {text: 'page', size: 22}, 39 | {text: 'examples', size: 22}, 40 | {text: 'tools', size: 22}, 41 | {text: 'internet', size: 22}, 42 | {text: 'email', size: 22}, 43 | {text: 'accessing', size: 22}, 44 | {text: 'lmsvle', size: 22}, 45 | {text: 'recordings', size: 22}, 46 | {text: 'address', size: 22}, 47 | {text: 'configuration', size: 22}, 48 | {text: 'presentations', size: 22}, 49 | {text: 'samigo', size: 22}, 50 | {text: 'rantscomments', size: 22}, 51 | {text: 'username', size: 22}, 52 | {text: 'http', size: 22}, 53 | {text: 'problems', size: 22}, 54 | {text: 'oracle', size: 22}, 55 | {text: 'audio', size: 22}, 56 | {text: 'planning', size: 21}, 57 | {text: 'converting', size: 21}, 58 | {text: 'tables', size: 21}, 59 | {text: 'breakage', size: 21}, 60 | {text: 'stovepipe', size: 21}, 61 | {text: 'picker', size: 21}, 62 | {text: 'denied', size: 21}, 63 | {text: 'nonlegacy', size: 21}, 64 | {text: 'update', size: 21}, 65 | {text: 'news', size: 21}, 66 | {text: 'urls', size: 21}, 67 | {text: 'wiki', size: 21}, 68 | {text: 'firefox', size: 21}, 69 | {text: 'conference', size: 21}, 70 | {text: 'from', size: 21}, 71 | {text: 'anyone', size: 21}, 72 | {text: 'translation', size: 21}, 73 | {text: 'future', size: 21}, 74 | {text: 'file', size: 21}, 75 | {text: 'conversion', size: 21}, 76 | {text: 'permission', size: 21}, 77 | {text: 'developers', size: 21}, 78 | {text: 'explorer', size: 21}, 79 | {text: 'myfaces', size: 21}, 80 | {text: 'jira', size: 20}, 81 | {text: 'code', size: 20}, 82 | {text: 'courserosteruser', size: 20}, 83 | {text: 'entity', size: 20}, 84 | {text: 'group', size: 20}, 85 | {text: 'clarification', size: 20}, 86 | {text: 'ldap', size: 20}, 87 | {text: 'song', size: 20}, 88 | {text: 'dynamic', size: 20}, 89 | {text: 'break', size: 20}, 90 | {text: 'report', 
size: 20},
  {text: 'renamed', size: 20},
  {text: 'release', size: 20},
  {text: 'simplified', size: 20},
  {text: 'direct', size: 20},
  {text: 'library', size: 20},
  {text: 'zero', size: 20},
  {text: 'export', size: 20},
  {text: 'logo', size: 20},
  {text: 'preferences', size: 20},
  {text: 'import', size: 20}
];
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gword.py:
--------------------------------------------------------------------------------
import sqlite3
import time
import urllib
import zlib
import string

conn = sqlite3.connect('index.sqlite')
conn.text_factory = str
cur = conn.cursor()

cur.execute('''SELECT subject_id,subject FROM Messages
    JOIN Subjects ON Messages.subject_id = Subjects.id''')

counts = dict()
for message_row in cur :
    text = message_row[1]
    text = text.translate(None, string.punctuation)
    text = text.translate(None, '1234567890')
    text = text.strip()
    text = text.lower()
    words = text.split()
    for word in words:
        if len(word) < 4 : continue
        counts[word] = counts.get(word,0) + 1

# Find the top 100 words
words = sorted(counts, key=counts.get, reverse=True)
highest = None
lowest = None
for w in words[:100]:
    if highest is None or highest < counts[w] :
        highest = counts[w]
    if lowest is None or lowest > counts[w] :
        lowest = counts[w]
print 'Range of counts:',highest,lowest

# Spread the font sizes across 20-100 based on the count
bigsize = 80
smallsize = 20

fhand = open('gword.js','w')
fhand.write("gword = [")
first = True
for k in words[:100]:
    if not first : fhand.write( ",\n")
    first = False
    size = counts[k]
    size = (size - lowest) / float(highest - lowest)
    size = int((size * bigsize) + smallsize)
    fhand.write("{text: '"+k+"', size: "+str(size)+"}")
fhand.write( "\n];\n")

print "Output written to gword.js"
print "Open gword.htm in a browser to view"
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/gyear.py:
--------------------------------------------------------------------------------
import sqlite3
import time
import urllib
import zlib

conn = sqlite3.connect('index.sqlite')
conn.text_factory = str
cur = conn.cursor()

# Determine the top ten organizations
cur.execute('''SELECT Messages.id, sender FROM Messages
    JOIN Senders ON Messages.sender_id = Senders.id''')

sendorgs = dict()
for message_row in cur :
    sender = message_row[1]
    pieces = sender.split("@")
    if len(pieces) != 2 : continue
    dns = pieces[1]
    sendorgs[dns] = sendorgs.get(dns,0) + 1

# pick the top schools
orgs = sorted(sendorgs, key=sendorgs.get, reverse=True)
orgs = orgs[:10]
print "Top 10 Organizations"
print orgs
# orgs = ['total'] + orgs

# Read through the messages
counts = dict()
years = list()

cur.execute('''SELECT Messages.id, sender, sent_at FROM Messages
    JOIN Senders ON Messages.sender_id = Senders.id''')

for message_row in cur :
    sender = message_row[1]
    pieces = sender.split("@")
    if len(pieces) != 2 : continue
    dns = pieces[1]
    if dns not in orgs : continue
    year = message_row[2][:4]
    if year not in years : years.append(year)
    key = (year, dns)
    counts[key] = counts.get(key,0) + 1
    tkey = (year, 'total')
    counts[tkey] = counts.get(tkey,0) + 1

years.sort()
print counts
print years

fhand = open('gline.js','w')
fhand.write("gline = [ ['Year'")
for org in orgs:
    fhand.write(",'"+org+"'")
fhand.write("]")

# for year in years[1:-1]:
for year in years:
    fhand.write(",\n['"+year+"'")
    for org in orgs:
        key = (year, org)
        val = counts.get(key,0)
        fhand.write(","+str(val))
    fhand.write("]");

fhand.write("\n];\n")

print "Data written to gline.js"
print "Open gline.htm in a browser to view"
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/index.sqlite:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/index.sqlite
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/mapping.sqlite:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 2 Spidering and Modeling Email Data/mapping.sqlite
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/.DS_Store
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/.DS_Store
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve (1).png
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve (2).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve (2).png
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve (4).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve (4).png
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kalpesh14m/Python-For-Everybody-Answers/4cd08bcbca30fe3d54c7a6e957243d2e47ab76d3/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Assignment/Assignment 3/outputImages/blob_serve.png
--------------------------------------------------------------------------------
/5-Capstone-Retrieving-Processing-And-Visualizing-Data-with-Python/Quiz/Week 1 Using Encoded Data in Python 3.txt:
--------------------------------------------------------------------------------
1. What is the most common Unicode encoding when moving data between systems?
==> UTF-8

2. What is the ASCII character that is associated with the decimal value 42?
==> *

3. What word does the following sequence of numbers represent in ASCII:
108, 105, 115, 116
==> list

4. How are strings stored internally in Python 3?
==> Unicode

5. When reading data across the network (i.e. from a URL) in Python 3, what string method must be used to convert it to the internal format used by strings?
==> decode()
--------------------------------------------------------------------------------
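The quiz answers above can be checked directly in a Python 3 interpreter; a minimal sketch (the variable names and the sample `b'Hello world'` bytes are illustrative, not from the course):

```python
# Q2: the ASCII character for decimal value 42 is '*'
assert chr(42) == '*'

# Q3: the sequence 108, 105, 115, 116 spells "list"
word = ''.join(chr(n) for n in [108, 105, 115, 116])
assert word == 'list'

# Q4/Q5: data read from the network arrives as bytes; decode() converts it
# to Python 3's internal Unicode str, and UTF-8 (Q1) is the usual encoding.
data = b'Hello world'          # stand-in for bytes read from a socket or URL
text = data.decode('utf-8')
assert isinstance(text, str)

print(word)  # → list
```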