├── data
│   └── README.md
├── 1-intro-R
│   ├── data-link.txt
│   ├── Lecture1.pdf
│   ├── solutions.zip
│   ├── Assignment.R
│   ├── README.md
│   ├── CEOmissing.csv
│   ├── CEOcomp.csv
│   ├── 1-5.R
│   ├── 1-3.R
│   ├── 1-4.R
│   ├── 1-2.R
│   ├── .Rapp.history
│   └── 1-1.R
├── 4-graphs
│   ├── Networks.pdf
│   ├── code
│   │   ├── exercise1_start.R
│   │   ├── section5.R
│   │   ├── exercise3_complete.R
│   │   ├── exercise4_complete.R
│   │   ├── exercise1_complete.R
│   │   ├── exercise2_complete.R
│   │   ├── exercise5_complete.R
│   │   ├── section3.R
│   │   ├── section1.R
│   │   ├── section2.R
│   │   └── section4.R
│   └── README.md
├── 5-simulation
│   ├── simjulia_slides.ppt
│   ├── preassignment.jl
│   ├── simjulia_examples
│   │   ├── bank_01.jl
│   │   ├── bank_01 (complete).jl
│   │   ├── bank_06.jl
│   │   ├── bank_08.jl
│   │   ├── bank_06 (complete).jl
│   │   ├── bank_08 (complete).jl
│   │   ├── bank_11.jl
│   │   └── bank_11 (complete).jl
│   ├── README.md
│   └── distributed.jl
├── 2-intermediate-R
│   ├── FirstHalfSlides.pdf
│   ├── SecondHalf slides.pdf
│   ├── extractTop20.R
│   ├── Carrier Names
│   ├── SecondHalf_solutions.R
│   ├── README.md
│   ├── SecondHalf.R
│   ├── FirstHalf.R
│   └── prcp_pretty.csv
├── 8-project
│   ├── Class 8 Column Generation.pdf
│   ├── README.md
│   ├── Historical_Route.csv
│   └── Flight_Alaska.csv
├── 3-visualization
│   ├── IAPvisualization2015.pptx
│   ├── README.md
│   └── pollData.csv
├── README.md
├── 7-adv-optimization
│   ├── README.md
│   └── Callbacks.ipynb
└── 6-nonlinear-opt
    ├── README.md
    ├── Nonlinear-JuMP.ipynb
    ├── IJulia intro.ipynb
    ├── Nonlinear-DCP.ipynb
    └── Nonlinear-DualNumbers.ipynb
/data/README.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /1-intro-R/data-link.txt: -------------------------------------------------------------------------------- 1 | http://www.transtats.bts.gov/Download/On_Time_On_Time_Performance_2014_9.zip -------------------------------------------------------------------------------- /4-graphs/Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/4-graphs/Networks.pdf -------------------------------------------------------------------------------- /1-intro-R/Lecture1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/1-intro-R/Lecture1.pdf -------------------------------------------------------------------------------- /1-intro-R/solutions.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/1-intro-R/solutions.zip -------------------------------------------------------------------------------- /5-simulation/simjulia_slides.ppt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/5-simulation/simjulia_slides.ppt -------------------------------------------------------------------------------- /2-intermediate-R/FirstHalfSlides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/2-intermediate-R/FirstHalfSlides.pdf -------------------------------------------------------------------------------- /2-intermediate-R/SecondHalf slides.pdf: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/2-intermediate-R/SecondHalf slides.pdf -------------------------------------------------------------------------------- /8-project/Class 8 Column Generation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/8-project/Class 8 Column Generation.pdf -------------------------------------------------------------------------------- /3-visualization/IAPvisualization2015.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joehuchette/OR-software-tools-2015/HEAD/3-visualization/IAPvisualization2015.pptx -------------------------------------------------------------------------------- /2-intermediate-R/extractTop20.R: -------------------------------------------------------------------------------- 1 | top20 = c("ATL","LAX","ORD","DFW","DEN","JFK","SFO","CLT","LAS","PHX","MIA","IAH","EWR","MCO","SEA","MSP","DTW","BOS","PHL","LGA") -------------------------------------------------------------------------------- /5-simulation/preassignment.jl: -------------------------------------------------------------------------------- 1 | Pkg.add("SimJulia") 2 | using SimJulia 3 | include(Pkg.dir("SimJulia") * "/test/example_1.jl") 4 | 5 | addprocs(2) 6 | 7 | @parallel for i in 1:2 8 | println("Hello from core $(myid())") 9 | end -------------------------------------------------------------------------------- /1-intro-R/Assignment.R: -------------------------------------------------------------------------------- 1 | # IAP 2015 2 | # 15.S60 Software Tools for Operations Research 3 | # Lecture 1: Introduction to R 4 | 5 | # Pre-assignment 6 | 7 | library(stats) 8 | lm_test <- lm(mpg ~ hp + cyl + wt + gear, data = mtcars) 9 | print(summary(lm_test)) 10 | 11 | -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_01.jl: -------------------------------------------------------------------------------- 1 | using SimJulia 2 | 3 | # Model components 4 | 5 | function visit(customer::Process, time_in_bank::Float64) 6 | 7 | end 8 | 9 | # Experiment data 10 | 11 | end_time = 100.0 12 | time_in_bank = 10.0 13 | 14 | # Model/Experiment 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /2-intermediate-R/Carrier Names: -------------------------------------------------------------------------------- 1 | 9E - Endeavor 2 | AA - American 3 | AS - Alaska 4 | B6 - JetBlue 5 | DL - Delta 6 | EV - ExpressJet 7 | F9 - Frontier 8 | FL - AirTran 9 | HA - Hawaiian 10 | MQ - Envoy 11 | OO - SkyWest 12 | UA - United 13 | US - US Airways 14 | VX - Virgin America 15 | WN - Southwest 16 | YV - Mesa 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MIT 15.S60 2015 2 | 3 | ## Software Tools for Operations Research 4 | 5 | ### Schedule 6 | * [Introduction to R] 7 | * [Intermediate R] 8 | * [Visualization in R] 9 | * [Graphs] 10 | * [Simulations] 11 | * [Nonlinear Optimization] 12 | * [Advanced Optimization] 13 | * [Project] 14 | 15 | ### Assignments 16 | Assignments should be submitted online via Stellar.
17 | 18 | -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_01 (complete).jl: -------------------------------------------------------------------------------- 1 | using SimJulia 2 | 3 | # Model components 4 | 5 | function visit(customer::Process, time_in_bank::Float64) 6 | println("$(now(customer)) $customer Here I am") 7 | hold(customer, time_in_bank) # stay in the bank 8 | println("$(now(customer)) $customer I must leave") 9 | end 10 | 11 | # Experiment data 12 | 13 | end_time = 100.0 14 | time_in_bank = 10.0 15 | 16 | # Model/Experiment 17 | 18 | sim = Simulation(uint(16)) # define environment 19 | c = Process(sim, "Ben") # define process 20 | activate(c, 5.0, visit, time_in_bank) # add process method 21 | run(sim, end_time) 22 | -------------------------------------------------------------------------------- /4-graphs/code/exercise1_start.R: -------------------------------------------------------------------------------- 1 | # Split the data by the carrier; this creates a list. 2 | spl <- split(dat, dat$Carrier) 3 | 4 | # Using lapply, we will call a function on each subset of dat that 5 | # builds a graph using the exact same split-apply-combine code we used 6 | # before. 7 | carrier.graphs <- lapply(spl, function(dat) { 8 | # Compute "edges" by splitting on Origin/Dest pairs, computing a 1-row 9 | # data frame for each, and then combining with do.call and rbind. 10 | 11 | # Compute "vertices" by splitting on Origin, computing a 1-row data 12 | # frame for each, and then combining with do.call and rbind. 13 | 14 | # Compute and return a graph g using graph.data.frame() 15 | }) 16 | 17 | -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_06.jl: -------------------------------------------------------------------------------- 1 | using Distributions 2 | using SimJulia 3 | 4 | # Model components 5 | 6 | # process method for customers 7 | function visit(customer::Process, time_in_bank::Float64) 8 | @printf("%7.4f %s: Here I am\n", now(customer), customer) 9 | hold(customer, time_in_bank) 10 | @printf("%7.4f %s: I must leave\n", now(customer), customer) 11 | end 12 | 13 | # process method for source 14 | function generate(source::Process, number::Int64, mean_time_between_arrivals::Float64) 15 | 16 | end 17 | 18 | # Experiment data 19 | 20 | num_customer = 5 21 | end_time = 400.0 22 | mean_time_between_arrivals = 10.0 23 | theseed = 99999 24 | srand(theseed) 25 | 26 | # Model/Experiment 27 | 28 | sim = Simulation(uint(16)) 29 | # define source here 30 | 31 | run(sim, end_time) 32 | -------------------------------------------------------------------------------- /1-intro-R/README.md: -------------------------------------------------------------------------------- 1 | ## Introduction to R Pre-Assignment 2 | 3 | ## Installation Instructions 4 | 5 | Please download and install R from [this webpage](http://cran.us.r-project.org). 6 | 7 | Once there, select your operating system: 8 | 9 | - For Windows users, select "Install R for the first time" then "Download R 3.1.2 for Windows" 10 | 11 | - For Mac users, select "R-3.1.2.pkg" 12 | 13 | ## Assignment 14 | 15 | Copy and paste the following lines of code to the R Console: 16 | 17 | ``` 18 | library(stats) 19 | lm_test <- lm(mpg ~ hp + cyl + wt + gear, data = mtcars) 20 | summary(lm_test) 21 | ``` 22 | 23 | Press Enter and copy the output to a .txt file.
24 | 25 | The first two lines of your output should look like: 26 | 27 | ``` 28 | Call: 29 | lm(formula = mpg ~ hp + cyl + wt + gear, data = mtcars) 30 | ``` 31 | 32 | ## Questions? 33 | Please e-mail jkung@mit.edu. 34 | 35 | -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_08.jl: -------------------------------------------------------------------------------- 1 | using Distributions 2 | using SimJulia 3 | 4 | # Model components 5 | 6 | function visit(customer::Process, time_in_bank::Float64, clerk::Resource) 7 | 8 | end 9 | 10 | function generate(source::Process, number::Int64, mean_time_between_arrivals::Float64, mean_time_in_bank::Float64, clerk::Resource) 11 | d_tba = Exponential(mean_time_between_arrivals) 12 | d_tib = Exponential(mean_time_in_bank) 13 | # generate customers 14 | end 15 | 16 | # Experiment data 17 | 18 | max_number = 5 19 | max_time = 400.0 20 | mean_time_between_arrivals = 10.0 21 | mean_time_in_bank = 12.0 22 | theseed = 99999 23 | srand(theseed) 24 | 25 | # Model/Experiment 26 | 27 | sim = Simulation(uint(16)) 28 | # create resource "k" 29 | s = Process(sim, "Source") 30 | activate(s, 0.0, generate, max_number, mean_time_between_arrivals, mean_time_in_bank, k) 31 | run(sim, max_time) 32 | -------------------------------------------------------------------------------- /5-simulation/README.md: -------------------------------------------------------------------------------- 1 | # Simulation and Distributed Computing 2 | 3 | This class introduces SimJulia, a discrete-event simulation library for Julia, and shows how to run simulations (and other computations) in parallel on a machine with more than one processing core. 4 | 5 | ## Pre-assignment: 6 | 7 | ### Update Code Repo 8 | Update your ORC software repository (`git pull`). 9 | 10 | ### Download Julia 11 | If you don't have Julia already installed, check out Ian and Miles' instructions at http://www.juliaopt.org/install.pdf. IJulia is recommended, but is not strictly necessary. 12 | 13 | ### Test Julia 14 | Run the file "preassignment.jl" in the 5-simulation/ folder and submit the output in a .txt file to Stellar. 15 | 16 | This class assumes a working knowledge of Julia. If you're not familiar with it, check out http://docs.julialang.org/en/release-0.3/manual/getting-started/ and http://learnxinyminutes.com/docs/julia/.
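For a taste of where the class is headed, here is a minimal sketch (not part of the pre-assignment) of the basic pattern for running independent simulation replications in parallel with `pmap`. The function `one_rep` is a hypothetical stand-in for a real simulation run:

```jl
addprocs(2)   # start two worker processes

@everywhere function one_rep(seed)
    srand(seed)       # seed this replication so runs are reproducible
    sum(rand(10^6))   # stand-in for the output of one simulation run
end

results = pmap(one_rep, 1:8)   # farm the eight replications out to the workers
println(mean(results))
```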
17 | -------------------------------------------------------------------------------- /1-intro-R/CEOmissing.csv: -------------------------------------------------------------------------------- 1 | CompanyNumber,TotalCompensation,Years,ChangeStockPrice,ChangeCompanySales,MBA 1,1530,7,48,89,1 2,NA,6,35,19,1 3,602,3,9,24,0 4,1170,6,37,8,NA 5,1086,NA,34,28,0 6,2536,9,NA,-16,1 7,300,2,-17,-17,NA 8,NA,2,-15,-67,1 9,250,0,-52,49,0 10,2413,10,109,-27,1 11,2707,NA,44,26,1 12,341,1,28,-7,0 13,734,4,NA,-7,NA 14,NA,8,16,NA,0 15,743,4,11,50,1 16,898,7,-21,-20,1 17,498,4,16,-24,0 18,NA,2,-10,64,0 19,1388,4,8,-58,1 20,898,5,28,-73,1 21,408,4,13,31,1 22,1091,NA,34,66,0 23,1550,7,NA,-4,1 24,NA,5,26,55,0 25,1462,7,46,10,1 26,1456,7,46,NA,1 27,1984,8,63,28,1 28,NA,10,12,-36,0 29,2021,7,48,72,1 30,2871,8,7,5,1 31,245,NA,-58,-16,1 32,3217,11,NA,51,1 33,1315,7,42,-7,0 34,NA,9,55,122,NA 35,260,0,-54,-41,1 36,250,NA,-17,-35,0 37,718,5,23,19,1 38,1593,8,NA,76,NA 39,1905,8,67,-48,1 40,NA,5,21,64,1 41,2253,7,46,104,1 42,254,0,-41,99,0 43,1883,NA,60,NA,1 44,1501,5,10,20,1 45,NA,0,-17,-18,0 46,NA,11,NA,27,1 47,NA,6,40,41,1 48,1897,8,-24,-41,NA 49,1157,5,21,87,1 50,246,3,1,-34,0 -------------------------------------------------------------------------------- /1-intro-R/CEOcomp.csv: -------------------------------------------------------------------------------- 1 | CompanyNumber,TotalCompensation,Years,ChangeStockPrice,ChangeCompanySales,MBA 1,1530,7,48,89,1 2,1117,6,35,19,1 3,602,3,9,24,0 4,1170,6,37,8,1 5,1086,6,34,28,0 6,2536,9,81,-16,1 7,300,2,-17,-17,0 8,670,2,-15,-67,1 9,250,0,-52,49,0 10,2413,10,109,-27,1 11,2707,7,44,26,1 12,341,1,28,-7,0 13,734,4,10,-7,0 14,2368,8,16,-4,0 15,743,4,11,50,1 16,898,7,-21,-20,1 17,498,4,16,-24,0 18,250,2,-10,64,0 19,1388,4,8,-58,1 20,898,5,28,-73,1 21,408,4,13,31,1 22,1091,6,34,66,0 23,1550,7,49,-4,1 24,832,5,26,55,0 25,1462,7,46,10,1 26,1456,7,46,-5,1 27,1984,8,63,28,1 28,1493,10,12,-36,0 29,2021,7,48,72,1 30,2871,8,7,5,1 31,245,0,-58,-16,1 32,3217,11,102,51,1 33,1315,7,42,-7,0 34,1730,9,55,122,1 35,260,0,-54,-41,1 36,250,2,-17,-35,0 37,718,5,23,19,1 38,1593,8,66,76,1 39,1905,8,67,-48,1 40,2283,5,21,64,1 41,2253,7,46,104,1 42,254,0,-41,99,0 43,1883,8,60,-12,1 44,1501,5,10,20,1 45,386,0,-17,-18,0 46,2181,11,37,27,1 47,1766,6,40,41,1 48,1897,8,-24,-41,1 49,1157,5,21,87,1 50,246,3,1,-34,0 -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_06 (complete).jl: -------------------------------------------------------------------------------- 1 | using Distributions 2 | using SimJulia 3 | 4 | # Model components 5 | 6 | # process method for customers 7 | function visit(customer::Process, time_in_bank::Float64) 8 | @printf("%7.4f %s: Here I am\n", now(customer), customer) 9 | hold(customer, time_in_bank) 10 | @printf("%7.4f %s: I must leave\n", now(customer), customer) 11 | end 12 | 13 | # process method for source 14 | function generate(source::Process, number::Int64, mean_time_between_arrivals::Float64) 15 | d = Exponential(mean_time_between_arrivals) 16 | for i = 1:number 17 | c = Process(simulation(source), @sprintf("Customer%02d", i)) 18 | activate(c, now(source), visit, 12.0) 19 | t = rand(d) # sample inter-arrival time "t" 20 | hold(source, t) # suspend source for "t" time units 21 | end 22 | end 23 | 24 | # Experiment data 25 | 26 | num_customer = 5 27 | end_time = 400.0 28 | mean_time_between_arrivals = 10.0 29 | theseed = 99999 30 | srand(theseed) 31 | 32 | # Model/Experiment 33 | 34 | sim = Simulation(uint(16)) 35 
| s = Process(sim, "Source") 36 | activate(s, 0.0, generate, num_customer, mean_time_between_arrivals) 37 | run(sim, end_time) 38 | -------------------------------------------------------------------------------- /4-graphs/README.md: -------------------------------------------------------------------------------- 1 | ## Networks in R Pre-Assignment 2 | 3 | ## Git update 4 | 5 | Please update your git repository to get the latest version of everything. 6 | 7 | ## Data setup 8 | 9 | If you already have the file On_Time_On_Time_Performance_2014_9.csv (the September 2014 airline flight network), that is great. Simply copy it into the folder 4-graphs in the git repository. 10 | 11 | If you don't already have this file, download http://www.transtats.bts.gov/Download/On_Time_On_Time_Performance_2014_9.zip and unzip it, saving On_Time_On_Time_Performance_2014_9.csv to the folder 4-graphs in the git repository. 12 | 13 | ## Assignment 14 | 15 | First, start R and set your working directory to the 4-graphs folder of the git repository. 16 | 17 | To verify your data is downloaded and located properly, please run the following in R (note that the data will take a little while to load): 18 | 19 | ``` 20 | dat <- read.csv("On_Time_On_Time_Performance_2014_9.csv", stringsAsFactors=FALSE) 21 | nrow(dat) 22 | length(unique(dat$Origin)) 23 | ``` 24 | 25 | Next, install the igraph package in R and run some simple commands: 26 | 27 | ``` 28 | install.packages("igraph") 29 | library(igraph) 30 | set.seed(144) 31 | max(betweenness(erdos.renyi.game(100, 0.5))) 32 | ``` 33 | 34 | Please submit the output of these two R snippets (3 total lines of output) in a .txt file on Stellar. 35 | 36 | ## Questions? 37 | Please email John Silberholz (josilber@mit.edu). -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_08 (complete).jl: -------------------------------------------------------------------------------- 1 | using Distributions 2 | using SimJulia 3 | 4 | # Model components 5 | 6 | function visit(customer::Process, time_in_bank::Float64, clerk::Resource) 7 | arrive = now(customer) 8 | @printf("%8.3f %s: Here I am\n", arrive, customer) 9 | request(customer, clerk) # waiting for the server 10 | wait = now(customer) - arrive 11 | @printf("%8.3f %s: Waited %6.3f\n", now(customer), customer, wait) 12 | hold(customer, time_in_bank) # using the server 13 | release(customer, clerk) # finish service 14 | @printf("%8.3f %s: Finished\n", now(customer), customer) 15 | end 16 | 17 | function generate(source::Process, number::Int64, mean_time_between_arrivals::Float64, mean_time_in_bank::Float64, clerk::Resource) 18 | d_tba = Exponential(mean_time_between_arrivals) 19 | d_tib = Exponential(mean_time_in_bank) 20 | for i = 1:number 21 | c = Process(simulation(source), @sprintf("Customer%02d", i)) 22 | tib = rand(d_tib) 23 | activate(c, now(source), visit, tib, clerk) 24 | tba = rand(d_tba) 25 | hold(source, tba) 26 | end 27 | end 28 | 29 | # Experiment data 30 | 31 | max_number = 5 32 | max_time = 400.0 33 | mean_time_between_arrivals = 10.0 34 | mean_time_in_bank = 12.0 35 | theseed = 99999 36 | srand(theseed) 37 | 38 | # Model/Experiment 39 | 40 | sim = Simulation(uint(16)) 41 | k = Resource(sim, "Counter", uint(1), false) 42 | s = Process(sim, "Source") 43 | activate(s, 0.0, generate, max_number, mean_time_between_arrivals, mean_time_in_bank, k) 44 | run(sim, max_time) 45 | --------------------------------------------------------------------------------
/5-simulation/simjulia_examples/bank_11.jl: -------------------------------------------------------------------------------- 1 | using Distributions 2 | using SimJulia 3 | 4 | # Model components 5 | 6 | function visit(customer::Process, time_in_bank::Float64, clerk::Resource) 7 | arrive = now(customer) 8 | @printf("%8.3f %s: Here I am\n", arrive, customer) 9 | request(customer, clerk) # waiting for the server 10 | wait = now(customer) - arrive 11 | @printf("%8.3f %s: Waited %6.3f\n", now(customer), customer, wait) 12 | hold(customer, time_in_bank) # using the server 13 | release(customer, clerk) # finish service 14 | @printf("%8.3f %s: Finished\n", now(customer), customer) 15 | end 16 | 17 | function generate(source::Process, number::Int64, mean_time_between_arrivals::Float64, mean_time_in_bank::Float64, clerk::Resource) 18 | d_tba = Exponential(mean_time_between_arrivals) 19 | d_tib = Exponential(mean_time_in_bank) 20 | for i = 1:number 21 | c = Process(simulation(source), @sprintf("Customer%02d", i)) 22 | tib = rand(d_tib) 23 | activate(c, now(source), visit, tib, clerk) 24 | tba = rand(d_tba) 25 | hold(source, tba) 26 | end 27 | end 28 | 29 | # Experiment data 30 | 31 | max_number = 5 32 | max_time = 400.0 33 | mean_time_between_arrivals = 10.0 34 | mean_time_in_bank = 12.0 35 | theseed = 99999 36 | srand(theseed) 37 | 38 | # Model/Experiment 39 | 40 | sim = Simulation(uint(16)) 41 | k = Resource(sim, "Counter", uint(1), false) # set "false" to "true" 42 | s = Process(sim, "Source") 43 | activate(s, 0.0, generate, max_number, mean_time_between_arrivals, mean_time_in_bank, k) 44 | run(sim, max_time) 45 | 46 | # Print result 47 | -------------------------------------------------------------------------------- /2-intermediate-R/SecondHalf_solutions.R: -------------------------------------------------------------------------------- 1 | ### 2 | #Joins assignment - solutions 3 | ### 4 | 5 | # 1) Join airport latitudes to the flight data. What was the largest change in latitude for any flight? 6 | flights = merge(flights,latlong[,1:2],by.x="Origin",by.y="locationID") 7 | #let's take a look at the data frame now 8 | #see that the column we've just merged in is called "Latitude" 9 | #but since we merged on origin, it's really the origin latitude. 10 | #So we rename it: 11 | names(flights)[match("Latitude",names(flights))]="Origin.Lat" 12 | #same for destination latitude 13 | flights = merge(flights,latlong[,1:2],by.x="Dest",by.y="locationID") 14 | names(flights)[match("Latitude",names(flights))]="Dest.Lat" 15 | flights$DiffLat = flights$Dest.Lat - flights$Origin.Lat 16 | biggest.lat.change = max(abs(flights$DiffLat)) 17 | # 2) (optional) Find a flight (may not be unique) which experienced this largest change in latitude. 18 | # Hint: use the order() function to sort a data frame 19 | flights = flights[order(abs(flights$DiffLat),decreasing=TRUE),] 20 | flights[1,] 21 | # 3) (optional) Re-do the jet stream example using latitudes instead of longitudes. 22 | # Is there a relationship between change in latitude and flight speed? 23 | plot(flights$DiffLat, flights$Speed,pch=".") 24 | lat.effect = cor(flights$DiffLat, flights$Speed) 25 | 26 | 27 | ### 28 | #Optional joins assignment 29 | ### 30 | #Is there a relationship between airport latitude and average delay ratio? 
31 | airport.info = merge(airport.info, latlong[,1:2],by.x="Airport",by.y="locationID") 32 | plot(airport.info$Latitude,airport.info$Avg.delay.ratio) 33 | cor(airport.info$Latitude,airport.info$Avg.delay.ratio) 34 | -------------------------------------------------------------------------------- /5-simulation/simjulia_examples/bank_11 (complete).jl: -------------------------------------------------------------------------------- 1 | using Distributions 2 | using SimJulia 3 | 4 | # Model components 5 | 6 | function visit(customer::Process, time_in_bank::Float64, clerk::Resource) 7 | arrive = now(customer) 8 | @printf("%8.3f %s: Here I am\n", arrive, customer) 9 | request(customer, clerk) # waiting for the server 10 | wait = now(customer) - arrive 11 | @printf("%8.3f %s: Waited %6.3f\n", now(customer), customer, wait) 12 | hold(customer, time_in_bank) # using the server 13 | release(customer, clerk) # finish service 14 | @printf("%8.3f %s: Finished\n", now(customer), customer) 15 | end 16 | 17 | function generate(source::Process, number::Int64, mean_time_between_arrivals::Float64, mean_time_in_bank::Float64, clerk::Resource) 18 | d_tba = Exponential(mean_time_between_arrivals) 19 | d_tib = Exponential(mean_time_in_bank) 20 | for i = 1:number 21 | c = Process(simulation(source), @sprintf("Customer%02d", i)) 22 | tib = rand(d_tib) 23 | activate(c, now(source), visit, tib, clerk) 24 | tba = rand(d_tba) 25 | hold(source, tba) 26 | end 27 | end 28 | 29 | # Experiment data 30 | 31 | max_number = 5 32 | max_time = 400.0 33 | mean_time_between_arrivals = 10.0 34 | mean_time_in_bank = 12.0 35 | theseed = 99999 36 | srand(theseed) 37 | 38 | # Model/Experiment 39 | 40 | sim = Simulation(uint(16)) 41 | k = Resource(sim, "Counter", uint(1), true) # set "monitored=true" 42 | s = Process(sim, "Source") 43 | activate(s, 0.0, generate, max_number, mean_time_between_arrivals, mean_time_in_bank, k) 44 | run(sim, max_time) 45 | 46 | # Print result 47 | println("TimeAverage no. waiting: $(time_average(wait_monitor(k)))") 48 | println("TimeAverage no. in service: $(time_average(activity_monitor(k)))") 49 | -------------------------------------------------------------------------------- /4-graphs/code/section5.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Section 5 -- Community Detection 3 | ################################################################## 4 | 5 | # One of the many modularity-maximizing algorithms is spinglass.community 6 | comm <- spinglass.community(g) 7 | comm 8 | str(comm) 9 | table(comm$membership) 10 | 11 | # Great. We'll want to plot our communities so let's actually do this 12 | # again, limiting to the continental US. We'll take this code from the 13 | # plotting bonus question and modify it for our needs, coloring airports 14 | # based on their community. 15 | g2 <- induced.subgraph(g, V(g)$Lat >= 15 & V(g)$Lat <= 50 & V(g)$Lon >= -130 & V(g)$Lon <= -60 & V(g)$Country == "United States") 16 | comm2 <- spinglass.community(g2) 17 | comm2 18 | 19 | # Let's get some spiffy colors for our nodes -- we'll get a palette 20 | # with 5 colors from RColorBrewer, which has carefully selected palettes 21 | # where all the colors look good together. 22 | library(RColorBrewer) 23 | display.brewer.all() 24 | colors <- brewer.pal(5, "Set1") 25 | colors 26 | 27 | # Now we can actually plot our image and check it out. We'll index within 28 | # the colors vector when we set vertex.color.
We can see the benefit of 29 | # having vertex metadata instead of storing it outside the graph -- if 30 | # we hadn't stored Lon, Lat, and NumFlights as metadata we would have 31 | # needed to subset each for our continental US plot. 32 | png("section5.png", width=960, height=480) 33 | plot(g2, layout=cbind(V(g2)$Lon, V(g2)$Lat), edge.arrow.mode=0, vertex.label=NA, vertex.size=3, edge.color=ifelse(E(g2)$NumFlights >= 100, "black", NA), vertex.color=colors[comm2$membership], asp=0.5) 34 | dev.off() 35 | -------------------------------------------------------------------------------- /1-intro-R/1-5.R: -------------------------------------------------------------------------------- 1 | # IAP 2015 2 | # 15.S60 Software Tools for Operations Research 3 | # Lecture 1: Introduction to R 4 | 5 | # Script file 1-5.R 6 | # In this script file, we cover SVMs 7 | 8 | ############################# 9 | ## SUPPORT VECTOR MACHINES ## 10 | ############################# 11 | 12 | # Install and load new package 13 | install.packages("e1071") 14 | library(e1071) 15 | 16 | # Build SVM model for iris data set (since SVM is 17 | # easier to visualize with smaller datasets with 18 | # continuous attributes) 19 | 20 | # First, we want to subset the dataset to only 21 | # keep two attributes (so we can easily visualize 22 | # the model) 23 | 24 | IrisDataSVM = subset(iris, select = Petal.Length:Species) 25 | 26 | # SVM model - linear kernel 27 | IrisSVM = svm(Species ~ Petal.Length + Petal.Width, data = IrisDataSVM, kernel = "linear") 28 | 29 | # Plot the model 30 | plot(IrisSVM, data = IrisDataSVM) 31 | 32 | # Color of the data points indicates 33 | # the true class; background color indicates 34 | # prediction; X indicates a support vector 35 | 36 | # SVM model - polynomial kernel 37 | IrisSVM = svm(Species ~ Petal.Length + Petal.Width, data = IrisDataSVM, kernel = "polynomial", degree = 3) 38 | plot(IrisSVM, data = IrisDataSVM) 39 | 40 | # degree = degree of polynomial used. Different 41 | # values will often give very different results. 42 | 43 | #SVM model - radial basis kernel 44 | IrisSVM = svm(Species ~ Petal.Length + Petal.Width, data = IrisDataSVM, kernel = "radial", gamma = 10) 45 | plot(IrisSVM, data=IrisDataSVM) 46 | 47 | # gamma controls how well the model will 48 | # fit the data. Larger gamma will fit the data 49 | # more exactly. Try gamma = 100 and gamma = 0.1 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /4-graphs/code/exercise3_complete.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Exercise 3 -- Regression models over edges 3 | ################################################################## 4 | 5 | # Use linear regression to predict the proportion of delayed departures 6 | # and arrivals for each edge. Predict using the number of flights on that 7 | # edge, the edge betweenness, the degree of the departure and arrival 8 | # airports on the edge, and the PageRank of the departure and arrival 9 | # airports on the edge. Check for multicollinearity between the network 10 | # metrics. 
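# (A note on the code below: get.edges(g, E(g)) returns a two-column matrix
# with one row per edge giving the endpoint vertex indices, so indexing
# degree(g) or page.rank(g)$vector by its first or second column looks up a
# per-vertex metric for the departure or arrival airport of every edge.)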
11 | 12 | g 13 | 14 | emetrics <- data.frame(LateDep=E(g)$LateDep, 15 | LateArr=E(g)$LateArr, 16 | NumFlights=E(g)$NumFlights, 17 | EdgeBetweenness=edge.betweenness(g), 18 | DepDegree=degree(g)[get.edges(g, E(g))[,1]], 19 | ArrDegree=degree(g)[get.edges(g, E(g))[,2]], 20 | DepPageRank=page.rank(g)$vector[get.edges(g, E(g))[,1]], 21 | ArrPageRank=page.rank(g)$vector[get.edges(g, E(g))[,2]]) 22 | 23 | head(emetrics) 24 | 25 | summary(lm(LateDep~NumFlights+EdgeBetweenness+DepDegree+ArrDegree+DepPageRank+ArrPageRank, data=emetrics)) 26 | summary(lm(LateArr~NumFlights+EdgeBetweenness+DepDegree+ArrDegree+DepPageRank+ArrPageRank, data=emetrics)) 27 | 28 | cor(emetrics) 29 | # Looks like we have some missing data in LateArr, so let's use na.omit: 30 | cor(na.omit(emetrics)) 31 | # Better be careful interpreting coefficients! 32 | 33 | # Bonus: one airport has relatively low degree (<= 50) but relatively 34 | # high betweenness centrality (>= 5000). Plot these two metrics against 35 | # each other to observe the outlier. What is the airport and why does 36 | # it have this property? Hint: you can access neighbors with ?neighbors. 37 | 38 | plot(degree(g), betweenness(g)) 39 | which(degree(g) <= 50 & betweenness(g) >= 5000) 40 | airports[airports$IATA == "ANC",] 41 | neighbors(g, "ANC") 42 | V(g)$name[neighbors(g, "ANC")] 43 | degree(g)[neighbors(g, "ANC")] 44 | -------------------------------------------------------------------------------- /7-adv-optimization/README.md: -------------------------------------------------------------------------------- 1 | # Mixed-integer optimization 2 | 3 | ## Preassignment 4 | 5 | For this class, we will be using the Gurobi mixed-integer programming solver. 6 | 7 | ### Installing Gurobi 8 | Gurobi is commercial software, but they have a very permissive (and free!) academic license. If you have an older version of Gurobi (>= 5.5) on your computer, that should be fine. 9 | 10 | 1. Go to www.gurobi.com 11 | 2. Create an account, and request an academic license. 12 | 3. Download the installer for Gurobi 6.0 13 | 4. Install Gurobi, accepting default options. Remember where it installed to! 14 | 5. Go back to the website and navigate to the page for your academic license. You'll be given a command with a big code in it, e.g. grbgetkey aaaaa-bbbb 15 | 6. In a terminal, navigate to the ``gurobi600/<os>/bin`` folder, where ``<os>`` is the name of your operating system. 16 | 7. Copy-and-paste the command from the website into the command prompt---you need to be on campus for this to work! 17 | 18 | 19 | ### Install the Gurobi interface in Julia 20 | 21 | Installing this is easy using the Julia package manager: 22 | ```jl 23 | julia> Pkg.add("Gurobi") 24 | ``` 25 | 26 | If you don't have an academic email or cannot get access to Gurobi for another reason, you should be able to follow along with the open-source solver GLPK for much of the class. To install, simply do 27 | ```jl 28 | julia> Pkg.add("GLPKMathProgInterface") 29 | ``` 30 | 31 | ## Solving a simple MIP 32 | How about a simple knapsack problem? Enter the following JuMP code and submit all the output to Stellar. 33 | 34 | ```jl 35 | using JuMP, Gurobi 36 | m = Model(solver=GurobiSolver(Presolve=0)) # turn presolve off to make it a little more interesting 37 | N = 100 38 | @defVar(m, x[1:N], Bin) 39 | @addConstraint(m, dot(rand(N), x) <= 5) 40 | @setObjective(m, Max, dot(rand(N), x)) 41 | solve(m) 42 | ``` 43 | 44 | ## Questions?
45 | Email huchette@mit.edu 46 | -------------------------------------------------------------------------------- /6-nonlinear-opt/README.md: -------------------------------------------------------------------------------- 1 | # Nonlinear optimization 2 | 3 | This class covers topics in nonlinear optimization. Code will be posted before the start of the class. 4 | 5 | ## Pre-assignment: 6 | 7 | ### Install Julia and IJulia 8 | IJulia is required for this class. See the instructions at http://www.juliaopt.org/install.pdf. Alternatively, you may use [JuliaBox](https://juliabox.org/) to complete the assignment and follow along with the class if there's any trouble with a local installation. (Troubleshooting note: if Julia is working but IJulia is not, try running ``Pkg.build("IJulia")`` and check for reported errors.) 9 | 10 | ### Install packages 11 | We will use the following packages: 12 | - JuMP 13 | - Optim 14 | - Ipopt 15 | - Convex 16 | - Distributions 17 | - PyPlot 18 | - Gadfly 19 | - Interact 20 | - ECOS 21 | 22 | First run ``Pkg.update()`` to update the package database, then install each one with ``Pkg.add("xxx")`` where ``xxx`` is the package name. 23 | 24 | ### Test the installation 25 | 26 | In a blank IJulia notebook, paste the following code into a cell: 27 | 28 | ```julia 29 | import Convex 30 | x = Convex.Variable(Convex.Positive()) 31 | Convex.solve!(Convex.minimize(x)) 32 | Convex.evaluate(x) 33 | ``` 34 | 35 | and run it by pressing shift-enter. The result should be some iteration output from ECOS and then a small value that's very close to zero. 36 | 37 | In the next cell, paste and run the following code: 38 | 39 | ```julia 40 | import JuMP 41 | m = JuMP.Model() 42 | @JuMP.defVar(m, x >= 0) 43 | @JuMP.setNLObjective(m, Min, x) 44 | JuMP.solve(m) 45 | JuMP.getValue(x) 46 | ``` 47 | 48 | You should see some output from Ipopt and then the result which should be a number that's exactly or very close to zero. 49 | 50 | (Note that we use ``import JuMP`` instead of ``using JuMP`` because there are some clashes in the names used by Convex.jl and JuMP.) 51 | 52 | Now go to ``File -> Download as -> IPython Notebook (.ipynb)`` and save the notebook file to your computer. Submit this file to Stellar. 53 | -------------------------------------------------------------------------------- /4-graphs/code/exercise4_complete.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Exercise 4 -- Bond percolation 3 | ################################################################## 4 | # Perform uniform random bond percolation, randomly retaining proportion 5 | # phi of edges (hint: ?subgraph.edges). As before, test a range of phi 6 | # values and compute the normalized size of the largest component. 
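# (How the dense one-liner below works: sample(ecount(g), phi*ecount(g))
# draws a random phi fraction of the edge ids, subgraph.edges() keeps only
# those edges, clusters()$csize lists the resulting component sizes, and
# c(0, ...) guards against an empty subgraph; replicate() repeats the
# experiment reps times and mean() averages the largest-component sizes.)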
7 | random.bond.percolation <- function(g, phi, reps) { 8 | mean(replicate(reps, max(c(0, clusters(subgraph.edges(g, eids=sample(ecount(g), phi*ecount(g))))$csize)))) / vcount(g) 9 | } 10 | rb.perc <- data.frame(phi=phis, perc=sapply(phis, random.bond.percolation, g=g, reps=100)) 11 | plot(rb.perc) 12 | 13 | # Perform targeted bond percolation, comparing the following strategies: 14 | # 1) Remove edges with largest minimum degree of endpoints (hint: ?pmin) 15 | # 2) Remove edges with largest edge betweenness 16 | targeted.bond.percolation1 <- function(g, phi) { 17 | ordering <- order(pmin(degree(g)[get.edges(g, E(g))[,1]], degree(g)[get.edges(g, E(g))[,2]])) 18 | max(c(0, clusters(subgraph.edges(g, head(ordering, phi*ecount(g))))$csize)) / vcount(g) 19 | } 20 | tb.perc1 <- data.frame(phi=phis, perc=sapply(phis, targeted.bond.percolation1, g=g)) 21 | plot(tb.perc1) 22 | 23 | targeted.bond.percolation2 <- function(g, phi) { 24 | ordering <- order(edge.betweenness(g)) 25 | max(c(0, clusters(subgraph.edges(g, head(ordering, phi*ecount(g))))$csize)) / vcount(g) 26 | } 27 | tb.perc2 <- data.frame(phi=phis, perc=sapply(phis, targeted.bond.percolation2, g=g)) 28 | plot(tb.perc2) 29 | 30 | # Compare targeted site percolation of the Delta (DL) and Southwest (WN) 31 | # networks. 32 | ts.delta <- data.frame(phi=phis, carrier="Delta", perc=sapply(phis, targeted.site.percolation, g=carrier.graphs$DL)) 33 | ts.sw <- data.frame(phi=phis, carrier="Southwest", perc=sapply(phis, targeted.site.percolation, g=carrier.graphs$WN)) 34 | ggplot(rbind(ts.delta, ts.sw), aes(x=phi, y=perc, group=carrier, color=carrier)) + geom_line() 35 | -------------------------------------------------------------------------------- /4-graphs/code/exercise1_complete.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Exercise 1 -- Carrier-Specific Flight Networks 3 | ################################################################## 4 | # 5 | # We have computed the network for all airlines combined, but we might 6 | # be interested in the network for each separate carrier. Create a list 7 | # of graphs for each carrier, which can be constructed by limiting the 8 | # set of all flights to just those from that carrier and then 9 | # constructing the graph in the same way that we constructed the full 10 | # graph. The carrier can be found in the Carrier variable. 
11 | 12 | spl <- split(dat, dat$Carrier) 13 | carrier.graphs <- lapply(spl, function(dat) { 14 | e.spl <- split(dat, paste(dat$Origin, dat$Dest)) 15 | e.spl2 <- lapply(e.spl, function(x) { 16 | data.frame(Origin = x$Origin[1], 17 | Dest = x$Dest[1], 18 | NumFlights = nrow(x), 19 | LateDep = mean(x$DepDel15, na.rm=T), 20 | LateArr = mean(x$ArrDel15, na.rm=T), 21 | TaxiOut = mean(x$TaxiOut, na.rm=T), 22 | TaxiIn = mean(x$TaxiIn, na.rm=T)) 23 | }) 24 | edges <- do.call(rbind, e.spl2) 25 | vertices <- do.call(rbind, lapply(split(dat, dat$Origin), function(x) { 26 | data.frame(Origin = x$Origin[1], 27 | NumFlights = nrow(x), 28 | LateDep = mean(x$DepDel15, na.rm=T), 29 | LateArr = mean(x$ArrDel15, na.rm=T), 30 | TaxiOut = mean(x$TaxiOut, na.rm=T), 31 | TaxiIn = mean(x$TaxiIn, na.rm=T)) 32 | })) 33 | g <- graph.data.frame(edges, TRUE, vertices) 34 | return(g) 35 | }) 36 | 37 | # Now we can look at qualitative differences between the carriers 38 | carriers <- do.call(rbind, lapply(carrier.graphs, function(g) { 39 | data.frame(airports=vcount(g), density=graph.density(g)) 40 | })) 41 | carriers$name <- names(carrier.graphs) 42 | carriers 43 | 44 | # We can plot to get a better sense of this data 45 | library(ggplot2) 46 | ggplot(carriers, aes(x=airports, y=density, label=name)) + geom_text() 47 | -------------------------------------------------------------------------------- /2-intermediate-R/README.md: -------------------------------------------------------------------------------- 1 | ## Intermediate R Pre-Assignment 2 | 3 | __Note that ``sqldf`` requires a relatively recent version of R (at least 3.1.0). Make sure your version is up-to-date.__ 4 | 5 | 1. Download On_Time_On_Time_Performance_2013_12.zip from the BTS TranStats site (the same source as the link in 1-intro-R/data-link.txt). 6 | 2. Extract the CSV file to your Intermediate R directory. 7 | 3. Fire up R, change your working directory to the Intermediate R directory, and run the following (could take a few minutes): 8 | 9 | -------------------------- 10 | 11 | ```R 12 | flights.raw = read.csv("On_Time_On_Time_Performance_2013_12.csv") 13 | 14 | keep = c("DayofMonth","DayOfWeek","FlightDate","Carrier","TailNum","FlightNum","Origin","OriginCityName","OriginStateFips","OriginStateName","Dest","DestCityName","DestStateFips","DestStateName","CRSDepTime","DepTime","DepDelay","DepDelayMinutes","DepDel15","DepartureDelayGroups","DepTimeBlk","TaxiOut","WheelsOff","WheelsOn","TaxiIn","CRSArrTime","ArrTime","ArrDelay","ArrDelayMinutes","ArrDel15","ArrivalDelayGroups","ArrTimeBlk", "Cancelled","CancellationCode","CRSElapsedTime","ActualElapsedTime","AirTime","Flights","Distance","DistanceGroup","CarrierDelay","WeatherDelay","NASDelay","SecurityDelay","LateAircraftDelay") 15 | 16 | flights = flights.raw[,keep] 17 | 18 | write.csv(flights,"flights.csv") 19 | 20 | install.packages("sqldf") 21 | 22 | library(sqldf) 23 | 24 | flights.bos = sqldf("select * from 'flights' where Origin='BOS'") 25 | ``` 26 | -------------------------- 27 | 28 | __Question 1:__ What is the most common day of the week for departures in the full data set? 29 | 30 | __Question 2:__ What is the least common day of the week for departures from Boston? 31 | 32 | Hint: use the table() function. 33 | 34 | When you're done, you can delete the file On_Time_On_Time_Performance_2013_12.csv. Keep the csv file written during the homework. 35 | 36 | ## Questions?
37 | 38 | Please email efields@mit.edu 39 | 40 | The completed flights.csv is too big for the github repo but can be downloaded from https://dl.dropboxusercontent.com/u/1877897/flights.csv -------------------------------------------------------------------------------- /4-graphs/code/exercise2_complete.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Exercise 2 -- Manipulating visual properties 3 | ################################################################## 4 | # 5 | # 1) Plot the Delta Airlines network (IATA code DL) with the node size 6 | # scaled by the square root of the number of flights from an airport. 7 | # Color the Atlanta airport (ATL) red and other airports black. 8 | # B1) Plot the full network with nodes positioned based on their 9 | # latitude/longitude instead of using a layout algorithm. Adjust 10 | # edge.color to only plot edges with 100 or more flights, and mark 11 | # the top five airports by volume (ATL, ORD, DFW, DEN, LAX) as red 12 | # and the others as light gray. 13 | # B2) Replicate B1, limiting to the continental United States. You can 14 | # do this by limiting the longitude range to [-130, -60], limiting 15 | # the latitude range to [15, 50], and limiting the country to 16 | # "United States". Plot with a 2:1 width:height ratio and the 17 | # appropriate asp value for plot. Hint: ?induced.subgraph. 18 | 19 | png("exercise2_1.png") 20 | dl <- carrier.graphs$DL 21 | plot(dl, layout=layout.lgl(dl), edge.arrow.mode=0, vertex.label=NA, vertex.size=sqrt(V(dl)$NumFlights)/5, vertex.color=ifelse(V(dl)$name == "ATL", "red", "black")) 22 | dev.off() 23 | 24 | png("exercise2_b1.png") 25 | plot(g, layout=cbind(V(g)$Lon, V(g)$Lat), edge.arrow.mode=0, vertex.label=NA, vertex.size=3, edge.color=ifelse(E(g)$NumFlights >= 100, "black", NA), vertex.color=ifelse(V(g)$name %in% c("ATL", "ORD", "DFW", "DEN", "LAX"), "red", "lightgray")) 26 | dev.off() 27 | 28 | png("exercise2_b2.png", width=960, height=480) 29 | g2 <- induced.subgraph(g, V(g)$Lat >= 15 & V(g)$Lat <= 50 & V(g)$Lon >= -130 & V(g)$Lon <= -60 & V(g)$Country == "United States") 30 | plot(g2, layout=cbind(V(g2)$Lon, V(g2)$Lat), edge.arrow.mode=0, vertex.label=NA, vertex.size=3, edge.color=ifelse(E(g2)$NumFlights >= 100, "black", NA), vertex.color=ifelse(V(g2)$name %in% c("ATL", "ORD", "DFW", "DEN", "LAX"), "red", "lightgray"), asp=0.5) 31 | dev.off() 32 | -------------------------------------------------------------------------------- /3-visualization/README.md: -------------------------------------------------------------------------------- 1 | ## Visualization in R 2 | 3 | ### Prerequisites and Class Info: 4 | 5 | This module builds on the Machine Learning in R and Data Wrangling classes given in the first week. You should be comfortable writing R code to run linear regression, logistic regression, and clustering algorithms, which were all taught in Machine Learning in R. You should also be comfortable using the table command, the apply family of functions (tapply, lapply, apply), the merge command, the split-apply-combine framework, and creating your own functions. These were taught in Data Wrangling. Please review all these concepts before class on Tuesday, especially if you are new to R. 6 | 7 | The material covered will be very similar to last year. However, you're welcome to repeat it if you like! Some datasets, examples, and in-class problems will be different.
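For a quick self-check of the split-apply-combine framework before class, here is a minimal sketch on a hypothetical data frame `df` with a grouping column `g` and a numeric column `x` (placeholder names, not one of the class datasets):

```
spl <- split(df, df$g)                     # split: one piece per group
res <- lapply(spl, function(d) mean(d$x))  # apply: summarize each piece
do.call(rbind, res)                        # combine: stack the results
tapply(df$x, df$g, mean)                   # the same computation in one call
```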
8 | 9 | ### Git Update: 10 | 11 | Please update your git repository so that you have the most recent class materials. 12 | 13 | ### Data: 14 | 15 | You will need the "flights.csv" dataset that you created in your pre-class assignment for Module 2, Data Wrangling. You will also need the "airports.csv" dataset which is available in the data directory of the github repository. Please make sure you have both of these ready to go. 16 | 17 | ### Installation Instructions: 18 | 19 | Please run the following commands in an R console: 20 | 21 | ``` 22 | install.packages("ggplot2") 23 | install.packages("maps") 24 | install.packages("ggmap") 25 | install.packages("mapproj") 26 | ``` 27 | 28 | ### Assignment: 29 | 30 | Run the following code. After each plot is produced, save it, and finally submit a document on Stellar containing the three plots. 31 | 32 | ``` 33 | library(ggplot2) 34 | ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() 35 | ``` 36 | ``` 37 | library(maps) 38 | italy = map_data("italy") 39 | ggplot(italy, aes(x = long, y = lat, group = group)) + geom_polygon() 40 | ``` 41 | ``` 42 | library(ggmap) 43 | MIT = get_map(location = "Massachusetts Institute of Technology", zoom = 15) 44 | ggmap(MIT) 45 | ``` 46 | 47 | ### Questions? 48 | 49 | Please email Angie King (aking10@mit.edu). 50 | -------------------------------------------------------------------------------- /4-graphs/code/exercise5_complete.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Exercise 5 -- Adding communities to prediction models 3 | ################################################################## 4 | 5 | # Add the departure and arrival community to the regression for edge 6 | # outcomes from Section 3. Compute communities for the whole graph (not 7 | # just the continental U.S.) and model as two factor variables (hint: 8 | # ?as.factor). Remember you can start with code from 9 | # code/exercise3_complete.R. 10 | 11 | emetrics <- data.frame(LateDep=E(g)$LateDep, 12 | DepCommunity=as.factor(comm$membership[get.edges(g, E(g))[,1]]), 13 | ArrCommunity=as.factor(comm$membership[get.edges(g, E(g))[,2]]), 14 | LateArr=E(g)$LateArr, 15 | NumFlights=E(g)$NumFlights, 16 | EdgeBetweenness=edge.betweenness(g), 17 | DepDegree=degree(g)[get.edges(g, E(g))[,1]], 18 | ArrDegree=degree(g)[get.edges(g, E(g))[,2]], 19 | DepPageRank=page.rank(g)$vector[get.edges(g, E(g))[,1]], 20 | ArrPageRank=page.rank(g)$vector[get.edges(g, E(g))[,2]]) 21 | summary(lm(LateDep~DepCommunity+ArrCommunity+NumFlights+EdgeBetweenness+DepDegree+ArrDegree+DepPageRank+ArrPageRank, data=emetrics)) 22 | summary(lm(LateArr~DepCommunity+ArrCommunity+NumFlights+EdgeBetweenness+DepDegree+ArrDegree+DepPageRank+ArrPageRank, data=emetrics)) 23 | 24 | # Bonus: perform targeted bond percolation using communities. Compute an 25 | # indicator for whether each edge bridges communities and order the 26 | # removal priority first by this indicator, then by edge betweenness. 27 | # Compare to the two targeted strategies from Exercise 4, Bonus 1. 
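# (How the removal priority below is encoded: edges whose endpoints lie in
# different communities get a +10000 penalty added to their betweenness, so
# order() sorts them to the end of `ordering`; keeping only the first phi
# fraction of edge ids therefore removes the community-bridging edges first.)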
28 | 29 | comm1 <- comm$membership[get.edges(g, E(g))[,1]] 30 | comm2 <- comm$membership[get.edges(g, E(g))[,2]] 31 | ordering <- order(edge.betweenness(g) + 10000 * (comm1 != comm2)) 32 | targeted.bond.percolation3 <- function(g, phi) { 33 | max(c(0, clusters(subgraph.edges(g, ordering[1:(phi*ecount(g))]))$csize)) / vcount(g) 34 | } 35 | tb.perc3 <- data.frame(phi=phis, perc=sapply(phis, targeted.bond.percolation3, g=g)) 36 | tb.perc2$type <- "2" 37 | tb.perc3$type <- "3" 38 | tb.perc.compare <- rbind(tb.perc2, tb.perc3) 39 | ggplot(tb.perc.compare, aes(x=phi, y=perc, group=type, color=type)) + geom_line() -------------------------------------------------------------------------------- /8-project/README.md: -------------------------------------------------------------------------------- 1 | # Column Generation 2 | 3 | This class will cover column-wise modeling and the column generation solution technique. Code will be posted before the start of the class. 4 | 5 | ## Preassignment 6 | 7 | 8 | ### Install Julia, IJulia and JuMP 9 | 10 | Please see the preassignment for [module 6, nonlinear optimization](https://github.com/joehuchette/OR-software-tools-2015/tree/master/6-nonlinear-opt). 11 | 12 | ### Install Gurobi and Gurobi Interface in Julia 13 | 14 | Please see the preassignment for [module 7, mixed-integer optimization](https://github.com/joehuchette/OR-software-tools-2015/blob/master/7-adv-optimization/README.md). 15 | 16 | ### Install the [Graphs](https://github.com/JuliaLang/Graphs.jl) package in Julia 17 | 18 | Enter the following in the Julia console: 19 | ```jl 20 | julia> Pkg.add("Graphs") 21 | ``` 22 | 23 | ## 1. Solving a shortest path problem 24 | Enter the following Julia code and submit the output to Stellar. 25 | 26 | ```jl 27 | using Graphs 28 | 29 | # construct a graph and the edge distance vector 30 | 31 | g = simple_inclist(5) 32 | 33 | inputs = [ # each element is (u, v, dist) 34 | (1, 2, 10.), 35 | (1, 3, 5.), 36 | (2, 3, 2.), 37 | (3, 2, 3.), 38 | (2, 4, 1.), 39 | (3, 5, 2.), 40 | (4, 5, 4.), 41 | (5, 4, 6.), 42 | (5, 1, 7.), 43 | (3, 4, 9.) ] 44 | 45 | ne = length(inputs) 46 | dists = zeros(ne) 47 | 48 | for i = 1 : ne 49 | a = inputs[i] 50 | add_edge!(g, a[1], a[2]) # add edge 51 | dists[i] = a[3] # set distance 52 | end 53 | 54 | r = dijkstra_shortest_paths(g, dists, 1) 55 | 56 | r.parents 57 | ``` 58 | 59 | ## 2. Column-wise modeling in JuMP 60 | 61 | Enter the following JuMP code and submit the output to Stellar. 62 | ```jl 63 | using JuMP, Gurobi 64 | 65 | m = Model(solver=GurobiSolver()) 66 | @defVar(m, 0 <= x <= 1) 67 | @defVar(m, 0 <= y <= 1) 68 | @setObjective(m, Max, 5x + 1y) 69 | @addConstraint(m, con, x + y <= 1) 70 | solve(m) # x = 1, y = 0 71 | @defVar(m, 0 <= z <= 1, objective = 10.0, inconstraints = [con], coefficients = [1.0]) 72 | # The constraint is now x + y + z <= 1 73 | # The objective is now 5x + 1y + 10z 74 | solve(m) # z = 1 75 | ``` 76 | 77 | ## Questions?
78 | Email chiwei@mit.edu 79 | -------------------------------------------------------------------------------- /3-visualization/pollData.csv: -------------------------------------------------------------------------------- 1 | State,Year,SurveyUSA,DiffCount,Republican Alabama,2004,18,5,1 Alabama,2008,25,5,1 Alaska,2004,21,1,1 Alaska,2008,18,6,1 Arizona,2004,15,8,1 Arizona,2008,3,9,1 Arizona,2012,5,4,1 Arkansas,2004,5,8,1 Arkansas,2008,7,5,1 Arkansas,2012,21,2,1 California,2004,-11,-8,0 California,2008,-24,-5,0 California,2012,-14,-6,0 Colorado,2004,3,9,1 Colorado,2008,-1,-15,0 Colorado,2012,-2,-5,0 Connecticut,2004,-33,-3,0 Connecticut,2008,-16,-4,0 Connecticut,2012,-13,-8,0 Delaware,2004,-16,-2,0 Delaware,2008,-30,-4,0 Florida,2004,1,0,1 Florida,2008,-3,-13,0 Florida,2012,0,6,0 Georgia,2004,12,4,1 Georgia,2008,7,9,1 Georgia,2012,8,4,1 Hawaii,2004,4,2,0 Hawaii,2008,-24,-1,0 Hawaii,2012,-24,-2,0 Idaho,2004,22,1,1 Idaho,2008,30,1,1 Idaho,2012,24,1,1 Illinois,2004,-12,-5,0 Illinois,2008,-33,-5,0 Illinois,2012,-16,-5,0 Indiana,2004,19,3,1 Indiana,2008,0,2,0 Indiana,2012,18,3,1 Iowa,2004,-3,5,1 Iowa,2008,-15,-8,0 Iowa,2012,-2,-2,0 Kansas,2004,23,3,1 Kansas,2008,21,2,1 Kansas,2012,9,1,1 Kentucky,2004,21,3,1 Kentucky,2008,16,5,1 Kentucky,2012,14,1,1 Louisiana,2004,7,5,1 Louisiana,2008,21,2,1 Louisiana,2012,21,2,1 Maine,2004,-8,-6,0 Maine,2008,-15,-6,0 Maine,2012,-7,-6,0 Maryland,2004,-11,-6,0 Maryland,2008,-29,-1,0 Maryland,2012,-29,-4,0 Massachusetts,2004,-29,-2,0 Massachusetts,2008,-17,-4,0 Massachusetts,2012,-30,-8,0 Michigan,2004,-29,-2,0 Michigan,2008,-11,-11,0 Michigan,2012,-11,-10,0 Minnesota,2004,-1,-7,0 Minnesota,2008,-3,-14,0 Minnesota,2012,-11,-5,0 Mississippi,2004,25,1,1 Mississippi,2008,7,4,1 Mississippi,2012,8,1,1 Missouri,2004,5,8,1 Missouri,2008,0,4,1 Missouri,2012,7,8,1 Montana,2004,21,3,1 Montana,2008,8,4,1 Montana,2012,12,5,1 Nebraska,2004,30,2,1 Nebraska,2008,22,1,1 Nebraska,2012,25,2,1 Nevada,2004,8,9,1 Nevada,2008,0,-9,0 Nevada,2012,-4,-10,0 New Hampshire,2004,-1,-5,0 New Hampshire,2008,-11,-14,0 New Hampshire,2012,-11,-8,0 New Jersey,2004,-12,-8,0 New Jersey,2008,-10,-9,0 New Jersey,2012,-14,-9,0 New Mexico,2004,0,2,1 New Mexico,2008,-7,-6,0 New Mexico,2012,-7,-5,0 New York,2004,-18,-6,0 New York,2008,-33,-5,0 New York,2012,-29,-5,0 North Carolina,2004,8,7,1 North Carolina,2008,1,-5,0 North Carolina,2012,5,3,1 North Dakota,2004,25,2,1 North Dakota,2008,5,0,1 North Dakota,2012,21,4,1 Ohio,2004,2,3,1 Ohio,2008,-2,-16,0 Ohio,2012,-5,-16,0 Oklahoma,2004,30,4,1 Oklahoma,2008,24,2,1 Oklahoma,2012,24,1,1 Oregon,2004,-3,-8,0 Oregon,2008,-19,-9,0 Oregon,2012,-7,-4,0 Pennsylvania,2004,-1,-12,0 Pennsylvania,2008,-9,-19,0 Pennsylvania,2012,0,-13,0 Rhode Island,2004,-13,-2,0 Rhode Island,2008,-29,-1,0 Rhode Island,2012,-17,-2,0 South Carolina,2004,18,4,1 South Carolina,2008,8,5,1 South Carolina,2012,23,1,1 South Dakota,2004,8,4,1 South Dakota,2008,7,4,1 South Dakota,2012,16,1,1 Tennessee,2004,18,7,1 Tennessee,2008,8,5,1 Tennessee,2012,24,1,1 Texas,2004,22,2,1 Texas,2008,7,5,1 Texas,2012,12,4,1 Utah,2004,18,3,1 Utah,2008,30,3,1 Utah,2012,22,1,1 Vermont,2004,-16,-2,0 Vermont,2008,-24,-2,0 Virginia,2004,4,5,1 Virginia,2008,-4,-18,0 Virginia,2012,-2,-4,0 Washington,2004,-4,-10,0 Washington,2008,-16,-6,0 Washington,2012,-14,-8,0 West Virginia,2004,5,6,1 West Virginia,2008,15,11,1 West Virginia,2012,19,1,1 Wisconsin,2004,-5,1,0 Wisconsin,2008,-16,-12,0 Wisconsin,2012,-4,-8,0 Wyoming,2004,30,1,1 Wyoming,2008,21,3,1 
-------------------------------------------------------------------------------- /4-graphs/code/section3.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Section 3 -- Network Metrics 3 | ################################################################## 4 | 5 | # Let's start out by computing some global network metrics. 6 | graph.density(g) 7 | reciprocity(g) 8 | assortativity.degree(g) 9 | 10 | # Now let's look at the distribution of some of the vertex and edge 11 | # metrics. 12 | hist(degree(g)) 13 | head(sort(degree(g), decreasing=TRUE)) 14 | hist(closeness(g)) 15 | head(sort(closeness(g), decreasing=TRUE)) 16 | hist(betweenness(g)) 17 | table(betweenness(g) == 0) 18 | head(sort(betweenness(g), decreasing=TRUE)) 19 | page.rank(g) 20 | hist(page.rank(g)$vector) 21 | head(sort(page.rank(g)$vector, decreasing=TRUE)) 22 | hist(transitivity(g, "local")) 23 | head(sort(transitivity(g, "local"), decreasing=TRUE)) 24 | 25 | # transitivity() doesn't return a named vector, so we'll need to do a bit 26 | # more work to figure out the airports with the largest transitivity. 27 | # sort() returns the largest transitivities, but we will instead use 28 | # order(), which returns the indices of the nodes with the largest 29 | # transitivities. 30 | head(order(transitivity(g, "local"), decreasing=TRUE)) 31 | transitivity(g, "local")[93] 32 | transitivity(g, "local")[265] 33 | 34 | # We can use the indices from order() to look up node names or degrees. 35 | V(g)$name[head(order(transitivity(g, "local"), decreasing=TRUE))] 36 | degree(g)[head(order(transitivity(g, "local"), decreasing=TRUE))] 37 | 38 | # Edge betweenness is one of the most important edge metrics 39 | hist(edge.betweenness(g)) 40 | 41 | # One really common thing to do with vertex or edge metrics is to add 42 | # them to a regression model that predicts some feature of the vertices 43 | # or edges. The igraph network metric functions return vectors containing 44 | # the metric so we can build a data frame with all the metrics we need 45 | # as well as our outcome data that we've stored as vertex and edge 46 | # metadata. 47 | 48 | # We'll try to predict two outcomes for vertices -- the prop. of late 49 | # departures and the taxi out time. We'll include two metrics that capture 50 | # the volume of traffic at the airport -- the total number of flights and 51 | # the degree of the airport in the network. We'll also use closeness 52 | # centrality, which is how close this airport is to all others. We might 53 | # hypothesize that airports with high volume or near the center of the 54 | # network are overloaded and have more delays or that they have invested 55 | # in robust systems/procedures and will have fewer delays. 56 | 57 | # Let's remind ourselves of our vertex attributes 58 | g 59 | 60 | # Now we can build the data frame 61 | metrics <- data.frame(Origin=V(g)$name, 62 | LateDep=V(g)$LateDep, 63 | TaxiOut=V(g)$TaxiOut, 64 | NumFlights=V(g)$NumFlights, 65 | degree=degree(g), 66 | closeness=closeness(g)) 67 | 68 | head(metrics) 69 | 70 | # Now we can build our models; we'll use simple linear regression but 71 | # clearly any regression model you learned in Module 1 could be used. 
72 | summary(lm(LateDep~NumFlights+degree+closeness, data=metrics)) 73 | summary(lm(TaxiOut~NumFlights+degree+closeness, data=metrics)) 74 | -------------------------------------------------------------------------------- /4-graphs/code/section1.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Section 1 -- Data Wrangling to Construct Networks in R 3 | ################################################################## 4 | 5 | # Let's start by loading in our data. This could take a bit of time. 6 | # We use stringsAsFactors=FALSE because it helps us avoid factor 7 | # levels with no data when we subset our data. 8 | dat <- read.csv("On_Time_On_Time_Performance_2014_9.csv", 9 | stringsAsFactors=FALSE) 10 | head(dat) 11 | 12 | # To get the edge information, we'll split into all unique Origin -> Dest 13 | # pairs; using paste() is a convenient way to build a key out of two or 14 | # more variables when splitting data with the split() function. 15 | e.spl <- split(dat, paste(dat$Origin, dat$Dest)) 16 | 17 | # In addition to the origin and destination of an edge, we can store 18 | # the number of flights for this pairing, the proportion of late 19 | # departures and arrivals, and the average taxi out and in times. 20 | # For the "apply" step of our split-apply-combine paradigm 21 | e.spl2 <- lapply(e.spl, function(x) { 22 | data.frame(Origin = x$Origin[1], 23 | Dest = x$Dest[1], 24 | NumFlights = nrow(x), 25 | LateDep = mean(x$DepDel15, na.rm=T), 26 | LateArr = mean(x$ArrDel15, na.rm=T), 27 | TaxiOut = mean(x$TaxiOut, na.rm=T), 28 | TaxiIn = mean(x$TaxiIn, na.rm=T)) 29 | }) 30 | 31 | # As usual, we'll use do.call() with rbind() for the "combine" step. 32 | edges <- do.call(rbind, e.spl2) 33 | 34 | # We can put the whole split-apply-combine into a single line of code when 35 | # computing the vertex information, which limits the number of variables 36 | # we have floating around. 37 | vertices <- do.call(rbind, lapply(split(dat, dat$Origin), function(x) { 38 | data.frame(Origin = x$Origin[1], 39 | NumFlights = nrow(x), 40 | LateDep = mean(x$DepDel15, na.rm=T), 41 | LateArr = mean(x$ArrDel15, na.rm=T), 42 | TaxiOut = mean(x$TaxiOut, na.rm=T), 43 | TaxiIn = mean(x$TaxiIn, na.rm=T)) 44 | })) 45 | 46 | # Let's also load in the locations of the airports by merging with our 47 | # dataset of airport locations, making sure we didn't lose any 48 | # airports in the process of the merge. 49 | airports <- read.csv("../data/airports.csv", stringsAsFactors=FALSE) 50 | head(airports) 51 | dim(vertices) 52 | vertices <- merge(vertices, airports, by.x="Origin", by.y="IATA") 53 | dim(vertices) 54 | 55 | # Now we can construct our graph with graph.data.frame() from igraph. 56 | library(igraph) 57 | g <- graph.data.frame(edges, TRUE, vertices) 58 | 59 | # The first line says we have a directed graph (D) with named vertices (N). 60 | # The attributes list shows all vertex and edge attributes. The first 61 | # entry in ()'s is whether vertex (v) or edge (e) attribute, and the second 62 | # is the type of attribute: character (c) or numeric (n). 
63 | g 64 | 65 | # Easy to access vertex and edge sequences and metadata 66 | head(V(g)) 67 | head(V(g)$Lat) 68 | head(E(g)) 69 | head(E(g)$LateDep) 70 | 71 | # Let's compute some basic properties of the network (more metrics coming 72 | # later in the module) 73 | ecount(g) 74 | vcount(g) 75 | graph.density(g) 76 | -------------------------------------------------------------------------------- /1-intro-R/1-3.R: -------------------------------------------------------------------------------- 1 | # IAP 2015 2 | # 15.S60 Software Tools for Operations Research 3 | # Lecture 1: Introduction to R 4 | 5 | # Script file 1-3.R 6 | # In this script file, we cover CART and random forest 7 | 8 | ################################################ 9 | ## CLASSIFICATION AND REGRESSION TREES (CART) ## 10 | ################################################ 11 | 12 | # First install package rpart and load the library 13 | install.packages("rpart") 14 | library(rpart) 15 | 16 | # Build a CART model 17 | Titanic.CART = rpart(Survived ~ Class + Age + Sex, data = TitanicTrain, method = "class", control = rpart.control(minbucket = 10)) 18 | 19 | # Plot the tree. For all trees, if the conditional at the 20 | # top is true, go to the left. 21 | plot(Titanic.CART) 22 | text(Titanic.CART, pretty = 0) 23 | 24 | # Make predictions on the test set 25 | Titanic.CARTpredTest = predict(Titanic.CART, newdata = TitanicTest, type = "class") 26 | 27 | # Create the confusion matrix 28 | CARTpredTable <- table(TitanicTest$Survived, Titanic.CARTpredTest) 29 | CARTpredTable 30 | 31 | # Calculate accuracy 32 | sum(diag(CARTpredTable))/nrow(TitanicTest) 33 | 34 | 35 | # We can also use CART for continuous outcomes 36 | CEOcomp.CART = rpart(TotalCompensation ~ Years + ChangeStockPrice + ChangeCompanySales + MBA, data = CEOcomp, method = "anova", control = rpart.control(minsplit = 5)) 37 | 38 | # Create a vector of predictions 39 | predict(CEOcomp.CART) 40 | CEOcomp$TotalCompensation 41 | 42 | ################### 43 | ## RANDOM FOREST ## 44 | ################### 45 | 46 | # Install package randomForest and load the library 47 | install.packages("randomForest") 48 | library(randomForest) 49 | 50 | # Build a random forest model for the Titanic dataset 51 | Titanic.forest = randomForest(Survived ~ Class + Age + Sex, data = TitanicTrain, nodesize = 10, ntree = 200) 52 | 53 | # Warning message! - random forest needs to predict a factor 54 | str(TitanicTrain$Survived) 55 | TitanicTrain$Survived <- factor(TitanicTrain$Survived) 56 | TitanicTest$Survived <- factor(TitanicTest$Survived) 57 | 58 | # Let's try again! 59 | Titanic.forest = randomForest(Survived ~ Class + Age + Sex, data = TitanicTrain, nodesize = 10, ntree = 200) 60 | 61 | # Make predictions on the test set 62 | 63 | Titanic.forestPred = predict(Titanic.forest, newdata = TitanicTest) 64 | forest.table <- table(TitanicTest$Survived, Titanic.forestPred) 65 | forest.table 66 | 67 | # Check accuracy 68 | sum(diag(forest.table))/nrow(TitanicTest) 69 | 70 | ################ 71 | ## ASSIGNMENT ## 72 | ################ 73 | 74 | # Let's compare the performance of CART and random 75 | # forest on the LettersBinary dataset 76 | 77 | # 1) Build a CART model on the training data. Set the 78 | # minbucket parameter to 25. Then test it on the 79 | # testing set, create a confusion matrix, and determine 80 | # the accuracy.
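# A minimal solution sketch for 1), hedged rather than official (the course
# ships its answers in solutions.zip). It assumes LettersTrain and LettersTest
# were created by splitting LettersBinary.csv with sample.split() as in the
# 1-2.R assignment; those object names are illustrative, not from the course
# materials. Letter ~ . is shorthand for the letters.formula helper defined
# just below.
Letters.CART = rpart(Letter ~ ., data = LettersTrain, method = "class", control = rpart.control(minbucket = 25))
Letters.CARTpredTest = predict(Letters.CART, newdata = LettersTest, type = "class")
Letters.CARTtable = table(LettersTest$Letter, Letters.CARTpredTest)
Letters.CARTtable
sum(diag(Letters.CARTtable))/nrow(LettersTest)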
81 | 82 | letters.formula <- formula(Letter ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16) 83 | 84 | 85 | 86 | 87 | 88 | # 2) Do the same as above for random forest. Use nodesize 89 | # = 25 and ntree = 200. 90 | 91 | 92 | 93 | 94 | 95 | # EXTRA ASSIGNMENT: 96 | 97 | # *1) Try different ways of controlling the tree growth. Look 98 | # at the rpart.control help page. Try giving your model 99 | # values for cp or maxdepth. 100 | 101 | # *2) Try different values of ntree in your randomForest 102 | # model. Try setting it to a very low number, and a 103 | # very high number. How do the prediction results 104 | # compare? 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | -------------------------------------------------------------------------------- /1-intro-R/1-4.R: -------------------------------------------------------------------------------- 1 | # IAP 2015 2 | # 15.S60 Software Tools for Operations Research 3 | # Lecture 1: Introduction to R 4 | 5 | # Script file 1-4.R 6 | # In this script file, we cover hierarchical and 7 | # k-means clustering 8 | 9 | ############################# 10 | 11 | # R has many built-in datasets -- 12 | # let's take a look at what they have 13 | data() 14 | 15 | # Load the iris set and learn about it 16 | data(iris) 17 | ?iris 18 | str(iris) 19 | 20 | ############################# 21 | ## HIERARCHICAL CLUSTERING ## 22 | ############################# 23 | 24 | # Since species is not a number, we can't 25 | # compute a distance, so we need to exclude 26 | # the last column 27 | IrisDist = dist(iris[1:4], method = "euclidean") 28 | 29 | # Alternative methods include "maximum" and 30 | # "manhattan" (different distance metrics) 31 | 32 | # Compute the hierarchical clusters. We use 33 | # method = "ward.D" to minimize the distance between 34 | # the clusters and the variance within each 35 | # of the clusters 36 | IrisHC = hclust(IrisDist, method = "ward.D") 37 | 38 | # Plot a dendrogram 39 | plot(IrisHC) 40 | 41 | # This diagram will help us decide how many 42 | # clusters are appropriate for this problem. 43 | # The height of the vertical lines represents 44 | # the distance between the points that were 45 | # combined into clusters. The record numbers 46 | # are listed along the bottom (usually hard to 47 | # see). The taller the lines, the more likely 48 | # it is that clusters should be separate. Two 49 | # or three clusters would be appropriate here. 50 | 51 | # Plot rectangles around the clusters to aid 52 | # in visualization 53 | rect.hclust(IrisHC, k = 3, border = "red") 54 | 55 | # Now, split the data into these three clusters 56 | IrisHCGroups = cutree(IrisHC, k = 3) 57 | 58 | # IrisHCGroups is now a vector assigning each 59 | # data point to a cluster 60 | 61 | # Use a table to look at the properties of each 62 | # of the clusters. 63 | table(iris$Species, IrisHCGroups) 64 | tapply(iris$Petal.Length, IrisHCGroups, mean) 65 | 66 | # Using tapply for the means of each of the 67 | # attributes will give us the centroids of the 68 | # clusters. 69 | 70 | ######################## 71 | ## K-MEANS CLUSTERING ## 72 | ######################## 73 | 74 | # K-means clustering requires that we have 75 | # an initial guess as to how many clusters 76 | # there are.
We will initialize it to 3 in this 77 | # case, but if we didn't know, we could always 78 | # try multiple values and experiment 79 | 80 | # Run a k-means cluster with 3 clusters and 81 | # 100 iterations (centroids recomputed and points 82 | # reassigned each time) 83 | IrisKMC = kmeans(iris[1:4], centers = 3, iter.max = 100) 84 | str(IrisKMC) 85 | 86 | # Create a vector with the group numbers 87 | IrisKMCGroups = IrisKMC$cluster 88 | 89 | # Check out the properties of the clusters 90 | # using table 91 | table(iris$Species, IrisKMCGroups) 92 | 93 | # Try improving with more iterations! 94 | IrisKMC = kmeans(iris[1:4], centers = 3, iter.max = 10000) 95 | IrisKMCGroups = IrisKMC$cluster 96 | table(iris$Species, IrisKMCGroups) 97 | 98 | # Look at the locations of the centroids 99 | IrisKMC$centers 100 | 101 | ################ 102 | ## ASSIGNMENT ## 103 | ################ 104 | 105 | # 1a) Cluster the LettersBinary dataset using 106 | # hierarchical clustering. Don't forget to 107 | # leave out the "Letter" attribute when 108 | # computing the distance matrix! (Since 109 | # this dataset is larger, it may take 110 | # a bit longer to compute) 111 | 112 | 113 | 114 | 115 | # b) Plot the dendrogram and use it to 116 | # decide how many clusters to select. 117 | 118 | 119 | 120 | 121 | # c) Make a table comparing the "Letter" 122 | # attribute with the HC assignment 123 | 124 | 125 | 126 | 127 | # 2) Do the same using k-means clustering. 128 | # How well do you think clustering performs 129 | # on this dataset? 130 | 131 | 132 | 133 | # Clustering doesn't seem to do too well here. 134 | 135 | 136 | # EXTRA ASSIGNMENT 137 | 138 | # An additional parameter in the K-Means 139 | # algorithm is the number of random starts to 140 | # use. This is controlled with the parameter 141 | # nstart in the function kmeans. Try different 142 | # values for nstart. Does it improve the 143 | # algorithm? 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | -------------------------------------------------------------------------------- /4-graphs/code/section2.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Section 2 -- Network Visualization 3 | ################################################################## 4 | 5 | # Let's start out by seeing what exactly is returned when we run a 6 | # graph layout algorithm. Of course, we have longitude/latitude information 7 | # for airports, so we're doing this more as an exercise in looking at 8 | # graph layout algorithms. Later in the section we'll lay out nodes based on 9 | # geography. 10 | layout1 <- layout.fruchterman.reingold(g) 11 | dim(layout1) 12 | head(layout1) 13 | 14 | # It's just a set of 2-d points, one for each vertex. We could get a 15 | # higher-dimensional layout with the "dim" parameter. Force-directed 16 | # layouts are typically optimized from a random starting location, so 17 | # we would expect a different layout if we ran it again (this is one of 18 | # the complaints people have with these sorts of layouts). We could use 19 | # set.seed() to ensure the same value for multiple runs of the algorithm. 20 | layout1 <- layout.fruchterman.reingold(g) 21 | head(layout1) 22 | 23 | # We can plot with our selected layout with the plot() function.
24 | plot(g, layout=layout1) # Can cancel with escape key 25 | 26 | # It takes a long time to plot the graph to the R display, so we can 27 | # instead plot it to a file and then open the file. 28 | png("plot1.png") 29 | plot(g, layout=layout1) 30 | dev.off() 31 | 32 | # Most first attempts at plotting a graph look pretty bad. We need to 33 | # do the following: 34 | # 1) Remove the vertex names 35 | # 2) Make the vertices smaller 36 | # 3) Remove the arrowheads (almost all edges will be bidirectional) 37 | # We'll need to look at ?igraph.plotting to figure out how to do this 38 | ?igraph.plotting 39 | png("plot2.png") 40 | plot(g, layout=layout1, vertex.size=3, edge.arrow.mode=0, vertex.label=NA) 41 | dev.off() 42 | 43 | # So far we set all the plotting properties vertex.size, vertex.label, 44 | # and edge.arrow.mode to single values, meaning that value applied for 45 | # all vertices/edges. We can also set values dynamically based on 46 | # vertex/edge metadata, providing one value for each node or edge. 47 | # First, let's use a color gradient based on metadata. We'll make vertices 48 | # darker gray if they have more volume and lighter gray if they have less. 49 | 50 | # colorRamp returns a function that will convert values between 0 51 | # and 1 into colors between our color endpoints. It returns a matrix 52 | # whose three columns are red, green, and blue; we can convert this into 53 | # a vector with the rgb() function. 54 | grad.fxn <- colorRamp(c("lightgray", "black")) 55 | grad.fxn 56 | grad.fxn(c(0, .2, .5, 1)) 57 | rgb(grad.fxn(c(0, .2, .5, 1)), max=255) 58 | color.mat <- grad.fxn(V(g)$NumFlights / max(V(g)$NumFlights)) 59 | head(color.mat) 60 | dim(color.mat) 61 | vertex.colors <- rgb(color.mat, max=255) 62 | head(vertex.colors) 63 | length(vertex.colors) 64 | 65 | png("plot3.png") 66 | plot(g, layout=layout.lgl(g), vertex.size=3, edge.arrow.mode=0, vertex.label=NA, vertex.color=vertex.colors) 67 | dev.off() 68 | 69 | # One difficulty with plotting graphs is the sheer mass of edges. One 70 | # approach would be to remove low-volume edges or diminish their width 71 | # (we'll do this in a bit); another is to change color and transparency to 72 | # draw attention to important edges. Here, we'll make edges red if at least 73 | # 50% of departures on this link are late and transparent light gray 74 | # otherwise. 75 | 76 | # A convenient way to specify colors is with hexadecimal (we just saw this 77 | # when outputting vertex.colors). A standard color would be something like 78 | # #00FF80, which means hexadecimal 00 (0) for red, FF (255) for green, and 79 | # 80 (128) for blue. Because transparency is not specified it is assumed to 80 | # be non-transparent. If we add a pair of hexadecimal digits at the end 81 | # they represent the transparency proportion. #00FF80FF is non-transparent, 82 | # #00FF8080 is partially transparent, and #00FF8000 is fully transparent 83 | # aka invisible. Our light gray color will be #EEEEEE22, which is mostly 84 | # transparent. 85 | 86 | # We now want different colors conditional on the value of E(g)$LateDep. 87 | # This is typically done with the ifelse() function.
88 | head(E(g)$LateDep) 89 | head(ifelse(E(g)$LateDep >= 0.5, "red", "#EEEEEE22")) 90 | edge.colors <- ifelse(E(g)$LateDep >= 0.5, "red", "#EEEEEE22") 91 | table(edge.colors) 92 | png("plot4.png") 93 | plot(g, layout=layout.lgl(g), vertex.size=3, edge.arrow.mode=0, vertex.label=NA, vertex.color=vertex.colors, edge.color=edge.colors) 94 | dev.off() 95 | -------------------------------------------------------------------------------- /4-graphs/code/section4.R: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Section 4 -- Network Resilience 3 | ################################################################## 4 | 5 | # Any theories about the behavior of uniform random site percolation and 6 | # targeted site percolation in our network? 7 | 8 | # First we'll compute a random sample of a proportion phi of the nodes 9 | phi <- 0.8 10 | vcount(g) 11 | sample(vcount(g), phi*vcount(g)) 12 | 13 | # We can compute subgraphs of a network in which we only keep the 14 | # indicated nodes and edges connected to them with the induced.subgraph() 15 | # function. 16 | induced.subgraph(g, sample(vcount(g), phi*vcount(g))) 17 | 18 | # We want to compute the size of the biggest cluster, so let's first use 19 | # the clusters() function to get all the cluster memberships. 20 | clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g)))) 21 | 22 | # We can access the "csize" element of the list and compute its maximum 23 | max(clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize) 24 | 25 | # A problem with this is when we delete all the vertices. Then csize will 26 | # be blank, causing a warning with our code. 27 | phi <- 0 28 | max(clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize) 29 | 30 | # Let's fix it by adding 0 to csize. This will make max return 0 when 31 | # there are no vertices and return the maximum component size when there 32 | # are vertices. 33 | max(c(0, clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize)) 34 | phi <- 0.8 35 | max(c(0, clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize)) 36 | 37 | # Because this is random we want to replicate the computation and take 38 | # the average across the replications, which we can do with replicate() 39 | # and mean(). You'll see more sophisticated simulation in the simulation 40 | # module. 41 | reps <- 100 42 | replicate(reps, max(c(0, clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize))) 43 | mean(replicate(reps, max(c(0, clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize)))) 44 | 45 | # Let's normalize by the original size of the graph 46 | mean(replicate(reps, max(c(0, clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize)))) / vcount(g) 47 | 48 | # Finally, let's make a function with our code. 49 | random.site.percolation <- function(g, phi, reps) { 50 | mean(replicate(reps, max(c(0, clusters(induced.subgraph(g, sample(vcount(g), phi*vcount(g))))$csize)))) / vcount(g) 51 | } 52 | 53 | # Now we can build a data frame that contains the size of the giant 54 | # component after random site percolation with different phi values. 55 | # We'll sample a grid from 0 to 1 and use sapply to run for each. 
56 | phis <- seq(0, 1, .01) 57 | rs.perc <- data.frame(phi=phis, perc=sapply(phis, random.site.percolation, g=g, reps=100)) 58 | head(rs.perc) 59 | 60 | # Now we can plot our results along with a line indicating the maximum 61 | # possible size of the giant component, which would be achieved if g were 62 | # a complete graph. 63 | plot(rs.perc) 64 | abline(0, 1) 65 | 66 | # Now we want to model an adversarial situation in which the nodes with 67 | # the highest degree are removed first. We'll do this by taking the degree 68 | # ordering of the nodes in the original graph g and using it throughout, 69 | # though another approach would be to recompute the degrees each time you 70 | # remove the highest-degree node. The first step is to sort the 71 | # nodes in the network by degree using the order() function. This returns 72 | # indices in the vertex list, sorted by degree. 73 | order(degree(g)) 74 | 75 | # This will have ordered in increasing order, so we can check that the 76 | # last few indices are airports we recognize: 77 | degree(g)[20] 78 | degree(g)[221] 79 | 80 | # We want to keep phi proportion of the airports, limiting to the ones 81 | # with smallest degree. We can get this by taking the first phi 82 | # proportion of the ordered vertices 83 | head(order(degree(g)), phi*vcount(g)) 84 | 85 | # As before we can compute the normalized size of the giant component. 86 | # There's no need for replication because we didn't use any random 87 | # selection in the procedure. 88 | max(c(0, clusters(induced.subgraph(g, head(order(degree(g)), phi*vcount(g))))$csize)) / vcount(g) 89 | 90 | # Finally we can create our function that does the targeted percolation. 91 | targeted.site.percolation <- function(g, phi) { 92 | max(c(0, clusters(induced.subgraph(g, head(order(degree(g)), phi*vcount(g))))$csize)) / vcount(g) 93 | } 94 | 95 | # As before we can compute the targeted rates and plot the survivability. 96 | ts.perc <- data.frame(phi=phis, perc=sapply(phis, targeted.site.percolation, g=g)) 97 | head(ts.perc) 98 | 99 | plot(ts.perc) 100 | abline(0, 1) 101 | -------------------------------------------------------------------------------- /6-nonlinear-opt/Nonlinear-JuMP.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language": "Julia", 4 | "name": "", 5 | "signature": "sha256:eb67d64c267bde1c8acbbd3549e149ea7a8e8566f70baa255126dc8a172db3b0" 6 | }, 7 | "nbformat": 3, 8 | "nbformat_minor": 0, 9 | "worksheets": [ 10 | { 11 | "cells": [ 12 | { 13 | "cell_type": "heading", 14 | "level": 2, 15 | "metadata": {}, 16 | "source": [ 17 | "Nonlinear Optimization" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "Consider the unconstrained minimization problem\n", 25 | "$$\n", 26 | "\\min_{x > 0} x^2 - \\log(x)\n", 27 | "$$\n", 28 | "The objective function is strictly convex (why?), so from high school calculus we find the minimizer when\n", 29 | "$$\n", 30 | "0 = \\frac{d}{dx} [x^2 - \\log(x)] = 2x - \\frac{1}{x}\n", 31 | "$$\n", 32 | "$$\n", 33 | "\\rightarrow x = \\frac{1}{\\sqrt{2}}\n", 34 | "$$" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | ">**\\[Exercise\\]**: Plot it\n", 42 | "\n", 43 | "> Plot the function $x^2-\\log(x)$, for $x$ between 0 and 3. 
You may use ``Gadfly`` or ``PyPlot``.\n", 44 | "\n", 45 | "> _Be careful not to take the log of zero_" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "#### Let's see how to formulate this problem in JuMP" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "collapsed": false, 58 | "input": [ 59 | "using JuMP\n", 60 | "using Ipopt" 61 | ], 62 | "language": "python", 63 | "metadata": {}, 64 | "outputs": [] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "collapsed": false, 69 | "input": [ 70 | "m = Model()\n", 71 | "@defVar(m, x >= 0, start = 1) # provide an initial starting point, we don't want to start at zero!\n", 72 | "@setNLObjective(m, Min, x^2 - log(x))\n", 73 | "status = solve(m)" 74 | ], 75 | "language": "python", 76 | "metadata": {}, 77 | "outputs": [] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "collapsed": false, 82 | "input": [ 83 | "getValue(x)" 84 | ], 85 | "language": "python", 86 | "metadata": {}, 87 | "outputs": [] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "collapsed": false, 92 | "input": [ 93 | "abs(getValue(x)-1/sqrt(2))" 94 | ], 95 | "language": "python", 96 | "metadata": {}, 97 | "outputs": [] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Pretty accurate!" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "### Now for some constrained optimization\n", 111 | "\n", 112 | "We will add the constraint $x \\geq c$. When $c \\leq \\frac{1}{\\sqrt{2}}$, this constraint has no effect. Otherwise the optimal solution is $c$." 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "collapsed": false, 118 | "input": [ 119 | "using Interact" 120 | ], 121 | "language": "python", 122 | "metadata": {}, 123 | "outputs": [] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "collapsed": false, 128 | "input": [ 129 | "@manipulate for c in 0.1:0.01:2.0\n", 130 | " m = Model(solver=IpoptSolver(print_level=0))\n", 131 | " @defVar(m, x >= 0, start = 1)\n", 132 | " @setNLObjective(m, Min, x^2 - log(x))\n", 133 | " @addConstraint(m, x >= c)\n", 134 | " status = solve(m)\n", 135 | " round(getValue(x),2)\n", 136 | "end" 137 | ], 138 | "language": "python", 139 | "metadata": {}, 140 | "outputs": [] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Differences with linear/quadratic JuMP:\n", 147 | "- Use ``@setNLObjective`` and ``@addNLConstraint`` instead of ``@setObjective`` and ``@addConstraint``.\n", 148 | "- Important to set a starting value for each variable.\n", 149 | "- Different [solvers](http://jump.readthedocs.org/en/release-0.7/installation.html#getting-solvers):\n", 150 | " - [Ipopt](https://github.com/JuliaOpt/Ipopt.jl) is open source, widely used\n", 151 | " - [KNITRO](https://github.com/JuliaOpt/KNITRO.jl) commercial, general nonlinear\n", 152 | " - [Mosek](https://github.com/JuliaOpt/Mosek.jl) commercial, convex problems only\n", 153 | "- Currently working on expanding support for mixed-integer nonlinear (MINLP) solvers" 154 | ] 155 | } 156 | ], 157 | "metadata": {} 158 | } 159 | ] 160 | } -------------------------------------------------------------------------------- /2-intermediate-R/SecondHalf.R: -------------------------------------------------------------------------------- 1 | #make sure your working directory contains flights_condensed.csv 2 | #first we'll read in the flight data 3 | flights = read.csv("flights_condensed.csv") 4 | 5 | #for our purposes, we want to limit 
ourselves to flights between the top 20 airports 6 | #this makes the data set smaller (examples run faster) 7 | top20 = c("ATL","LAX","ORD","DFW","DEN","JFK","SFO","CLT","LAS","PHX","MIA","IAH","EWR","MCO","SEA","MSP","DTW","BOS","PHL","LGA") 8 | flights = subset(flights, Origin %in% top20 & Dest %in% top20) #%in% is like is.element 9 | 10 | ### 11 | #joins 12 | ### 13 | 14 | #We're going to join some location data to the flights data so we can try to see the jet stream 15 | #to do this, we need to know the change in longitude of each flight 16 | #first we load up the airport location data 17 | latlong = read.csv("Airport_Codes_mapped_to_Latitude_Longitude_in_the_United_States.csv",header=TRUE) 18 | longitudes = latlong[,c(1,3)] #we only need longitudes 19 | 20 | #now we'll do the actual join 21 | #in base R, this is done using the merge() function 22 | flights = merge(flights,longitudes,by.x="Origin",by.y="locationID") 23 | #let's take a look at the data frame now 24 | #see that the column we've just merged in is called "Longitude" 25 | #but since we merged on origin, it's really the origin longitude. 26 | #So we rename it: 27 | names(flights)[match("Longitude",names(flights))]="Origin.Long" 28 | #same for destination longitude 29 | flights = merge(flights,longitudes,by.x="Dest",by.y="locationID") 30 | names(flights)[match("Longitude",names(flights))]="Dest.Long" 31 | 32 | #we'll now compute flight speeds and changes in longitude 33 | flights$Speed = flights$Distance / flights$AirTime 34 | summary(flights$Speed) #uhoh 35 | #some flights have no speed (perhaps they never made it off the ground) 36 | flights = subset(flights,AirTime>0) 37 | flights$DiffLong = flights$Dest.Long - flights$Origin.Long 38 | 39 | #can we see the jet stream in action? 40 | plot(flights$DiffLong, flights$Speed,pch=".") 41 | js.effect = cor(flights$DiffLong, flights$Speed) 42 | 43 | ### 44 | #Joins assignment 45 | ### 46 | 47 | # 1) Join airport latitudes to the flight data. What was the largest change in latitude for any flight? 48 | # 2) (optional) Find a flight (may not be unique) which experienced this largest change in latitude. 49 | # Hint: use the order() function to sort a data frame 50 | # 3) (optional) Re-do the jet stream example using latitudes instead of longitudes. 51 | # Is there a relationship between change in latitude and flight speed? 
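# A hedged sketch for question 1, mirroring the longitude example above. It
# assumes the latitude values sit in column 2 of latlong (column 1 is
# locationID and column 3 is Longitude, as used above) and that the merged-in
# column is named "Latitude"; check names(latlong) before relying on this.
latitudes = latlong[,c(1,2)]
flights = merge(flights, latitudes, by.x="Origin", by.y="locationID")
names(flights)[match("Latitude", names(flights))] = "Origin.Lat"
flights = merge(flights, latitudes, by.x="Dest", by.y="locationID")
names(flights)[match("Latitude", names(flights))] = "Dest.Lat"
flights$DiffLat = flights$Dest.Lat - flights$Origin.Lat
max(abs(flights$DiffLat))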
52 | 53 | ### 54 | #Joins with split-apply-combine 55 | ### 56 | 57 | #Here we do a more complicated joins example 58 | #the join is on multiple columns 59 | #and the analysis uses split-apply-combine 60 | #our goal is to find, for each airport, the average weather delay per .1 mm precipitation 61 | 62 | #read data to be joined 63 | weather = read.csv("prcp_pretty.csv") 64 | 65 | #merge in precipitation data 66 | #rows must match on day of month AND airport 67 | flights = merge(flights,weather,by.x=c("Origin","DayofMonth"),by.y=c("Airport","DayOfMonth")) 68 | 69 | #for this analysis, we only want entries with a number for weather delay (no NA) 70 | #we also limit to days with precipitation 71 | flights.rain = subset(flights, !is.na(WeatherDelay) & prcp>0) 72 | flights.rain$DelayRatio = flights.rain$WeatherDelay / flights.rain$prcp 73 | 74 | #split apply combine to find average weather delay per .1 mm of precipitation 75 | #first we split 76 | #we must discard unused factors or we will get empty data frames for airports not in the top 20 77 | flights.rain$Origin = factor(flights.rain$Origin) 78 | flights.rain.split = split(flights.rain,flights.rain$Origin) 79 | 80 | #define a function 81 | process.airport = function(df){ 82 | airport.name = df$Origin[1] 83 | avg.ratio = mean(df$DelayRatio) 84 | return(data.frame(Airport=airport.name, Avg.delay.ratio=avg.ratio)) 85 | } 86 | 87 | flights.rain.split = lapply(flights.rain.split,process.airport) 88 | airport.info = do.call(rbind,flights.rain.split) 89 | 90 | #let's order the resulting data frame 91 | airport.info = airport.info[order(airport.info$Avg.delay.ratio),] 92 | 93 | 94 | ### 95 | #Second joins assignment 96 | ### 97 | #Is there a relationship between airport latitude and average delay ratio? 98 | 99 | 100 | 101 | 102 | ### 103 | #sqldf 104 | ### 105 | #side by side examples of: 106 | #subsetting 107 | flights.bos = subset(flights, Dest=="BOS") 108 | flights.bos = sqldf("select * from flights where Dest='BOS'") 109 | #subset and keep only selected columns 110 | flights.fast = subset(flights, Speed>mean(flights$Speed))[,c("Origin","Dest")] 111 | flights.fast = sqldf("select Origin, Dest from flights where Speed>(select avg(Speed) from flights)") 112 | #inner join - note differences in columns returned 113 | A = airport.info[,1:2] #discard location data 114 | airport.info = sqldf("select * from A inner join latlong where A.Airport = latlong.locationID") 115 | airport.info = merge(A,latlong,by.x="Airport",by.y="locationID") 116 | -------------------------------------------------------------------------------- /1-intro-R/1-2.R: -------------------------------------------------------------------------------- 1 | # IAP 2015 2 | # 15.S60 Software Tools for Operations Research 3 | # Lecture 1: Introduction to R 4 | 5 | # Script file 1-2.R 6 | # In this script file, we cover linear regression 7 | # and logistic regression. 8 | 9 | ####################### 10 | ## LINEAR REGRESSION ## 11 | ####################### 12 | 13 | # Load CEOcomp dataset if you haven't already 14 | CEOcomp = read.csv(file = "CEOcomp.csv", header = TRUE) 15 | 16 | # Use lm to create a linear regression model 17 | CEO.linReg <- lm(TotalCompensation ~ Years + ChangeStockPrice + ChangeCompanySales + MBA, data = CEOcomp) 18 | 19 | # First argument is the formula, second argument 20 | # is the data.
Notice that you don't need $ here 21 | # since we are specifying the dataset in the function call 22 | 23 | # Use summary to take a look at the model 24 | summary(CEO.linReg) 25 | 26 | # Which variables are significant predictors of 27 | # TotalCompensation at the p = .05 level? 28 | 29 | # Check out some other useful outputs of a 30 | # linear regression 31 | CEO.linReg$coefficients 32 | CEO.linReg$residuals 33 | confint(CEO.linReg, level = 0.95) 34 | 35 | # We can also compute correlation between variables 36 | cor(CEOcomp$TotalCompensation, CEOcomp$Years) 37 | 38 | # Or create a correlation table (note: all columns 39 | # must be numeric to compute correlation of 40 | # the entire dataset) 41 | cor(CEOcomp) 42 | 43 | # We can also get more data on pairwise correlation: 44 | cor.test(CEOcomp$TotalCompensation, CEOcomp$Years) 45 | 46 | ################################################ 47 | ## SPLITTING DATA INTO TRAINING AND TEST SETS ## 48 | ################################################ 49 | 50 | # Load the dataset of interest 51 | TitanicPassengers = read.csv("TitanicPassengers.csv") 52 | str(TitanicPassengers) 53 | 54 | # We first need to install a package to help 55 | # us split the data. Note that this only 56 | # needs to be done once per machine! 57 | install.packages("caTools") 58 | 59 | # Now load the library. This needs to be done 60 | # every time you wish to use the library. 61 | library(caTools) 62 | 63 | 64 | # Now split the dataset into training and testing 65 | split <- sample.split(TitanicPassengers$Survived, SplitRatio = 0.6) 66 | TitanicTrain <- TitanicPassengers[split, ] 67 | TitanicTest <- TitanicPassengers[!split, ] 68 | 69 | ######################### 70 | ## LOGISTIC REGRESSION ## 71 | ######################### 72 | 73 | # Run a logistic regression using a generalized linear model 74 | Titanic.logReg = glm(Survived ~ Class + Age + Sex, data = TitanicTrain, family = binomial) 75 | summary(Titanic.logReg) 76 | 77 | # Compute predicted probabilities on training data 78 | Titanic.logPred = predict(Titanic.logReg, type = "response") 79 | 80 | # Build a classification table to check accuracy on 81 | # training set. Note that due to randomness of split, 82 | # classification matrices may be slightly different 83 | table(TitanicTrain$Survived, round(Titanic.logPred)) 84 | 85 | # We now do the same for the test set 86 | Titanic.logPredTest = predict(Titanic.logReg, newdata = TitanicTest, type = "response") 87 | test.table <- table(TitanicTest$Survived, round(Titanic.logPredTest)) 88 | test.table 89 | 90 | # Compute percentage correct (overall accuracy) 91 | sum(diag(test.table))/nrow(TitanicTest) 92 | 93 | ################ 94 | ## ASSIGNMENT ## 95 | ################ 96 | 97 | # 1a) Load the dataset LettersBinary.csv and check its structure. 98 | 99 | 100 | 101 | # Doesn't make much sense, huh? Each observation 102 | # in this dataset is a capital letter H or R, in one 103 | # of a variety of fonts, and distorted in various 104 | # ways. The attributes x1 ... x16 are all properties 105 | # of the resultant transformation. In this 106 | # assignment, we wish to see if these attributes 107 | # can be useful predictors of what the original 108 | # letter was. 109 | 110 | # b) Split the dataset into training and test sets 111 | # such that the training set is comprised of 60% 112 | # of the original data. 113 | 114 | 115 | 116 | 117 | # c) Build a logistic regression model to predict 118 | # the letter based on the attributes.
Then create a 119 | # classification matrix and determine the 120 | # accuracy of the model on the test set. 121 | 122 | letters.formula <- formula(Letter ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16) 123 | 124 | 125 | # You can use letters.formula in place of 126 | # typing the formula 127 | 128 | 129 | 130 | 131 | 132 | # EXTRA ASSIGNMENT: For linear regression, there are 133 | # several tests that should be done to make sure a model 134 | # is valid. We already did one of them (computed the 135 | # correlations). Here we will go through the others. 136 | 137 | # *1) Plot the residuals to see if they are normally 138 | # distributed (testing normality of the error 139 | # distribution): 140 | 141 | 142 | # *2) Plot the observed vs. predicted values to see if 143 | # they are symmetrically distributed around a diagonal 144 | # line (testing the linear relationship between the 145 | # dependent and independent variables) 146 | 147 | 148 | # *3) Plot the residuals as a function of the predicted 149 | # values (testing for heteroscedasticity) 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /5-simulation/distributed.jl: -------------------------------------------------------------------------------- 1 | # Start up multiple processors 2 | # addprocs(8) 3 | 4 | # can also start up via command line, i.e., julia -p 4 5 | 6 | # View the running worker processors 7 | workers() 8 | 9 | # Run a simple job on a worker 10 | ref = @spawn rand() 11 | 12 | # ref contains a reference to the data: 13 | # -- ref.where contains proc id of where the data is stored 14 | # -- ref.whence contains the master proc's id 15 | # -- ref.id is a unique ID 16 | 17 | # To see the result locally, run fetch: 18 | fetch(ref) 19 | 20 | # If we want to specify the proc the code runs on 21 | ref = @spawnat 3 rand() 22 | 23 | # Suppose we define our own function 24 | function estimatePi(n) 25 | count = 0; 26 | for i in 1:n 27 | if rand()^2 + rand()^2 < 1 28 | count += 1 29 | end 30 | end 31 | return count 32 | end 33 | 34 | 35 | # Works fine locally 36 | n = 1000 37 | piEst = 4 * estimatePi(n)/n 38 | println("Pi is approximately $piEst") 39 | 40 | # What happens here? 41 | # @spawnat 2 estimatePi(1000) 42 | 43 | # To run code on all workers, use @everywhere 44 | @everywhere function estimatePi(n) 45 | count = 0; 46 | for i in 1:n 47 | if rand()^2 + rand()^2 < 1 48 | count += 1 49 | end 50 | end 51 | return count 52 | end 53 | 54 | # Now it works 55 | n = 1000 56 | piEst = 4/n * remotecall_fetch(2,estimatePi,n) # spawn f on proc 2 and fetch results 57 | 58 | # Assignment: Write a function that runs the simulation in bank_11.jl and returns how long it
Run this function on a different core 60 | 61 | # Hint: After the simulation is run, sim.time contains the time of the last scheduled event 62 | 63 | 64 | 65 | 66 | 67 | # Using all cores 68 | 69 | # We want each processor to run some simulations, and then return its results 70 | 71 | # Could do it manually: 72 | 73 | nCpus = length(workers()) 74 | totalSims = 8 * 10^7 75 | sims_per_cpu = div(totalSims,nCpus) # integer arithmetic 76 | 77 | results = cell(nCpus) 78 | for i in 1:nCpus 79 | results[i] = @spawnat i estimatePi(sims_per_cpu) 80 | end 81 | for i in 1:length(results) 82 | results[i] = fetch(results[i]); 83 | end 84 | total = 0 85 | for i in 1:nCpus 86 | total += results[i] 87 | end 88 | piEst = 4 * total / totalSims 89 | println("Pi is approximately $piEst") 90 | 91 | # Julia also has a built-in method to help us 92 | help("map") # like apply in R 93 | 94 | input = sims_per_cpu * ones(nCpus) 95 | results = map(estimatePi,input) 96 | 97 | help("pmap") 98 | results = pmap(estimatePi,input) 99 | 100 | total = 0 101 | for i in 1:nCpus 102 | total += results[i] 103 | end 104 | piEst = 4 * total / totalSims 105 | println("Pi is approximately $piEst") 106 | 107 | function benchmark(n) 108 | input = n * ones(nCpus) 109 | @time map(estimatePi,input) 110 | @time pmap(estimatePi,input) 111 | return 112 | end 113 | 114 | benchmark(10) 115 | benchmark(10^2) 116 | benchmark(10^7) 117 | 118 | # Assignment 2: Use pmap to run bank_11.jl in parallel to estimate the mean time to process all the customers 119 | # Hint: define a function that takes the random seed as the input and returns the time 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | # When doesn't pmap scale well? 129 | 130 | # pmap must send input to each proc, and get output <- lots of communication 131 | # Okay if few, big jobs, but not good for many small jobs 132 | 133 | # Solution: MapReduce 134 | # Send "batches" to each proc (Map) 135 | # Each proc runs batch and creates batch summary (reduce) 136 | # Each proc returns batch summary 137 | # Master proc compiles summary files from batch summaries (reduce) 138 | 139 | # Syntax: 140 | # @parallel [reducer] for ... 141 | # [code] 142 | # end 143 | 144 | # Adds a whole bunch of random numbers together 145 | @parallel (+) for i in 1:10^8 146 | rand() 147 | end 148 | 149 | # Easier than our pmap example above 150 | count = @parallel (+) for i in 1:totalSims 151 | estimatePi(1) 152 | end 153 | 154 | piEst = 4 * count / totalSims 155 | println("Pi is approximately $piEst") 156 | 157 | ## How to write a custom reducer 158 | 159 | # Reducer takes in two arguments of the same type, and returns that same type 160 | # e.g. the "+" method takes in two real numbers and returns a real number 161 | 162 | # Suppose we want to numerically estimate the mean and standard error of our distribution 163 | # E[x] = 1/n sum x_i 164 | # E[x^2] = 1/n sum x_i^2 165 | # var[x] = E[x^2] - E^2[x] 166 | 167 | @everywhere type Results 168 | estimate 169 | estimateSq 170 | end 171 | @everywhere Results(x) = Results(x,x^2) 172 | 173 | # And then modify our simulate function to return Results... 174 | @everywhere function runSims(n) 175 | count = estimatePi(n) 176 | piEst = 4 * count / n 177 | return Results(piEst) 178 | end 179 | 180 | # And now we can write our reducer 181 | 182 | @everywhere function myReduce(a::Results,b::Results) 183 | return Results(a.estimate + b.estimate, 184 | a.estimateSq + b.estimateSq) 185 | end 186 | 187 | # Now we can do our MapReduce!
188 | n = 10^3 189 | results = @parallel myReduce for i in 1:n 190 | runSims(1000) 191 | end 192 | 193 | # And now to process our results 194 | function process(results::Results, n) 195 | mean = results.estimate / n 196 | stdev = sqrt(results.estimateSq / n - mean^2) 197 | 198 | println("Grand mean: $mean") 199 | println("Std Error: $stdev") 200 | end 201 | 202 | process(results,n) 203 | 204 | # Assignment: Write your own MapReduce implementation to calculate the mean and standard deviation 205 | # of the time to process all the customers in bank_11.jl 206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | -------------------------------------------------------------------------------- /8-project/Historical_Route.csv: -------------------------------------------------------------------------------- 1 | FLIGHT_ID,FLIGHT_NUMBER,ORIGIN,DESTINATION,TAIL_NUMBER,SCH_DEP,SCH_ARR 2 | 4,AS583,LAX,SFO,N703AS,19461060,19461143 3 | 49,AS391,SEA,GEG,N705AS,19461360,19461419 4 | 63,AS366,GEG,SEA,N705AS,19461460,19461521 5 | 123,AS141,SEA,FAI,N705AS,19461887,19461939 6 | 142,AS140,FAI,ANC,N705AS,19462000,19462053 7 | 18,AS423,SNA,SEA,N708AS,19461045,19461234 8 | 44,AS334,SEA,OAK,N708AS,19461274,19461400 9 | 66,AS357,OAK,SEA,N708AS,19461436,19461555 10 | 107,AS424,SEA,SNA,N708AS,19461666,19461828 11 | 138,AS497,SNA,SEA,N708AS,19461863,19462034 12 | 31,AS550,SEA,SAN,N713AS,19461180,19461343 13 | 67,AS575,SAN,SEA,N713AS,19461383,19461559 14 | 5,AS362,PDX,SJC,N754AS,19461040,19461152 15 | 25,AS313,SJC,PDX,N754AS,19461190,19461297 16 | 56,AS716,PDX,PHX,N754AS,19461337,19461484 17 | 91,AS707,PHX,PDX,N754AS,19461542,19461714 18 | 115,AS310,PDX,SFO,N754AS,19461757,19461867 19 | 139,AS325,SFO,SEA,N754AS,19461916,19462039 20 | 24,AS723,PHX,SEA,N755AS,19461083,19461271 21 | 53,AS630,SEA,LAS,N755AS,19461312,19461460 22 | 16,AS081,SEA,ANC,N760AS,19461000,19461226 23 | 30,AS081,ANC,FAI,N760AS,19461280,19461340 24 | 51,AS082,FAI,ANC,N760AS,19461374,19461440 25 | 89,AS082,ANC,SEA,N760AS,19461484,19461697 26 | 109,AS498,SEA,SFO,N760AS,19461721,19461851 27 | 128,AS498,SFO,PSP,N760AS,19461888,19461970 28 | 37,AS152,ANC,OME,N762AS,19461259,19461359 29 | 52,AS152,OME,OTZ,N762AS,19461399,19461441 30 | 70,AS152,OTZ,ANC,N762AS,19461481,19461573 31 | 85,AS032,ANC,ADQ,N762AS,19461618,19461678 32 | 101,AS033,ADQ,ANC,N762AS,19461718,19461772 33 | 119,AS045,ANC,BET,N762AS,19461820,19461896 34 | 133,AS046,BET,ANC,N762AS,19461936,19462003 35 | 14,AS143,ANC,FAI,N763AS,19461150,19461206 36 | 29,AS143,FAI,BRW,N763AS,19461246,19461330 37 | 50,AS143,BRW,SCC,N763AS,19461375,19461430 38 | 69,AS143,SCC,ANC,N763AS,19461470,19461572 39 | 93,AS146,ANC,SCC,N763AS,19461618,19461721 40 | 106,AS146,SCC,BRW,N763AS,19461762,19461814 41 | 125,AS146,BRW,FAI,N763AS,19461860,19461940 42 | 137,AS146,FAI,ANC,N763AS,19461980,19462033 43 | 1,AS073,SIT,JNU,N764AS,19461060,19461104 44 | 20,AS073,JNU,ANC,N764AS,19461144,19461251 45 | 88,AS384,ANC,SFO,N764AS,19461571,19461695 46 | 140,AS710,SFO,PHX,N764AS,19461895,19462042 47 | 10,AS236,PDX,SFO,N767AS,19461045,19461159 48 | 134,AS324,SFO,SJC,N767AS,19461883,19462009 49 | 3,AS041,ANC,BET,N768AS,19461065,19461143 50 | 21,AS042,BET,ANC,N768AS,19461184,19461257 51 | 9,AS531,SJC,SEA,N769AS,19461030,19461158 52 | 47,AS496,SEA,SNA,N769AS,19461248,19461408 53 | 74,AS421,SNA,PDX,N769AS,19461444,19461592 54 | 100,AS352,PDX,SNA,N769AS,19461630,19461768 55 | 117,AS321,SNA,OAK,N769AS,19461805,19461888 56 | 141,AS321,OAK,SEA,N769AS,19461925,19462043 57 | 33,AS812,SEA,DFW,N771AS,19461120,19461348 58 |
86,AS813,DFW,SEA,N771AS,19461403,19461678 59 | 112,AS470,SEA,OAK,N771AS,19461739,19461861 60 | 129,AS470,OAK,SNA,N771AS,19461896,19461980 61 | 81,AS363,SEA,GEG,N772AS,19461600,19461655 62 | 97,AS361,GEG,SEA,N772AS,19461695,19461761 63 | 127,AS682,SEA,LAS,N772AS,19461815,19461956 64 | 146,AS693,LAS,SEA,N772AS,19461998,19462152 65 | 6,AS399,PSP,SFO,N773AS,19461060,19461153 66 | 28,AS399,SFO,SEA,N773AS,19461188,19461319 67 | 8,AS642,SEA,LAS,N774AS,19461010,19461158 68 | 36,AS663,LAS,SEA,N774AS,19461195,19461356 69 | 64,AS428,SEA,SJC,N774AS,19461405,19461535 70 | 83,AS379,SJC,PDX,N774AS,19461570,19461673 71 | 105,AS368,PDX,SMF,N774AS,19461710,19461798 72 | 122,AS0389,SMF,PDX,N774AS,19461833,19461925 73 | 17,AS349,LGB,SEA,N775AS,19461045,19461230 74 | 121,AS069,SEA,KTN,N775AS,19461787,19461912 75 | 135,AS069,KTN,JNU,N775AS,19461952,19462012 76 | 22,AS065,SEA,KTN,N776AS,19461127,19461260 77 | 32,AS065,KTN,WRG,N776AS,19461300,19461347 78 | 48,AS065,WRG,PSG,N776AS,19461382,19461411 79 | 59,AS065,PSG,JNU,N776AS,19461452,19461495 80 | 80,AS065,JNU,ANC,N776AS,19461540,19461645 81 | 120,AS096,ANC,SEA,N776AS,19461695,19461911 82 | 145,AS376,SEA,GEG,N776AS,19462020,19462079 83 | 65,AS598,SEA,LAX,N778AS,19461390,19461550 84 | 95,AS485,LAX,SEA,N778AS,19461585,19461746 85 | 132,AS720,SEA,PHX,N778AS,19461832,19461994 86 | 79,AS494,SEA,OAK,N779AS,19461507,19461631 87 | 104,AS359,OAK,SEA,N779AS,19461671,19461791 88 | 11,AS529,LAX,SEA,N780AS,19461015,19461189 89 | 38,AS402,SEA,BOI,N780AS,19461290,19461369 90 | 58,AS381,BOI,SEA,N780AS,19461404,19461495 91 | 87,AS067,SEA,KTN,N780AS,19461563,19461687 92 | 102,AS067,KTN,SIT,N780AS,19461727,19461784 93 | 114,AS067,SIT,JNU,N780AS,19461824,19461866 94 | 136,AS067,JNU,ANC,N780AS,19461907,19462013 95 | 35,AS097,SEA,ANC,N783AS,19461120,19461353 96 | 61,AS064,ANC,JNU,N783AS,19461403,19461510 97 | 75,AS064,JNU,PSG,N783AS,19461550,19461596 98 | 82,AS064,PSG,WRG,N783AS,19461636,19461663 99 | 94,AS064,WRG,KTN,N783AS,19461700,19461738 100 | 118,AS064,KTN,SEA,N783AS,19461778,19461893 101 | 43,AS690,SFO,PSP,N785AS,19461300,19461383 102 | 62,AS543,PSP,SFO,N785AS,19461423,19461518 103 | 84,AS543,SFO,SEA,N785AS,19461553,19461676 104 | 116,AS374,SEA,ONT,N785AS,19461730,19461880 105 | 144,AS445,ONT,SEA,N785AS,19461922,19462079 106 | 23,AS526,SEA,PSP,N786AS,19461110,19461266 107 | 55,AS447,PSP,SEA,N786AS,19461307,19461473 108 | 15,AS719,PHX,PDX,N788AS,19461035,19461211 109 | 42,AS606,PDX,LAS,N788AS,19461251,19461382 110 | 68,AS697,LAS,PDX,N788AS,19461420,19461565 111 | 90,AS418,PDX,SJC,N788AS,19461607,19461714 112 | 111,AS301,SJC,PDX,N788AS,19461750,19461855 113 | 60,AS354,SEA,SFO,N791AS,19461379,19461507 114 | 78,AS354,SFO,PSP,N791AS,19461542,19461625 115 | 96,AS685,PSP,SFO,N791AS,19461665,19461760 116 | 39,AS411,LAX,PDX,N792AS,19461230,19461374 117 | 72,AS462,PDX,LAX,N792AS,19461444,19461581 118 | 98,AS409,LAX,PDX,N792AS,19461616,19461761 119 | 130,AS608,PDX,LAS,N792AS,19461858,19461985 120 | 147,AS695,LAS,PDX,N792AS,19462020,19462159 121 | 2,AS062,FAI,ANC,N793AS,19461060,19461130 122 | 26,AS062,ANC,JNU,N793AS,19461190,19461298 123 | 41,AS062,JNU,SIT,N793AS,19461338,19461382 124 | 54,AS062,SIT,KTN,N793AS,19461422,19461471 125 | 77,AS062,KTN,SEA,N793AS,19461511,19461621 126 | 113,AS468,SEA,LAX,N793AS,19461704,19461863 127 | 7,AS151,ANC,OTZ,N794AS,19461060,19461157 128 | 19,AS151,OTZ,OME,N794AS,19461197,19461242 129 | 40,AS151,OME,ANC,N794AS,19461282,19461375 130 | 76,AS043,ANC,BET,N794AS,19461522,19461601 131 | 92,AS044,BET,ANC,N794AS,19461642,19461715 132 | 
110,AS153,ANC,OTZ,N794AS,19461760,19461854 133 | 124,AS153,OTZ,OME,N794AS,19461894,19461939 134 | 143,AS153,OME,ANC,N794AS,19461980,19462066 135 | 108,AS115,SEA,ANC,N795AS,19461616,19461844 136 | 131,AS070,ANC,JNU,N795AS,19461890,19461991 137 | 13,AS060,JNU,KTN,N796AS,19461132,19461195 138 | 34,AS060,KTN,SEA,N796AS,19461235,19461353 139 | 99,AS431,SEA,SNA,N796AS,19461600,19461767 140 | 27,AS061,SEA,JNU,N797AS,19461148,19461308 141 | 45,AS061,JNU,YAK,N797AS,19461348,19461407 142 | 57,AS061,YAK,CDV,N797AS,19461443,19461493 143 | 71,AS061,CDV,ANC,N797AS,19461528,19461576 144 | 126,AS128,ANC,FAI,N797AS,19461896,19461948 145 | 12,AS766,PDX,PHX,N799AS,19461030,19461190 146 | 46,AS751,PHX,PDX,N799AS,19461228,19461407 147 | 73,AS602,PDX,LAS,N799AS,19461457,19461584 148 | 103,AS0641,LAS,SEA,N799AS,19461621,19461785 149 | -------------------------------------------------------------------------------- /8-project/Flight_Alaska.csv: -------------------------------------------------------------------------------- 1 | FLIGHT_ID,FLIGHT_NUMBER,ORIGIN,DESTINATION,TAIL_NUMBER,SCH_DEP,SCH_ARR,ARRIVAL_DELAY 2 | 1,AS073,SIT,JNU,N764AS,19461060,19461104,0 3 | 2,AS062,FAI,ANC,N793AS,19461060,19461130,0 4 | 3,AS041,ANC,BET,N768AS,19461065,19461143,0 5 | 4,AS583,LAX,SFO,N703AS,19461060,19461143,16 6 | 5,AS362,PDX,SJC,N754AS,19461040,19461152,0 7 | 6,AS399,PSP,SFO,N773AS,19461060,19461153,143 8 | 7,AS151,ANC,OTZ,N794AS,19461060,19461157,0 9 | 8,AS642,SEA,LAS,N774AS,19461010,19461158,0 10 | 9,AS531,SJC,SEA,N769AS,19461030,19461158,0 11 | 10,AS236,PDX,SFO,N767AS,19461045,19461159,3 12 | 11,AS529,LAX,SEA,N780AS,19461015,19461189,4 13 | 12,AS766,PDX,PHX,N799AS,19461030,19461190,0 14 | 13,AS060,JNU,KTN,N796AS,19461132,19461195,0 15 | 14,AS143,ANC,FAI,N763AS,19461150,19461206,0 16 | 15,AS719,PHX,PDX,N788AS,19461035,19461211,0 17 | 16,AS081,SEA,ANC,N760AS,19461000,19461226,123 18 | 17,AS349,LGB,SEA,N775AS,19461045,19461230,0 19 | 18,AS423,SNA,SEA,N708AS,19461045,19461234,0 20 | 19,AS151,OTZ,OME,N794AS,19461197,19461242,0 21 | 20,AS073,JNU,ANC,N764AS,19461144,19461251,231 22 | 21,AS042,BET,ANC,N768AS,19461184,19461257,0 23 | 22,AS065,SEA,KTN,N776AS,19461127,19461260,0 24 | 23,AS526,SEA,PSP,N786AS,19461110,19461266,0 25 | 24,AS723,PHX,SEA,N755AS,19461083,19461271,6 26 | 25,AS313,SJC,PDX,N754AS,19461190,19461297,0 27 | 26,AS062,ANC,JNU,N793AS,19461190,19461298,0 28 | 27,AS061,SEA,JNU,N797AS,19461148,19461308,0 29 | 28,AS399,SFO,SEA,N773AS,19461188,19461319,0 30 | 29,AS143,FAI,BRW,N763AS,19461246,19461330,0 31 | 30,AS081,ANC,FAI,N760AS,19461280,19461340,0 32 | 31,AS550,SEA,SAN,N713AS,19461180,19461343,0 33 | 32,AS065,KTN,WRG,N776AS,19461300,19461347,54 34 | 33,AS812,SEA,DFW,N771AS,19461120,19461348,0 35 | 34,AS060,KTN,SEA,N796AS,19461235,19461353,0 36 | 35,AS097,SEA,ANC,N783AS,19461120,19461353,0 37 | 36,AS663,LAS,SEA,N774AS,19461195,19461356,33 38 | 37,AS152,ANC,OME,N762AS,19461259,19461359,0 39 | 38,AS402,SEA,BOI,N780AS,19461290,19461369,13 40 | 39,AS411,LAX,PDX,N792AS,19461230,19461374,25 41 | 40,AS151,OME,ANC,N794AS,19461282,19461375,0 42 | 41,AS062,JNU,SIT,N793AS,19461338,19461382,9 43 | 42,AS606,PDX,LAS,N788AS,19461251,19461382,3 44 | 43,AS690,SFO,PSP,N785AS,19461300,19461383,0 45 | 44,AS334,SEA,OAK,N708AS,19461274,19461400,0 46 | 45,AS061,JNU,YAK,N797AS,19461348,19461407,5 47 | 46,AS751,PHX,PDX,N799AS,19461228,19461407,0 48 | 47,AS496,SEA,SNA,N769AS,19461248,19461408,0 49 | 48,AS065,WRG,PSG,N776AS,19461382,19461411,32 50 | 49,AS391,SEA,GEG,N705AS,19461360,19461419,6 51 | 50,AS143,BRW,SCC,N763AS,19461375,19461430,0 52 | 
51,AS082,FAI,ANC,N760AS,19461374,19461440,0 53 | 52,AS152,OME,OTZ,N762AS,19461399,19461441,0 54 | 53,AS630,SEA,LAS,N755AS,19461312,19461460,3 55 | 54,AS062,SIT,KTN,N793AS,19461422,19461471,0 56 | 55,AS447,PSP,SEA,N786AS,19461307,19461473,0 57 | 56,AS716,PDX,PHX,N754AS,19461337,19461484,0 58 | 57,AS061,YAK,CDV,N797AS,19461443,19461493,0 59 | 58,AS381,BOI,SEA,N780AS,19461404,19461495,23 60 | 59,AS065,PSG,JNU,N776AS,19461452,19461495,26 61 | 60,AS354,SEA,SFO,N791AS,19461379,19461507,0 62 | 61,AS064,ANC,JNU,N783AS,19461403,19461510,0 63 | 62,AS543,PSP,SFO,N785AS,19461423,19461518,0 64 | 63,AS366,GEG,SEA,N705AS,19461460,19461521,9 65 | 64,AS428,SEA,SJC,N774AS,19461405,19461535,27 66 | 65,AS598,SEA,LAX,N778AS,19461390,19461550,48 67 | 66,AS357,OAK,SEA,N708AS,19461436,19461555,0 68 | 67,AS575,SAN,SEA,N713AS,19461383,19461559,1 69 | 68,AS697,LAS,PDX,N788AS,19461420,19461565,10 70 | 69,AS143,SCC,ANC,N763AS,19461470,19461572,0 71 | 70,AS152,OTZ,ANC,N762AS,19461481,19461573,0 72 | 71,AS061,CDV,ANC,N797AS,19461528,19461576,0 73 | 72,AS462,PDX,LAX,N792AS,19461444,19461581,3 74 | 73,AS602,PDX,LAS,N799AS,19461457,19461584,0 75 | 74,AS421,SNA,PDX,N769AS,19461444,19461592,0 76 | 75,AS064,JNU,PSG,N783AS,19461550,19461596,16 77 | 76,AS043,ANC,BET,N794AS,19461522,19461601,0 78 | 77,AS062,KTN,SEA,N793AS,19461511,19461621,0 79 | 78,AS354,SFO,PSP,N791AS,19461542,19461625,0 80 | 79,AS494,SEA,OAK,N779AS,19461507,19461631,45 81 | 80,AS065,JNU,ANC,N776AS,19461540,19461645,20 82 | 81,AS363,SEA,GEG,N772AS,19461600,19461655,18 83 | 82,AS064,PSG,WRG,N783AS,19461636,19461663,0 84 | 83,AS379,SJC,PDX,N774AS,19461570,19461673,53 85 | 84,AS543,SFO,SEA,N785AS,19461553,19461676,0 86 | 85,AS032,ANC,ADQ,N762AS,19461618,19461678,17 87 | 86,AS813,DFW,SEA,N771AS,19461403,19461678,0 88 | 87,AS067,SEA,KTN,N780AS,19461563,19461687,23 89 | 88,AS384,ANC,SFO,N764AS,19461571,19461695,0 90 | 89,AS082,ANC,SEA,N760AS,19461484,19461697,6 91 | 90,AS418,PDX,SJC,N788AS,19461607,19461714,0 92 | 91,AS707,PHX,PDX,N754AS,19461542,19461714,1 93 | 92,AS044,BET,ANC,N794AS,19461642,19461715,0 94 | 93,AS146,ANC,SCC,N763AS,19461618,19461721,3 95 | 94,AS064,WRG,KTN,N783AS,19461700,19461738,0 96 | 95,AS485,LAX,SEA,N778AS,19461585,19461746,77 97 | 96,AS685,PSP,SFO,N791AS,19461665,19461760,0 98 | 97,AS361,GEG,SEA,N772AS,19461695,19461761,20 99 | 98,AS409,LAX,PDX,N792AS,19461616,19461761,21 100 | 99,AS431,SEA,SNA,N796AS,19461600,19461767,32 101 | 100,AS352,PDX,SNA,N769AS,19461630,19461768,0 102 | 101,AS033,ADQ,ANC,N762AS,19461718,19461772,9 103 | 102,AS067,KTN,SIT,N780AS,19461727,19461784,15 104 | 103,AS0641,LAS,SEA,N799AS,19461621,19461785,47 105 | 104,AS359,OAK,SEA,N779AS,19461671,19461791,55 106 | 105,AS368,PDX,SMF,N774AS,19461710,19461798,41 107 | 106,AS146,SCC,BRW,N763AS,19461762,19461814,0 108 | 107,AS424,SEA,SNA,N708AS,19461666,19461828,0 109 | 108,AS115,SEA,ANC,N795AS,19461616,19461844,0 110 | 109,AS498,SEA,SFO,N760AS,19461721,19461851,65 111 | 110,AS153,ANC,OTZ,N794AS,19461760,19461854,0 112 | 111,AS301,SJC,PDX,N788AS,19461750,19461855,0 113 | 112,AS470,SEA,OAK,N771AS,19461739,19461861,7 114 | 113,AS468,SEA,LAX,N793AS,19461704,19461863,13 115 | 114,AS067,SIT,JNU,N780AS,19461824,19461866,85 116 | 115,AS310,PDX,SFO,N754AS,19461757,19461867,0 117 | 116,AS374,SEA,ONT,N785AS,19461730,19461880,3 118 | 117,AS321,SNA,OAK,N769AS,19461805,19461888,29 119 | 118,AS064,KTN,SEA,N783AS,19461778,19461893,0 120 | 119,AS045,ANC,BET,N762AS,19461820,19461896,12 121 | 120,AS096,ANC,SEA,N776AS,19461695,19461911,22 122 | 121,AS069,SEA,KTN,N775AS,19461787,19461912,8 123 | 
122,AS0389,SMF,PDX,N774AS,19461833,19461925,61 124 | 123,AS141,SEA,FAI,N705AS,19461887,19461939,0 125 | 124,AS153,OTZ,OME,N794AS,19461894,19461939,0 126 | 125,AS146,BRW,FAI,N763AS,19461860,19461940,0 127 | 126,AS128,ANC,FAI,N797AS,19461896,19461948,11 128 | 127,AS682,SEA,LAS,N772AS,19461815,19461956,0 129 | 128,AS498,SFO,PSP,N760AS,19461888,19461970,52 130 | 129,AS470,OAK,SNA,N771AS,19461896,19461980,0 131 | 130,AS608,PDX,LAS,N792AS,19461858,19461985,0 132 | 131,AS070,ANC,JNU,N795AS,19461890,19461991,0 133 | 132,AS720,SEA,PHX,N778AS,19461832,19461994,67 134 | 133,AS046,BET,ANC,N762AS,19461936,19462003,4 135 | 134,AS324,SFO,SJC,N767AS,19461883,19462009,34 136 | 135,AS069,KTN,JNU,N775AS,19461952,19462012,1 137 | 136,AS067,JNU,ANC,N780AS,19461907,19462013,91 138 | 137,AS146,FAI,ANC,N763AS,19461980,19462033,0 139 | 138,AS497,SNA,SEA,N708AS,19461863,19462034,15 140 | 139,AS325,SFO,SEA,N754AS,19461916,19462039,12 141 | 140,AS710,SFO,PHX,N764AS,19461895,19462042,0 142 | 141,AS321,OAK,SEA,N769AS,19461925,19462043,16 143 | 142,AS140,FAI,ANC,N705AS,19462000,19462053,0 144 | 143,AS153,OME,ANC,N794AS,19461980,19462066,0 145 | 144,AS445,ONT,SEA,N785AS,19461922,19462079,102 146 | 145,AS376,SEA,GEG,N776AS,19462020,19462079,3 147 | 146,AS693,LAS,SEA,N772AS,19461998,19462152,36 148 | 147,AS695,LAS,PDX,N792AS,19462020,19462159,43 149 | -------------------------------------------------------------------------------- /2-intermediate-R/FirstHalf.R: -------------------------------------------------------------------------------- 1 | # Intermediate R: Data Wrangling 2 | 3 | ################################## 4 | # Section 1: Load data frame 5 | 6 | # First, load datasets. It's often more convenient to just keep strings as 7 | # strings, so we pass stringsAsFactors=FALSE. 8 | flights = read.csv("flights.csv", stringsAsFactors=FALSE) 9 | 10 | # Let's familiarize ourselves a bit with the data 11 | str(flights) 12 | 13 | ################################### 14 | # Section 2: tapply/table with built-in commands 15 | 16 | # We're going to be doing a lot of tapply, so let's make sure we remember how 17 | # to use it. 18 | # [[Pretty picture of how tapply() works, in slides]] 19 | 20 | # Let's look at the ArrDelayMinutes column 21 | summary(flights$ArrDelayMinutes) 22 | 23 | # Why the NAs? 24 | table(flights$Cancelled,is.na(flights$ArrDelayMinutes)) 25 | 26 | # To ask questions about delays, we need to exclude the NAs 27 | flightsFlown = subset(flights, !is.na(flights$ArrDelayMinutes)) 28 | 29 | # There are some huge outliers 30 | hist(flightsFlown$ArrDelayMinutes) 31 | flightsFlown = subset(flightsFlown, flightsFlown$ArrDelayMinutes < 1000) 32 | 33 | # What is the average arrival delay by day of month? 34 | tapply(flightsFlown$ArrDelayMinutes, flightsFlown$DayofMonth, mean) 35 | 36 | # What is the average arrival delay by airline? 37 | # What about standard deviation of arrival delay by airline? 38 | tapply(flightsFlown$ArrDelayMinutes, flightsFlown$Carrier, mean) 39 | tapply(flightsFlown$ArrDelayMinutes, flightsFlown$Carrier, sd) 40 | 41 | #################################### 42 | # Assignment 1 (Section 2): tapply/table with built-in commands 43 | 44 | # What is the average departure delay by weekday (not counting early 45 | # departures)? 46 | tapply(flightsFlown$DepDelayMinutes, flightsFlown$DayOfWeek, mean) 47 | 48 | # What is the maximum taxi-in time by airport (using 'Dest' column)? 49 | # Hint: R has a 'max' function.
50 | tapply(flightsFlown$TaxiIn, flightsFlown$Dest, max) 51 | 52 | # Extra question: What is the proportion of cancelled flights by airline? 53 | # Hint: The average of TRUE/FALSE values is the proportion that are TRUE. 54 | # Which airlines have the highest and lowest proportions of cancelled flights? 55 | sort(tapply(flights$Cancelled, flights$Carrier, mean)) 56 | 57 | ######################################### 58 | # Section 3: tapply with user-defined functions 59 | 60 | # Often we need to write our own functions to answer specific questions we have 61 | # about the data. We will write a function that finds the most common origin 62 | # airport over a data set of flights. 63 | 64 | # Let's look at how frequently each origin appears in the data set 65 | tab = sort(table(flights$Origin)) 66 | 67 | # Reminder: names function 68 | names(tab) 69 | 70 | # Writing a function that returns the most common origin given a data set 71 | most.common = function(x) { 72 | tab = sort(table(x), decreasing = TRUE) 73 | common.origin = names(tab)[1] 74 | return(common.origin) 75 | } 76 | 77 | # Apply most.common to each carrier using tapply 78 | tapply(flights$Origin,flights$Carrier,most.common) 79 | 80 | # ######################### 81 | # Assignment 2 (Section 3) 82 | 83 | # One simple way to measure the “skew level” of a distribution is 84 | # to subtract the median from the mean. Write a function that calculates 85 | # this measure of skew for arrival delays (ArrDelayMinutes) and use 86 | # tapply to calculate it for each carrier. 87 | # Hint: use the 'median' function. 88 | 89 | shift = function(x) { 90 | mean(x) - median(x) 91 | } 92 | 93 | tapply(flightsFlown$ArrDelayMinutes, flightsFlown$Carrier, shift) 94 | 95 | # Extra: What is the most common Origin-Destination pair for 96 | # each carrier? (Hint: use the paste() function. What would you 97 | # give as the first argument for tapply?) 98 | 99 | tapply(paste(flights$Origin,flights$Dest), flights$Carrier, most.common) 100 | 101 | ########################## 102 | # Section 4: Split-apply-combine 103 | 104 | # We want to create a new data frame with delay information about each origin 105 | # airport. Some of the data (about 150,000 entries) has information about 106 | # causes of the delays. We'll take one more subset of the data to exclude 107 | # all entries without delay type information. 108 | 109 | # Which entries to delete? 110 | summary(flights$WeatherDelay) 111 | summary(flights$CarrierDelay) 112 | flightsDelayInfo = subset(flights, !is.na(flights$WeatherDelay)) 113 | 114 | # Is the total of the delay columns equal to departure or arrival delay? 115 | summary(flightsDelayInfo$LateAircraftDelay + flightsDelayInfo$NASDelay + 116 | flightsDelayInfo$CarrierDelay + flightsDelayInfo$WeatherDelay + 117 | flightsDelayInfo$SecurityDelay == flightsDelayInfo$ArrDelayMinutes) 118 | 119 | # [[Picture of split-apply-combine; split breaks large df into smaller ones, 120 | # lapply converts small data frames into 1-row data frames; do.call(rbind) 121 | # combines them into a single data frame.]] 122 | 123 | # Let's first split by origin.
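# (Before splitting the real data, a toy look at what split() returns: it
# breaks its first argument into a list with one piece per level of the second.)
split(1:6, c("a","b","a","b","a","b"))   # $a: 1 3 5   $b: 2 4 6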
124 | spl = split(flightsDelayInfo, flightsDelayInfo$Origin) 125 | str(spl[[1]]) 126 | str(spl[[2]]) 127 | # spl is a list of data frames 128 | 129 | # Re-writing the delay proportion function and expanding to include more delay categories: 130 | delay.prop.df = function(x) { 131 | prop.carrier = sum(x$CarrierDelay)/sum(x$ArrDelayMinutes) 132 | prop.weather = sum(x$WeatherDelay)/sum(x$ArrDelayMinutes) 133 | prop.nas = sum(x$NASDelay)/sum(x$ArrDelayMinutes) 134 | prop.security = sum(x$SecurityDelay)/sum(x$ArrDelayMinutes) 135 | prop.late = sum(x$LateAircraftDelay)/sum(x$ArrDelayMinutes) 136 | return(data.frame(Origin = x$Origin[1], prop.carrier = prop.carrier, prop.weather = prop.weather, prop.nas = prop.nas, prop.security = prop.security, prop.late = prop.late)) 137 | } 138 | 139 | #Testing on a few split up data frames 140 | delay.prop.df(spl[[1]]) 141 | c(sum(spl[[1]]$CarrierDelay)/sum(spl[[1]]$ArrDelayMinutes),sum(spl[[1]]$WeatherDelay)/sum(spl[[1]]$ArrDelayMinutes),sum(spl[[1]]$NASDelay)/sum(spl[[1]]$ArrDelayMinutes),sum(spl[[1]]$SecurityDelay)/sum(spl[[1]]$ArrDelayMinutes),sum(spl[[1]]$LateAircraftDelay)/sum(spl[[1]]$ArrDelayMinutes)) 142 | 143 | # Use lapply (apply a function to a list) to convert elements of spl to 1-row summary 144 | # data frames. 145 | spl2 = lapply(spl, delay.prop.df) 146 | spl2[[1]] 147 | spl2[[2]] 148 | 149 | # Last step is to combine everything together. We could manually combine with 150 | # rbind: 151 | rbind(spl2[[1]], spl2[[2]], spl2[[3]]) 152 | 153 | # do.call is a nifty function that passes all of the elements of its second 154 | # argument to its first argument, which is a function 155 | flights.delay.info = do.call(rbind, spl2) 156 | head(flights.delay.info) 157 | 158 | # What are the airports with the highest proportion of weather delays? 159 | flights.delay.info[order(flights.delay.info$prop.weather, decreasing=TRUE),] 160 | 161 | # How about carrier delays? 162 | flights.delay.info[order(flights.delay.info$prop.carrier, decreasing=TRUE),] 163 | 164 | ########################## 165 | # Assignment 3 (Section 4): Split-apply-combine 166 | 167 | # From the flightsFlown data frame, create a data frame called carrier.info, where each row corresponds 168 | # to one carrier (airline).
Include the following variables in your new data frame: 169 | # - carrier: The carrier code 170 | # - mean.arr.delay: Average arrival delay time (using ArrDelayMinutes) 171 | # - longest.delay: Longest flight delay for the month 172 | # - most.common.origin: most common origin for the carrier 173 | 174 | spl = split(flightsFlown, flightsFlown$Carrier) 175 | 176 | process.carrier = function(x) { 177 | carrier = x$Carrier[1] 178 | mean.arr.delay = mean(x$ArrDelayMinutes) 179 | longest.delay = max(x$ArrDelayMinutes) 180 | most.common.origin = most.common(x$Origin) 181 | return(data.frame(carrier,mean.arr.delay,longest.delay,most.common.origin)) 182 | } 183 | 184 | spl2 = lapply(spl, process.carrier) 185 | carrier.info = do.call(rbind, spl2) 186 | -------------------------------------------------------------------------------- /2-intermediate-R/prcp_pretty.csv: -------------------------------------------------------------------------------- 1 | "Airport","DayOfMonth","prcp" 2 | "ATL",1,0 3 | "ATL",2,43 4 | "ATL",3,135 5 | "ATL",4,3 6 | "ATL",5,10 7 | "ATL",6,46 8 | "ATL",7,51 9 | "ATL",8,33 10 | "ATL",9,254 11 | "ATL",10,46 12 | "ATL",11,0 13 | "ATL",12,0 14 | "ATL",13,0 15 | "ATL",14,137 16 | "ATL",15,3 17 | "ATL",16,0 18 | "ATL",17,0 19 | "ATL",18,0 20 | "ATL",19,0 21 | "ATL",20,0 22 | "ATL",21,0 23 | "ATL",22,792 24 | "ATL",23,20 25 | "ATL",24,0 26 | "ATL",25,0 27 | "ATL",26,0 28 | "ATL",27,0 29 | "ATL",28,358 30 | "ATL",29,51 31 | "ATL",30,0 32 | "ATL",31,0 33 | "BOS",1,69 34 | "BOS",2,0 35 | "BOS",3,0 36 | "BOS",4,0 37 | "BOS",5,3 38 | "BOS",6,76 39 | "BOS",7,23 40 | "BOS",8,0 41 | "BOS",9,109 42 | "BOS",10,18 43 | "BOS",11,0 44 | "BOS",12,0 45 | "BOS",13,0 46 | "BOS",14,56 47 | "BOS",15,163 48 | "BOS",16,0 49 | "BOS",17,135 50 | "BOS",18,0 51 | "BOS",19,0 52 | "BOS",20,0 53 | "BOS",21,0 54 | "BOS",22,3 55 | "BOS",23,173 56 | "BOS",24,0 57 | "BOS",25,0 58 | "BOS",26,13 59 | "BOS",27,0 60 | "BOS",28,0 61 | "BOS",29,335 62 | "BOS",30,0 63 | "BOS",31,0 64 | "CLT",1,0 65 | "CLT",2,0 66 | "CLT",3,3 67 | "CLT",4,5 68 | "CLT",5,94 69 | "CLT",6,0 70 | "CLT",7,10 71 | "CLT",8,10 72 | "CLT",9,79 73 | "CLT",10,36 74 | "CLT",11,0 75 | "CLT",12,0 76 | "CLT",13,0 77 | "CLT",14,251 78 | "CLT",15,3 79 | "CLT",16,0 80 | "CLT",17,0 81 | "CLT",18,0 82 | "CLT",19,0 83 | "CLT",20,0 84 | "CLT",21,0 85 | "CLT",22,366 86 | "CLT",23,490 87 | "CLT",24,0 88 | "CLT",25,0 89 | "CLT",26,0 90 | "CLT",27,0 91 | "CLT",28,3 92 | "CLT",29,462 93 | "CLT",30,0 94 | "CLT",31,0 95 | "DEN",1,0 96 | "DEN",2,0 97 | "DEN",3,3 98 | "DEN",4,28 99 | "DEN",5,0 100 | "DEN",6,0 101 | "DEN",7,0 102 | "DEN",8,5 103 | "DEN",9,0 104 | "DEN",10,0 105 | "DEN",11,0 106 | "DEN",12,0 107 | "DEN",13,0 108 | "DEN",14,0 109 | "DEN",15,0 110 | "DEN",16,0 111 | "DEN",17,0 112 | "DEN",18,0 113 | "DEN",19,0 114 | "DEN",20,0 115 | "DEN",21,8 116 | "DEN",22,5 117 | "DEN",23,5 118 | "DEN",24,0 119 | "DEN",25,0 120 | "DEN",26,0 121 | "DEN",27,0 122 | "DEN",28,10 123 | "DEN",29,0 124 | "DEN",30,0 125 | "DEN",31,0 126 | "DFW",1,0 127 | "DFW",2,0 128 | "DFW",3,0 129 | "DFW",4,0 130 | "DFW",5,102 131 | "DFW",6,216 132 | "DFW",7,0 133 | "DFW",8,0 134 | "DFW",9,0 135 | "DFW",10,0 136 | "DFW",11,0 137 | "DFW",12,0 138 | "DFW",13,3 139 | "DFW",14,0 140 | "DFW",15,0 141 | "DFW",16,0 142 | "DFW",17,0 143 | "DFW",18,0 144 | "DFW",19,0 145 | "DFW",20,5 146 | "DFW",21,376 147 | "DFW",22,0 148 | "DFW",23,0 149 | "DFW",24,0 150 | "DFW",25,0 151 | "DFW",26,0 152 | "DFW",27,0 153 | "DFW",28,0 154 | "DFW",29,0 155 | "DFW",30,0 156 | "DFW",31,0 157 | "DTW",1,0 158 | 
"DTW",2,0 159 | "DTW",3,23 160 | "DTW",4,0 161 | "DTW",5,0 162 | "DTW",6,0 163 | "DTW",7,0 164 | "DTW",8,8 165 | "DTW",9,10 166 | "DTW",10,0 167 | "DTW",11,3 168 | "DTW",12,0 169 | "DTW",13,0 170 | "DTW",14,122 171 | "DTW",15,8 172 | "DTW",16,20 173 | "DTW",17,8 174 | "DTW",18,0 175 | "DTW",19,0 176 | "DTW",20,86 177 | "DTW",21,224 178 | "DTW",22,64 179 | "DTW",23,5 180 | "DTW",24,0 181 | "DTW",25,3 182 | "DTW",26,13 183 | "DTW",27,0 184 | "DTW",28,0 185 | "DTW",29,0 186 | "DTW",30,0 187 | "DTW",31,20 188 | "EWR",1,0 189 | "EWR",2,0 190 | "EWR",3,0 191 | "EWR",4,0 192 | "EWR",5,3 193 | "EWR",6,203 194 | "EWR",7,38 195 | "EWR",8,25 196 | "EWR",9,86 197 | "EWR",10,69 198 | "EWR",11,0 199 | "EWR",12,0 200 | "EWR",13,0 201 | "EWR",14,122 202 | "EWR",15,104 203 | "EWR",16,0 204 | "EWR",17,46 205 | "EWR",18,0 206 | "EWR",19,0 207 | "EWR",20,0 208 | "EWR",21,0 209 | "EWR",22,8 210 | "EWR",23,135 211 | "EWR",24,0 212 | "EWR",25,0 213 | "EWR",26,0 214 | "EWR",27,0 215 | "EWR",28,0 216 | "EWR",29,335 217 | "EWR",30,0 218 | "EWR",31,0 219 | "IAH",1,0 220 | "IAH",2,0 221 | "IAH",3,0 222 | "IAH",4,0 223 | "IAH",5,0 224 | "IAH",6,3 225 | "IAH",7,0 226 | "IAH",8,0 227 | "IAH",9,18 228 | "IAH",10,0 229 | "IAH",11,0 230 | "IAH",12,0 231 | "IAH",13,15 232 | "IAH",14,0 233 | "IAH",15,0 234 | "IAH",16,0 235 | "IAH",17,0 236 | "IAH",18,0 237 | "IAH",19,8 238 | "IAH",20,0 239 | "IAH",21,376 240 | "IAH",22,3 241 | "IAH",23,0 242 | "IAH",24,0 243 | "IAH",25,0 244 | "IAH",26,0 245 | "IAH",27,0 246 | "IAH",28,0 247 | "IAH",29,0 248 | "IAH",30,0 249 | "IAH",31,0 250 | "JFK",1,0 251 | "JFK",2,0 252 | "JFK",3,0 253 | "JFK",4,0 254 | "JFK",5,3 255 | "JFK",6,145 256 | "JFK",7,43 257 | "JFK",8,8 258 | "JFK",9,64 259 | "JFK",10,71 260 | "JFK",11,0 261 | "JFK",12,0 262 | "JFK",13,0 263 | "JFK",14,145 264 | "JFK",15,208 265 | "JFK",16,0 266 | "JFK",17,33 267 | "JFK",18,0 268 | "JFK",19,0 269 | "JFK",20,0 270 | "JFK",21,0 271 | "JFK",22,0 272 | "JFK",23,124 273 | "JFK",24,0 274 | "JFK",25,0 275 | "JFK",26,0 276 | "JFK",27,0 277 | "JFK",28,0 278 | "JFK",29,300 279 | "JFK",30,0 280 | "JFK",31,0 281 | "LAS",1,0 282 | "LAS",2,0 283 | "LAS",3,8 284 | "LAS",4,5 285 | "LAS",5,0 286 | "LAS",6,0 287 | "LAS",7,0 288 | "LAS",8,0 289 | "LAS",9,0 290 | "LAS",10,0 291 | "LAS",11,0 292 | "LAS",12,0 293 | "LAS",13,0 294 | "LAS",14,0 295 | "LAS",15,0 296 | "LAS",16,0 297 | "LAS",17,0 298 | "LAS",18,0 299 | "LAS",19,0 300 | "LAS",20,0 301 | "LAS",21,0 302 | "LAS",22,0 303 | "LAS",23,0 304 | "LAS",24,0 305 | "LAS",25,0 306 | "LAS",26,0 307 | "LAS",27,0 308 | "LAS",28,0 309 | "LAS",29,0 310 | "LAS",30,0 311 | "LAS",31,0 312 | "LAX",1,0 313 | "LAX",2,0 314 | "LAX",3,0 315 | "LAX",4,0 316 | "LAX",5,0 317 | "LAX",6,0 318 | "LAX",7,66 319 | "LAX",8,0 320 | "LAX",9,0 321 | "LAX",10,0 322 | "LAX",11,0 323 | "LAX",12,0 324 | "LAX",13,0 325 | "LAX",14,0 326 | "LAX",15,0 327 | "LAX",16,0 328 | "LAX",17,0 329 | "LAX",18,0 330 | "LAX",19,10 331 | "LAX",20,0 332 | "LAX",21,0 333 | "LAX",22,0 334 | "LAX",23,0 335 | "LAX",24,0 336 | "LAX",25,0 337 | "LAX",26,0 338 | "LAX",27,0 339 | "LAX",28,0 340 | "LAX",29,0 341 | "LAX",30,0 342 | "LAX",31,0 343 | "LGA",1,0 344 | "LGA",2,0 345 | "LGA",3,0 346 | "LGA",4,0 347 | "LGA",5,3 348 | "LGA",6,185 349 | "LGA",7,33 350 | "LGA",8,13 351 | "LGA",9,53 352 | "LGA",10,56 353 | "LGA",11,0 354 | "LGA",12,0 355 | "LGA",13,0 356 | "LGA",14,112 357 | "LGA",15,196 358 | "LGA",16,0 359 | "LGA",17,41 360 | "LGA",18,0 361 | "LGA",19,0 362 | "LGA",20,0 363 | "LGA",21,0 364 | "LGA",22,3 365 | "LGA",23,135 366 | "LGA",24,0 367 | 
"LGA",25,0 368 | "LGA",26,0 369 | "LGA",27,0 370 | "LGA",28,0 371 | "LGA",29,305 372 | "LGA",30,0 373 | "LGA",31,0 374 | "MCO",1,0 375 | "MCO",2,0 376 | "MCO",3,0 377 | "MCO",4,0 378 | "MCO",5,0 379 | "MCO",6,0 380 | "MCO",7,0 381 | "MCO",8,0 382 | "MCO",9,0 383 | "MCO",10,0 384 | "MCO",11,0 385 | "MCO",12,0 386 | "MCO",13,0 387 | "MCO",14,3 388 | "MCO",15,46 389 | "MCO",16,0 390 | "MCO",17,0 391 | "MCO",18,0 392 | "MCO",19,0 393 | "MCO",20,0 394 | "MCO",21,0 395 | "MCO",22,0 396 | "MCO",23,0 397 | "MCO",24,0 398 | "MCO",25,0 399 | "MCO",26,0 400 | "MCO",27,3 401 | "MCO",28,3 402 | "MCO",29,15 403 | "MCO",30,0 404 | "MCO",31,0 405 | "MIA",1,53 406 | "MIA",2,0 407 | "MIA",3,0 408 | "MIA",4,0 409 | "MIA",5,0 410 | "MIA",6,0 411 | "MIA",7,33 412 | "MIA",8,15 413 | "MIA",9,0 414 | "MIA",10,0 415 | "MIA",11,10 416 | "MIA",12,0 417 | "MIA",13,3 418 | "MIA",14,3 419 | "MIA",15,0 420 | "MIA",16,0 421 | "MIA",17,0 422 | "MIA",18,0 423 | "MIA",19,0 424 | "MIA",20,0 425 | "MIA",21,0 426 | "MIA",22,0 427 | "MIA",23,0 428 | "MIA",24,0 429 | "MIA",25,23 430 | "MIA",26,952 431 | "MIA",27,64 432 | "MIA",28,30 433 | "MIA",29,0 434 | "MIA",30,0 435 | "MIA",31,0 436 | "MSP",1,0 437 | "MSP",2,30 438 | "MSP",3,20 439 | "MSP",4,132 440 | "MSP",5,0 441 | "MSP",6,0 442 | "MSP",7,0 443 | "MSP",8,18 444 | "MSP",9,0 445 | "MSP",10,15 446 | "MSP",11,0 447 | "MSP",12,0 448 | "MSP",13,5 449 | "MSP",14,15 450 | "MSP",15,0 451 | "MSP",16,25 452 | "MSP",17,0 453 | "MSP",18,0 454 | "MSP",19,8 455 | "MSP",20,5 456 | "MSP",21,0 457 | "MSP",22,8 458 | "MSP",23,0 459 | "MSP",24,58 460 | "MSP",25,5 461 | "MSP",26,3 462 | "MSP",27,0 463 | "MSP",28,0 464 | "MSP",29,0 465 | "MSP",30,23 466 | "MSP",31,0 467 | "ORD",1,0 468 | "ORD",2,18 469 | "ORD",3,5 470 | "ORD",4,5 471 | "ORD",5,0 472 | "ORD",6,0 473 | "ORD",7,0 474 | "ORD",8,61 475 | "ORD",9,0 476 | "ORD",10,0 477 | "ORD",11,13 478 | "ORD",12,0 479 | "ORD",13,3 480 | "ORD",14,58 481 | "ORD",15,0 482 | "ORD",16,13 483 | "ORD",17,3 484 | "ORD",18,0 485 | "ORD",19,15 486 | "ORD",20,79 487 | "ORD",21,61 488 | "ORD",22,51 489 | "ORD",23,0 490 | "ORD",24,3 491 | "ORD",25,18 492 | "ORD",26,0 493 | "ORD",27,0 494 | "ORD",28,0 495 | "ORD",29,3 496 | "ORD",30,10 497 | "ORD",31,76 498 | "PHL",1,0 499 | "PHL",2,0 500 | "PHL",3,0 501 | "PHL",4,0 502 | "PHL",5,0 503 | "PHL",6,196 504 | "PHL",7,18 505 | "PHL",8,145 506 | "PHL",9,249 507 | "PHL",10,69 508 | "PHL",11,0 509 | "PHL",12,0 510 | "PHL",13,0 511 | "PHL",14,178 512 | "PHL",15,18 513 | "PHL",16,0 514 | "PHL",17,13 515 | "PHL",18,0 516 | "PHL",19,0 517 | "PHL",20,0 518 | "PHL",21,0 519 | "PHL",22,10 520 | "PHL",23,124 521 | "PHL",24,0 522 | "PHL",25,0 523 | "PHL",26,0 524 | "PHL",27,0 525 | "PHL",28,0 526 | "PHL",29,302 527 | "PHL",30,0 528 | "PHL",31,0 529 | "PHX",1,0 530 | "PHX",2,0 531 | "PHX",3,0 532 | "PHX",4,0 533 | "PHX",5,0 534 | "PHX",6,0 535 | "PHX",7,0 536 | "PHX",8,0 537 | "PHX",9,0 538 | "PHX",10,0 539 | "PHX",11,0 540 | "PHX",12,0 541 | "PHX",13,0 542 | "PHX",14,0 543 | "PHX",15,0 544 | "PHX",16,0 545 | "PHX",17,0 546 | "PHX",18,0 547 | "PHX",19,41 548 | "PHX",20,58 549 | "PHX",21,0 550 | "PHX",22,0 551 | "PHX",23,0 552 | "PHX",24,0 553 | "PHX",25,0 554 | "PHX",26,0 555 | "PHX",27,0 556 | "PHX",28,0 557 | "PHX",29,0 558 | "PHX",30,0 559 | "PHX",31,0 560 | "SEA",1,30 561 | "SEA",2,46 562 | "SEA",3,0 563 | "SEA",4,0 564 | "SEA",5,0 565 | "SEA",6,0 566 | "SEA",7,0 567 | "SEA",8,0 568 | "SEA",9,0 569 | "SEA",10,0 570 | "SEA",11,0 571 | "SEA",12,69 572 | "SEA",13,5 573 | "SEA",14,0 574 | "SEA",15,13 575 | "SEA",16,3 576 | 
"SEA",17,0 577 | "SEA",18,13 578 | "SEA",19,0 579 | "SEA",20,56 580 | "SEA",21,56 581 | "SEA",22,107 582 | "SEA",23,15 583 | "SEA",24,0 584 | "SEA",25,0 585 | "SEA",26,0 586 | "SEA",27,3 587 | "SEA",28,0 588 | "SEA",29,0 589 | "SEA",30,3 590 | "SEA",31,5 591 | "SFO",1,0 592 | "SFO",2,0 593 | "SFO",3,0 594 | "SFO",4,0 595 | "SFO",5,0 596 | "SFO",6,74 597 | "SFO",7,15 598 | "SFO",8,0 599 | "SFO",9,0 600 | "SFO",10,0 601 | "SFO",11,0 602 | "SFO",12,0 603 | "SFO",13,0 604 | "SFO",14,0 605 | "SFO",15,0 606 | "SFO",16,0 607 | "SFO",17,0 608 | "SFO",18,0 609 | "SFO",19,0 610 | "SFO",20,0 611 | "SFO",21,0 612 | "SFO",22,0 613 | "SFO",23,0 614 | "SFO",24,0 615 | "SFO",25,0 616 | "SFO",26,0 617 | "SFO",27,0 618 | "SFO",28,0 619 | "SFO",29,0 620 | "SFO",30,0 621 | "SFO",31,0 622 | -------------------------------------------------------------------------------- /1-intro-R/.Rapp.history: -------------------------------------------------------------------------------- 1 | par(mfrow = c(2,2)) 2 | plot(anscombe$x1, anscombe$y1) 3 | abline(a1) 4 | plot(anscombe$x1, anscombe$y1)# 5 | abline(a1)# 6 | # 7 | plot(anscombe$x2, anscombe$y2)# 8 | abline(a2)# 9 | # 10 | plot(anscombe$x3, anscombe$y3)# 11 | abline(a3)# 12 | # 13 | plot(anscombe$x4, anscombe$y4)# 14 | abline(a4) 15 | par(mfrow = c(2,2))# 16 | plot(anscombe$x1, anscombe$y1)# 17 | abline(a1)# 18 | # 19 | plot(anscombe$x2, anscombe$y2)# 20 | abline(a2)# 21 | # 22 | plot(anscombe$x3, anscombe$y3)# 23 | abline(a3)# 24 | # 25 | plot(anscombe$x4, anscombe$y4)# 26 | abline(a4) 27 | ggplot(data = anscombe, aes(x = x1, y = y1)) 28 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_point 29 | library(ggplot2)# 30 | library(maps)# 31 | library(ggmap)# 32 | data(anscombe)# 33 | str(anscombe) 34 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_point 35 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_point() 36 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_line() 37 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_line(color = "blue", size = 3, shape = 17) 38 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_point(color = "blue", size = 3, shape = 17) 39 | ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_point(color = "blue", size = 3, shape = 17) + ggtitle("Anscombe #1") 40 | pdf("MyPlot.pdf") 41 | ggsave() 42 | anscombe_plot = ggplot(data = anscombe, aes(x = x1, y = y1)) + geom_point(color = "blue", size = 3, shape = 17) + ggtitle("Anscombe #1") 43 | print(anscombe_plot) 44 | data(iris) 45 | str(iris) 46 | iris_plot = ggplot(data = iris, aes(x = Petal.Length, y = Sepal.Length)) + geom_point(color = "red", size = 3, shape = 16) + ggtitle("Sepal Length vs. Petal Length") 47 | print(iris_plot) 48 | iris_plot 49 | ggplot(data = iris, aes(x = Petal.Length, y = Sepal.Length)) + geom_point(color = "red", size = 3, shape = 16) + ggtitle("Sepal Length vs. 
Petal Length") 50 | library(stats) 51 | library(stats)# 52 | lm_test <- lm(mpg ~ hp + cyl + wt + gear, data = mtcars)# 53 | summary(lm_test) 54 | source("Assignment.R") 55 | getwd() 56 | setwd("~/Desktop/OR-software-tools-2015/1-intro-R/") 57 | source("Assignment.R") 58 | 3^(6-4) 59 | 22/7 60 | 16^(1/4) 61 | 6*9 == 62 | 42 63 | 6*9 == 64 | 54 65 | sqrt(2) 66 | abs(-2) 67 | sin(pi/2) 68 | cos(0) 69 | exp(-1) 70 | (1 - 1/100)^100 71 | log(exp(1)) 72 | help(log) 73 | ?log 74 | x <- 2^3 75 | y = 6 76 | x 77 | y 78 | print(x) 79 | print(y) 80 | ls() 81 | z <- seq(1:10) 82 | z <- 1:10 #this also works 83 | z 84 | z[5] 85 | sum(z) 86 | double_z <- z^2 87 | double_z 88 | airports = c("BOS", "JFK", "ORD", "SFO", "ATL") 89 | capacities = c(20, 45, 50, 35, 55) 90 | cbind(airports, capacities) 91 | df1 = data.frame(airports, capacities) 92 | df1 93 | class(airports) 94 | str(airports) 95 | class(capacities) 96 | str(capacities) 97 | class(df.runways) 98 | str(df.runways) 99 | df.runways = rbind(df1, df2) 100 | df1 = data.frame(airports, capacities) 101 | capacities = c(3, 2, 5, 1, 3) 102 | df2 = data.frame(airports, capacities) 103 | airports = c("BOS", "JFK", "ORD", "SFO", "ATL")# 104 | capacities = c(20, 45, 50, 35, 55)# 105 | # 106 | # Place vectors together as a matrix using bind# 107 | # 108 | # bind together as columns# 109 | cbind(airports, capacities)# 110 | # 111 | # bind together as rows# 112 | rbind(airports, capacities)# 113 | # 114 | # Create a data frame# 115 | df1 = data.frame(airports, capacities)# 116 | # 117 | # Add additional runways# 118 | capacities = c(3, 2, 5, 1, 3)# 119 | # 120 | # Create another data frame# 121 | df2 = data.frame(airports, capacities)# 122 | # 123 | # Append rows of the second data frame to those of the first# 124 | df.runways = rbind(df1, df2) 125 | class(df.runways) 126 | str(df.runways) 127 | df.runways 128 | df.runways$locations 129 | df.runways$airports 130 | summary(df.runways) 131 | summary(df.runways$airports) 132 | df.runways$airports 133 | summary(df.runways) 134 | summary(df.runways$airports) 135 | runwaysBOS = subset(df.runways, locations=="BOS") 136 | runwaysBOS = subset(df.runways, airports=="BOS") 137 | runwaysBOS 138 | runwaysBOS = df.runways[c(1,6), ] 139 | str(runwaysBOS) 140 | runwaysBOS$airports = factor(runwaysBOS$airports) 141 | str(runwaysBOS) 142 | sum(runwaysBOS$capacities) 143 | airports = c("BOS", "JFK", "ORD", "SFO", "ATL") 144 | capacities = c(20, 45, 50, 35, 55) 145 | cbind(airports, capacities) 146 | rbind(airports, capacities) 147 | df1 = data.frame(airports, capacities) 148 | capacities = c(3, 2, 5, 1, 3) 149 | df2 = data.frame(airports, capacities) 150 | df.runways = rbind(df1, df2) 151 | class(airports) 152 | str(airports) 153 | class(capacities) 154 | str(capacities) 155 | class(df.runways) 156 | str(df.runways) 157 | df.runways 158 | df.runways$airports 159 | summary(df.runways) 160 | summary(df.runways$airports) 161 | runwaysBOS = subset(df.runways, airports=="BOS") 162 | runwaysBOS 163 | runwaysBOS = df.runways[c(1,6), ] 164 | str(runwaysBOS) 165 | runwaysBOS$airports = factor(runwaysBOS$airports) 166 | str(runwaysBOS) 167 | sum(runwaysBOS$capacities) 168 | CEOcomp = read.csv(file = "CEOcomp.csv", header = TRUE) 169 | str(CEOcomp) 170 | names(CEOcomp) 171 | CEOcomp$Years 172 | CEOcomp$MBA 173 | attach(CEOcomp) 174 | Years 175 | MBA 176 | detach(CEOcomp) 177 | mean(CEOcomp$Years) 178 | sd(CEOcomp$Years) 179 | summary(CEOcomp$Years) 180 | plot(CEOcomp$Years, CEOcomp$TotalCompensation) 181 | plot(CEOcomp$Years, 
CEOcomp$TotalCompensation, main="Total Compensation by Year", xlab = "Years of Experience", ylab = "Total Compensation (thousand USD)") 182 | plot(CEOcomp$Years, CEOcomp$TotalCompensation) 183 | plot(CEOcomp$Years, CEOcomp$TotalCompensation, main="Total Compensation by Year", xlab = "Years of Experience", ylab = "Total Compensation (thousand USD)") 184 | tapply(CEOcomp$TotalCompensation, CEOcomp$MBA, mean) 185 | table(CEOcomp$Year, CEOcomp$MBA) 186 | CEOmissing = read.csv("CEOmissing.csv") 187 | summary(CEOmissing) 188 | str(CEOmissing) 189 | 5 == NA 190 | NA == NA 191 | is.na(5) 192 | is.na(NA) 193 | CEOnomissing = subset(CEOmissing, !is.na(TotalCompensation) & !is.na(Years) & !is.na(ChangeStockPrice) & !is.na(ChangeCompanySales) & !is.na(MBA)) 194 | summary(CEOnomissing) 195 | str(CEOnomissing) 196 | CEOomitmissing = na.omit(CEOmissing) 197 | summary(CEOomitmissing) 198 | str(CEOomitmissing) 199 | save.image("eg.RData") 200 | save(CEOcomp, file = "CEOcomp.RData") 201 | ?seq 202 | seq(from = 2, to = 20, by = 2) 203 | seq(2, 20, 2) 204 | 2*(1:10) 205 | hist(CEOcomp$Years) 206 | hist(CEOcomp$Years, main = "Years of Experience", xlab= "Years", ylab = "freq") 207 | otp = read.csv("~/Desktop/otp.csv") 208 | str(otp) 209 | summary(otp$Origin) 210 | summary(otp$Origin) 211 | summary(otp$Origin)[1:10] 212 | names(summary(otp$Origin))[1:10] 213 | names(summary(otp$Dest))[1:10] 214 | topten = names(summary(otp$Dest))[1:10] 215 | truncated = subset(otp, is.element(otp$Dest, topten) & is.element(otp$Origin, topten)) 216 | table(truncated$Origin, truncated$Dest) 217 | truncated$Origin = factor(truncated$Origin) 218 | truncated$Dest = factor(truncated$Dest) 219 | table(truncated$Origin, truncated$Dest) 220 | LB = read.csv("LettersBinary.csv") 221 | CEOcomp = read.csv(file = "CEOcomp.csv", header = TRUE) 222 | CEO.linReg <- lm(TotalCompensation ~ Years + ChangeStockPrice + ChangeCompanySales + MBA, data = CEOcomp) 223 | summary(CEO.linReg) 224 | CEO.linReg$coefficients 225 | CEO.linReg$residuals 226 | confint(CEO.linReg, level = 0.95) 227 | cor(CEOcomp$TotalCompensation, CEOcomp$Years) 228 | cor(CEOcomp) 229 | cor.test(CEOcomp$TotalCompensation, CEOcomp$Years) 230 | TitanicPassengers = read.csv("TitanicPassengers.csv") 231 | str(TitanicPassengers) 232 | library(caTools) 233 | split <- sample.split(TitanicPassengers$Survived, SplitRatio = 0.6) 234 | split 235 | TitanicTrain <- TitanicPassengers[split, ] 236 | TitanicTest <- TitanicPassengers[!split, ] 237 | Titanic.logReg = glm(Survived ~ Class + Age + Sex, data = TitanicTrain, family = binomial) 238 | summary(Titanic.logReg) 239 | Titanic.logPred = predict(Titanic.logReg, type = "response") 240 | split = sample.split(LB, SplitRatio = 0.6) 241 | LB.train = LB[split, ] 242 | LB.test = LB[!split, ] 243 | str(LB.train) 244 | str(LB.test) 245 | letters.formula <- formula(Letter ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15 + x16) 246 | log.LB = glm(letters.formula, data = LB.train, family = binomial) 247 | predict(log.LB, newdata = LB.test, type = "response") 248 | str(LB.test) 249 | table(LB.test$Letter, round(predict(log.LB, newdata = LB.test, type= "response"))) 250 | library(rpart) 251 | Titanic.CART = rpart(Survived ~ Class + Age + Sex, data = TitanicTrain, method = "class", control = rpart.control(minbucket = 10)) 252 | Titanic.CART = rpart(Survived ~ Class + Age + Sex, data = TitanicTrain, method = "class", control = rpart.control(minbucket = 10)) 253 | plot(Titanic.CART) 254 | text(Titanic.CART, pretty = 0) 255 | 
Titanic.CARTpredTest = predict(Titanic.CART, newdata = TitanicTest, type = "class") 256 | CARTpredTable <- table(TitanicTest$Survived, Titanic.CARTpredTest) 257 | CARTpredTable 258 | sum(diag(CARTpredTable))/nrow(TitanicTest) 259 | CEOcomp.CART = rpart(TotalCompensation ~ Years + ChangeStockPrice + ChangeCompanySales + MBA, data = CEOcomp, method = "anova", control = rpart.control(minsplit = 5)) 260 | predict(CEOcomp.CART) 261 | CEOcomp$TotalCompensation 262 | library(randomForest) 263 | install.packages("randomForest") 264 | library(randomForest) 265 | Titanic.forest = randomForest(Survived ~ Class + Age + Sex, data = TitanicTrain, nodesize = 10, ntree = 200) 266 | str(TitanicTrain$Survived) 267 | TitanicTrain$Survived <- factor(TitanicTrain$Survived) 268 | TitanicTest$Survived <- factor(TitanicTest$Survived) 269 | Titanic.forest = randomForest(Survived ~ Class + Age + Sex, data = TitanicTrain, nodesize = 10, ntree = 200) 270 | forest.table <- table(TitanicTest$Survived, Titanic.forestPred) 271 | Titanic.forest = randomForest(Survived ~ Class + Age + Sex, data = TitanicTrain, nodesize = 10, ntree = 200) 272 | Titanic.forestPred = predict(Titanic.forest, newdata = TitanicTest) 273 | forest.table <- table(TitanicTest$Survived, Titanic.forestPred) 274 | forest.table 275 | sum(diag(forest.table))/nrow(TitanicTest) 276 | ?randomForest 277 | data() 278 | data(iris) 279 | str(iris) 280 | IrisDist = dist(iris[1:4], method = "euclidean") 281 | IrisHC = hclust(IrisDist, method = "ward") 282 | IrisHC = hclust(IrisDist, method = "ward.D") 283 | plot(IrisHC) 284 | IrisDist 285 | rect.hclust(IrisHC, k = 3, border = "red") 286 | plot(IrisHC) 287 | rect.hclust(IrisHC, k = 3, border = "red") 288 | IrisHCGroups = cutree(IrisHC, k = 3) 289 | table(iris$Species, IrisHCGroups) 290 | tapply(iris$Petal.Length, IrisHCGroups, mean) 291 | IrisKMC = kmeans(iris[1:4], centers = 3, iter.max = 100) 292 | str(IrisKMC) 293 | IrisKMCGroups = IrisKMC$cluster 294 | table(iris$Species, IrisKMCGroups) 295 | IrisKMC = kmeans(iris[1:4], centers = 3, iter.max = 10000) 296 | IrisKMCGroups = IrisKMC$cluster 297 | table(iris$Species, IrisKMCGroups) 298 | IrisKMC$centers 299 | -------------------------------------------------------------------------------- /1-intro-R/1-1.R: -------------------------------------------------------------------------------- 1 | # IAP 2015 2 | # 15.S60 Software Tools for Operations Research 3 | # Lecture 1: Introduction to R 4 | 5 | # Script file 1-1.R 6 | # In this script file, we cover the basics of using R. 7 | 8 | ################################################### 9 | ## RUNNING R AT THE COMMAND LINE, SCRIPTING, AND ## 10 | ## SETTING THE WORKING DIRECTORY ## 11 | ################################################### 12 | 13 | # Using the R console (command line): 14 | # - You can type directly into the R console (at '>') and 15 | # execute by pressing Enter 16 | # - Previous lines can be accessed using the up and down arrows 17 | # - Tabs can be used for auto-completion 18 | # - Incomplete commands will be further prompted by '+' 19 | 20 | # Using R scripts in conjunction with the console: 21 | # - We are currently in a script ("1-1.R") 22 | # - Individual lines (or multiple) in this script can be executed 23 | # by placing the cursor on the line (or selecting) and typing 24 | # Ctrl + r on PC or Cmd + Enter on Mac 25 | # - An entire script file can also be run by Edit -> Run All on PC 26 | # or Edit -> Source on Mac or typing the following: 27 | source("Assignment.R") 28 | 29 | # Oops! 
We need to set our working directory. 30 | # Check the current directory (also in the upper part of console) 31 | getwd() 32 | 33 | # Set your directory path here! Where did you save the folder? 34 | setwd("~/Desktop/") 35 | 36 | # Alternatively, you can do File -> Change dir... on PC or 37 | # Misc -> Change Working Directory... on Mac 38 | 39 | ################################################ 40 | ## BASICS: CALCULATIONS, FUNCTIONS, VARIABLES ## 41 | ################################################ 42 | 43 | # You can use R as a calculator. E.g.: 44 | 3^(6-4) 45 | 22/7 46 | 16^(1/4) 47 | 48 | 6*9 == 49 | 50 | # What happened with that last one? Check the R console! 51 | # Let's see if it's equal to 42... 52 | 53 | # Use the arrow keys to recall the command and check to see 54 | # if 54 will give you the answer you expect. 55 | 56 | # Other useful functions: 57 | sqrt(2) 58 | abs(-2) 59 | 60 | sin(pi/2) 61 | cos(0) 62 | 63 | exp(-1) 64 | (1 - 1/100)^100 65 | 66 | log(exp(1)) 67 | 68 | # The help function can explain certain functions 69 | # What if we forgot if log was base 10 or natural log? 70 | help(log) 71 | ?log 72 | 73 | # You can save values, calculations, or function outputs to variables 74 | # with either <- or = 75 | x <- 2^3 76 | y = 6 77 | 78 | # Use just the variable name to display the output 79 | x 80 | y 81 | 82 | # Note! If you run a script using source(""), output will be 83 | # suppressed, unless you use the print function 84 | print(x) 85 | print(y) 86 | 87 | # Rules for variable names 88 | # - Can include letters, numbers 89 | # - Can have periods, underscores 90 | # - CANNOT begin with a number 91 | # - Case-sensitive 92 | # - CANNOT use spaces 93 | 94 | # Use the ls() function to see what variables are available 95 | ls() 96 | 97 | ######################################## 98 | ## VECTORS, MATRICES, AND DATA FRAMES ## 99 | ######################################## 100 | 101 | # Create a vector of numbers from 1 through 10, access an index, 102 | # and sum all of them 103 | z <- seq(1:10) 104 | z <- 1:10 #this also works 105 | z[5] 106 | sum(z) 107 | double_z <- z^2 108 | 109 | # Create vectors of airports and capacities 110 | airports = c("BOS", "JFK", "ORD", "SFO", "ATL") 111 | capacities = c(20, 45, 50, 35, 55) 112 | 113 | # Place vectors together as a matrix using bind 114 | 115 | # bind together as columns 116 | cbind(airports, capacities) 117 | 118 | # bind together as rows 119 | rbind(airports, capacities) 120 | 121 | # Create a data frame 122 | df1 = data.frame(airports, capacities) 123 | 124 | # Add additional runways 125 | capacities = c(3, 2, 5, 1, 3) 126 | 127 | # Create another data frame 128 | df2 = data.frame(airports, capacities) 129 | 130 | # Append rows of the second data frame to those of the first 131 | df.runways = rbind(df1, df2) 132 | 133 | # Check out the class and structure of various variables 134 | class(airports) 135 | str(airports) 136 | 137 | class(capacities) 138 | str(capacities) 139 | 140 | class(df.runways) 141 | str(df.runways) 142 | # Notice that there are 5 different values for airports. 
These 143 | # fall under different "categories" or "factors" 144 | 145 | df.runways 146 | 147 | # Use data.frame$col to extract the column col from a data frame 148 | df.runways$airports 149 | 150 | # The summary function can often give you useful information 151 | summary(df.runways) 152 | summary(df.runways$airports) 153 | 154 | # Use the subset function to extract rows of interest from 155 | # a data frame (first argument is the data frame, second 156 | # argument is the criterion on which to select) 157 | runwaysBOS = subset(df.runways, airports=="BOS") 158 | runwaysBOS 159 | 160 | # Alternatively, since we know that rows 1 and 6 correspond 161 | # to BOS, we can extract runwaysBOS from df.runways as follows: 162 | runwaysBOS = df.runways[c(1,6), ] 163 | 164 | str(runwaysBOS) 165 | # Notice that even though we used subset and runwaysBOS only 166 | # has one factor level for the airports column, the str function 167 | # still tells us that there are 5 levels. We can fix this using the 168 | # factor function. 169 | 170 | runwaysBOS$airports = factor(runwaysBOS$airports) 171 | str(runwaysBOS) 172 | 173 | # Find the total runway capacity in Boston 174 | sum(runwaysBOS$capacities) 175 | 176 | ############################ 177 | ## WORKING WITH CSV FILES ## 178 | ############################ 179 | 180 | # Load csv files using read.csv 181 | # header = TRUE is usually ASSUMED, so not strictly necessary 182 | CEOcomp = read.csv(file = "CEOcomp.csv", header = TRUE) 183 | 184 | # Use str to look at variable names 185 | str(CEOcomp) 186 | 187 | # Use names() to extract column names 188 | names(CEOcomp) 189 | 190 | # Use the $ command to look at specific variables 191 | CEOcomp$Years 192 | CEOcomp$MBA 193 | 194 | # If you only have one dataset, you can attach the name of the 195 | # data frame. This isn't generally recommended practice, though! 196 | attach(CEOcomp) 197 | Years 198 | MBA 199 | detach(CEOcomp) 200 | 201 | #################################################### 202 | ## BASIC STATISTICS, PLOTTING, AND SUMMARY TABLES ## 203 | #################################################### 204 | 205 | # Calculate the mean, standard deviation, and other statistics 206 | mean(CEOcomp$Years) 207 | sd(CEOcomp$Years) 208 | summary(CEOcomp$Years) 209 | 210 | # Plot compensation versus years of experience 211 | plot(CEOcomp$Years, CEOcomp$TotalCompensation) 212 | 213 | # Plot with a title, x- and y-axis labels 214 | plot(CEOcomp$Years, CEOcomp$TotalCompensation, main="Total Compensation by Year", xlab = "Years of Experience", ylab = "Total Compensation (thousand USD)") 215 | 216 | # For other plots and information about the graphics package 217 | library(help = "graphics") 218 | 219 | # Create a table to summarize the data 220 | # Here, we look at mean CEO compensation, based on whether or not 221 | # the CEO has an MBA 222 | tapply(CEOcomp$TotalCompensation, CEOcomp$MBA, mean) 223 | 224 | # We can also create a table to look at counts 225 | table(CEOcomp$Year, CEOcomp$MBA) 226 | 227 | # In our dataset, how many CEOs have 7 years of experience and 228 | # an MBA? 229 | 230 | ############################### 231 | ## DEALING WITH MISSING DATA ## 232 | ############################### 233 | 234 | # Often in real datasets we encounter missing data. For instance, 235 | # in a survey, not all respondents might answer all questions. Here, 236 | # we will just remove any rows with any missing data (e.g., removing 237 | # respondents who did not answer all questions). 
More sophisticated 238 | # methods for dealing with missing data exist, but we will not go 239 | # into detail here. 240 | 241 | # Load the CEOmissing dataset. This is just the previous dataset 242 | # with some entries missing. 243 | CEOmissing = read.csv("CEOmissing.csv") 244 | 245 | # Use the summary function to see how much missing data there is. 246 | summary(CEOmissing) 247 | str(CEOmissing) 248 | 249 | # Let's remove all of the rows where there is an entry missing. (The entry is NA) 250 | # First note that we cannot use '==' to check if an element is an NA 251 | 5 == NA 252 | NA == NA 253 | 254 | # Instead, we use the is.na() function. 255 | is.na(5) 256 | is.na(NA) 257 | 258 | # Now let's only select rows where all of the data is present 259 | CEOnomissing = subset(CEOmissing, !is.na(TotalCompensation) & !is.na(Years) & !is.na(ChangeStockPrice) & !is.na(ChangeCompanySales) & !is.na(MBA)) 260 | summary(CEOnomissing) 261 | str(CEOnomissing) 262 | 263 | # Alternatively, we could use the na.omit function 264 | CEOomitmissing = na.omit(CEOmissing) 265 | summary(CEOomitmissing) 266 | str(CEOomitmissing) 267 | 268 | ################################ 269 | ## UNDERSTANDING R WORKSPACES ## 270 | ################################ 271 | 272 | # You may save an entire workspace, including variables using the 273 | # following command (alternatively, you can use the Workspace tab 274 | # in the menu bar): 275 | save.image("eg.RData") 276 | 277 | # To load, you can run the following: 278 | load("eg.RData") 279 | 280 | # You should save the image if you are working on a large project 281 | # and are taking a pause from working on it. This way, when you 282 | # come back to R, you can just load the workspace and continue 283 | # as before 284 | 285 | # You can also save individual variables as follows: 286 | save(CEOcomp, file = "CEOcomp.RData") 287 | 288 | # This is useful when the variable is given the result of 289 | # a computation that takes a lot of time (e.g., loading 290 | # very large data sets, result of running multiple SVMs, etc.) 291 | 292 | ################# 293 | ## ASSIGNMENTS ## 294 | ################# 295 | 296 | # 1a) Use the help function on seq to assign the variable 'evens' 297 | # to be the even numbers from 2 through 20, inclusive. 298 | 299 | 300 | 301 | # b) Propose an alternative way to get 'evens' to be the even 302 | # numbers from 2 through 20, inclusive, with perhaps more 303 | # than one command. Write down the commands. 304 | 305 | 306 | 307 | 308 | 309 | ## 310 | # 2a) Try out a few other basic statistics and graphing functions 311 | 312 | min(CEOcomp$Years) 313 | median(CEOcomp$Years) 314 | max(CEOcomp$Years) 315 | 316 | sum(CEOcomp$MBA) 317 | 318 | hist(CEOcomp$Years) 319 | boxplot(CEOcomp$Years) 320 | 321 | # b) Edit the histogram plot above to ensure that it has a title 322 | # and that the x-axis is labeled properly 323 | 324 | ## 325 | # 3) Use the tapply() function on df.runways to obtain a table 326 | # detailing the total capacity at each airport (Hint: use the sum() function) 327 | 328 | 329 | ## 330 | # 4a) Load the on-time performance dataset "otp.csv" 331 | 332 | 333 | 334 | 335 | # b) Take a look at the structure of the on-time performance dataset. This 336 | # dataset gives the on-time performance of airplanes in September of 2014. 337 | 338 | 339 | 340 | 341 | # c) Find the airport with the most departing flights during this time period. 
342 | # (Use the Origin column) 343 | 344 | 345 | 346 | # d**) Determine the ten airports that have the highest number of departing 347 | # and arriving flights. Use the "Origin" and "Dest" columns. Create a table 348 | # that contains the number of flights between these top ten airports. 349 | # (Hint: some of the following functions might be useful -- 350 | # summary, table, subset, factor, names, is.element, sort) 351 | 352 | 353 | 354 | 355 | 356 | 357 | 358 | 359 | 360 | 361 | 362 | 363 | 364 | -------------------------------------------------------------------------------- /6-nonlinear-opt/IJulia intro.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language": "Julia", 4 | "name": "", 5 | "signature": "sha256:c37017d552407ab6a927b8378934d1402d8da140a5d659517c98c5f30f8dd1a5" 6 | }, 7 | "nbformat": 3, 8 | "nbformat_minor": 0, 9 | "worksheets": [ 10 | { 11 | "cells": [ 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Introduction to IJulia" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Navigating IJulia notebooks\n", 24 | "_Click `Help -> User Interface Tour`_\n", 25 | "\n", 26 | "**Think of the notebook as a document that can interact with your computer.** The document relies only on a modern browser for rendering. When you connect the document to a Julia kernel and terminal instance on a computer, however, the document can send any command to the computer and show any output (text or graphics). \n", 27 | "\n", 28 | "* Each notebook is composed of cells\n", 29 | "* Two modes:\n", 30 | " * Command Mode for creating or deleting cells, saving or renaming the notebook, and other application-level functions\n", 31 | " * Edit Mode for manipulating text in individual cells\n", 32 | "* Create a cell by:\n", 33 | " * Clicking `Insert -> Insert Cell`\n", 34 | " * Pressing `a` or `b` in Command Mode\n", 35 | " * Pressing `Alt+Enter` in Edit Mode\n", 36 | "* Delete a cell by:\n", 37 | " * Clicking `Edit -> Delete Cell`\n", 38 | " * Pressing `dd`\n", 39 | "* Execute a cell by:\n", 40 | " * Clicking `Cell -> Run`\n", 41 | " * Pressing `Ctrl+Enter`\n", 42 | "\n", 43 | "Other functions:\n", 44 | "* Undo last text edit with `Ctrl+z` in Edit Mode\n", 45 | "* Undo last cell manipulation with `z` in Command Mode\n", 46 | "* Save notebook with `Ctrl+s` in Edit Mode\n", 47 | "* Save notebook with `s` in Command Mode\n", 48 | "\n", 49 | "Though notebooks rely on your browser to work, they do not require an internet connection. The only online tool that is consistently used is MathJax (for math rendering).\n", 50 | "\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### Get comfortable with the notebook\n", 58 | "Notebooks are designed to not be fragile. If you try to close a notebook with unsaved changes, the browser will warn you.\n", 59 | "\n", 60 | "Try the following exercises:" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | ">**\\[Exercise\\]**: Close/open\n", 68 | "\n", 69 | ">1. Save the notebook\n", 70 | ">2. Copy the address\n", 71 | ">3. Close the tab\n", 72 | ">4. Paste the address into a new tab (or re-open the last closed tab with `Ctrl+Shift+T` on Chrome)\n", 73 | "\n", 74 | ">_The document is still there, and the Julia kernel is still alive! 
Nothing is lost._" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | ">**\\[Exercise\\]**: Zoom\n", 82 | "\n", 83 | ">Try changing the magnification of the web page (`Ctrl+, Ctrl-` on Chrome).\n", 84 | "\n", 85 | ">_Text and math scale well (so do graphics if you use an SVG or PDF backend)._" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | ">**\\[Exercise\\]**: MathJax\n", 93 | ">1. Create a new cell.\n", 94 | ">2. Type an opening \\$, your favorite mathematical expression, and a closing \\$.\n", 95 | ">3. Run the cell to render the $\\LaTeX$ expression.\n", 96 | ">4. Right-click the rendered expression." 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "## Navigating Julia" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "Use the ``?name`` syntax to access the documentation for Julia functions" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "collapsed": false, 116 | "input": [ 117 | "?print" 118 | ], 119 | "language": "python", 120 | "metadata": {}, 121 | "outputs": [] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "collapsed": false, 126 | "input": [ 127 | "?sum" 128 | ], 129 | "language": "python", 130 | "metadata": {}, 131 | "outputs": [] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "The ``methods`` function lists all of the different implementations of a function depending on the input types.\n", 138 | "Click on the link to see the Julia source code." 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "collapsed": false, 144 | "input": [ 145 | "methods(lufact)" 146 | ], 147 | "language": "python", 148 | "metadata": {}, 149 | "outputs": [] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "The ``methodswith`` function lists all of the different functions which may be applied to a given type." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "collapsed": false, 161 | "input": [ 162 | "methodswith(Complex)" 163 | ], 164 | "language": "python", 165 | "metadata": {}, 166 | "outputs": [] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "Use tab completion to search for function names.\n", 173 | "Try ``eig`` for eigenvalues, ``read`` for file input" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "collapsed": false, 179 | "input": [ 180 | "eig" 181 | ], 182 | "language": "python", 183 | "metadata": {}, 184 | "outputs": [] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "collapsed": false, 189 | "input": [ 190 | "read" 191 | ], 192 | "language": "python", 193 | "metadata": {}, 194 | "outputs": [] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "## Plotting\n", 201 | "There are several Julia plotting packages. \n", 202 | "\n", 203 | "* [PyPlot.jl][4] is a Julia interface to Matplotlib, and should feel familiar to both MATLAB and Python users.\n", 204 | "* [Winston][3] and [Gadfly][1] are written entirely in Julia. 
Winston is for general-purpose 2D plotting, and Gadfly (inspired by ggplot2) concentrates on statistical graphics.\n", 205 | "* [Plotly supports Julia][2].\n", 206 | "\n", 207 | "[1]: https://github.com/dcjones/Gadfly.jl\n", 208 | "[2]: https://plot.ly/julia/\n", 209 | "[3]: https://github.com/nolta/Winston.jl\n", 210 | "[4]: https://github.com/stevengj/PyPlot.jl" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "collapsed": false, 216 | "input": [ 217 | "using PyPlot\n", 218 | "\n", 219 | "# Example from PyPlot documentation:\n", 220 | "x = linspace(0,2*pi,1000)\n", 221 | "y = sin(3*x + 4*cos(2*x))\n", 222 | "plot(x, y, color=\"red\", \n", 223 | " linewidth=2.0, \n", 224 | " linestyle=\"--\")\n", 225 | "title(\"A sinusoidally modulated sinusoid\");" 226 | ], 227 | "language": "python", 228 | "metadata": {}, 229 | "outputs": [] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "## Interactivity\n", 236 | "\n", 237 | "The [Interact](https://github.com/JuliaLang/Interact.jl) package enables interactivity in IJulia through the ``@manipulate`` macro." 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "collapsed": false, 243 | "input": [ 244 | "using Interact\n", 245 | "@manipulate for x in 0:0.01:\u03c0\n", 246 | " sin(x)\n", 247 | "end" 248 | ], 249 | "language": "python", 250 | "metadata": {}, 251 | "outputs": [] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "You can have multiple manipulators with continuous or discrete choices:" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "collapsed": false, 263 | "input": [ 264 | "@manipulate for x in 0:0.01:\u03c0, f in [:sin, :cos]\n", 265 | " if f == :sin\n", 266 | " sin(x)\n", 267 | " else\n", 268 | " cos(x)\n", 269 | " end\n", 270 | "end" 271 | ], 272 | "language": "python", 273 | "metadata": {}, 274 | "outputs": [] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "**Note**: only the final value is updated" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "collapsed": false, 286 | "input": [ 287 | "@manipulate for x in 0:0.01:\u03c0\n", 288 | " println(\"My input was $x\")\n", 289 | " sin(x)\n", 290 | "end" 291 | ], 292 | "language": "python", 293 | "metadata": {}, 294 | "outputs": [] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "You can embed a plot inside ``@manipulate`` for interactive visualizations." 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "collapsed": false, 306 | "input": [ 307 | "f = figure()\n", 308 | "@manipulate for z in 0:0.01:1; withfig(f) do\n", 309 | " x = linspace(0,2\u03c0,1000)\n", 310 | " y = z*sin(x)\n", 311 | " ylim(-1,1)\n", 312 | " xlim(0,2\u03c0)\n", 313 | " plot(x, y, color=\"blue\", \n", 314 | " linewidth=2.0, \n", 315 | " linestyle=\"-\")\n", 316 | " end\n", 317 | "end" 318 | ], 319 | "language": "python", 320 | "metadata": {}, 321 | "outputs": [] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "Here's the same using the ``Gadfly`` package instead of ``PyPlot``." 
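,
    "\n",
    "(One difference worth noting: Gadfly's ``plot`` returns a plot object rather than drawing into a shared figure, so the ``withfig`` bookkeeping used above for PyPlot is unnecessary -- the ``@manipulate`` block can simply return the plot.)"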
328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "collapsed": false, 333 | "input": [ 334 | "using Gadfly" 335 | ], 336 | "language": "python", 337 | "metadata": {}, 338 | "outputs": [] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "collapsed": false, 343 | "input": [ 344 | "@manipulate for z in 0:0.01:1\n", 345 | " x = linspace(0,2\u03c0,1000)\n", 346 | " y = z*sin(x)\n", 347 | " Gadfly.plot(x=x,y=y, Geom.line, Scale.y_continuous(minvalue=-1, maxvalue=1), Scale.x_continuous(minvalue=0, maxvalue=2\u03c0))\n", 348 | "end" 349 | ], 350 | "language": "python", 351 | "metadata": {}, 352 | "outputs": [] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | ">**\\[Exercise\\]**: Gaussian density\n", 359 | "\n", 360 | "> Plot the Gaussian density $\\frac{1}{\\sigma \\sqrt{2\\pi} } e^{ -\\frac{(x-\\mu)^2}{2\\sigma^2} }$ with manipulators for both the mean $\\mu$ and standard deviation $\\sigma$\n" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "## Sharing notebooks" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "Notebooks are self contained and standalone (unless they explicitly ``include()`` other Julia code). You can email them to friends, professors, and even post them online in viewable read-only form.\n", 375 | "\n", 376 | "_Click `File -> Download as -> IPython Notebook (.ipynb)` to save a copy of the notebook._" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | ">**\\[Exercise\\]**: Share your code\n", 384 | "\n", 385 | "> 1. Create a new notebook with some text, figures, and code.\n", 386 | "> 2. Save it to your desktop.\n", 387 | "> 3. Open the .ipynb file with a text editor, select all the text and copy it to the clipboard.\n", 388 | "> 4. Open [gist.github.com](https://gist.github.com/) and paste the text.\n", 389 | "> 5. Name the file foo.ipynb and click \"Create public Gist\".\n", 390 | "> 6. On the next page, click on \"Raw\".\n", 391 | "> 7. Copy the URL to the clipboard and open a new tab with [nbviewer](http://nbviewer.ipython.org/).\n", 392 | "> 8. Paste the URL to the raw ipynb into the text box and click \"Go!\"\n", 393 | "\n", 394 | "> _You now have an emailable link to share your notebook. An installation of IJulia is not required to view it!_\n", 395 | "\n", 396 | "The ``http://nbviewer.ipython.org/urls/..`` link is permanent so long as the original source (gist) exists." 
397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "-------\n", 404 | "\n", 405 | "Some content in this notebook was adapted from materials by [Jonas Kersulis](https://github.com/kersulis/IJulia-WPS)" 406 | ] 407 | } 408 | ], 409 | "metadata": {} 410 | } 411 | ] 412 | } -------------------------------------------------------------------------------- /6-nonlinear-opt/Nonlinear-DCP.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language": "Julia", 4 | "name": "", 5 | "signature": "sha256:27ed86b6429b1263a82aa23b801efdfdbf55ce8097de482f33eda074de9519dc" 6 | }, 7 | "nbformat": 3, 8 | "nbformat_minor": 0, 9 | "worksheets": [ 10 | { 11 | "cells": [ 12 | { 13 | "cell_type": "heading", 14 | "level": 2, 15 | "metadata": {}, 16 | "source": [ 17 | "Convex optimization" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "So far we've been thinking about general nonlinear optimization problems of the form\n", 25 | "\n", 26 | "\\begin{align}\n", 27 | "\\min \\quad&f(x)\\\\\n", 28 | "\\text{s.t.} \\quad& g(x) = 0, \\\\\n", 29 | "& h(x) \\leq 0.\n", 30 | "\\end{align}\n", 31 | "\n", 32 | "and derivative-based methods to solve them.\n", 33 | "\n", 34 | "A special class of nonlinear optimization problems are *convex* optimization problems where $f$ and $h$ are convex and $g$ is affine. Under some additional regularity assumptions, much of the duality theory from linear programming can be extended to convex optimization, and there exist efficient (polynomial-time) algorithms to solve these problems. With few exceptions, if your problem is convex, you can expect to be able to solve it efficiently." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### Detecting convexity\n", 42 | "\n", 43 | "A function $f: \\mathbb{R}^n \\to \\mathbb{R}$ is convex iff $f(\\theta x + (1-\\theta)y) \\leq \\theta f(x) + (1-\\theta)f(y), \\forall x,y \\in \\mathbb{R}^n \\text{ and } \\theta \\in [0,1]$.\n", 44 | "\n", 45 | "Given an arbitrary function $f$, detecting if $f$ is convex is [NP-Hard](http://web.mit.edu/~a_a_a/Public/Publications/convexity_nphard.pdf). So how do we know if a problem is convex?\n", 46 | "\n", 47 | "A reasonable approach is to make sure that a model is built-up in a manner that lets us prove convexity by using a calculus of convex analysis; this is **Disciplined Convex Programming** (DCP).\n", 48 | "\n", 49 | "We start with operations that are known to be convex:\n", 50 | "- Norms (why?)\n", 51 | "- $\\exp(\\cdot)$\n", 52 | "- $-\\log(\\cdot)$\n", 53 | "- $x^p$ for $p \\geq 1$ and $x \\geq 0$.\n", 54 | "- $1/x$ for $x > 0$\n", 55 | "- $x^2$\n", 56 | "- ...\n", 57 | "\n", 58 | "Then add composition rules, e.g., $f(g(\\cdot))$ is convex when $f$ is convex and\n", 59 | "- $g$ is linear or affine\n", 60 | "- $f$ is monotonic increasing and $g$ is convex\n", 61 | "\n", 62 | "Also, $f_1+f_2$ and $\\max\\{f_1,f_2\\}$ are convex when $f_1$ and $f_2$ are convex.\n", 63 | "\n", 64 | "So our previous example of $x^2 - \\log(x)$ is convex by these rules, because it is the sum of convex functions. So is $\\max\\{e^x,1/x\\}$ ([plot](http://www.wolframalpha.com/input/?i=max%28exp%28x%29%2C1%2Fx%29+for+x+%3E+0)).\n", 65 | "\n", 66 | "Note that these rules are *sufficient* but not *necessary* to prove convexity. 
\n", 67 | "\n", 68 | "There are a lot of existing materials on DCP which we won't try to reproduce here. Let's head over to http://dcp.stanford.edu/." 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | ">**\\[Exercise\\]**: DCP Quiz\n", 76 | "\n", 77 | "> Play the [DCP quiz](http://dcp.stanford.edu/quiz). Turn up the difficulty to hard for extra fun!" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "### Solving \"DCP-compliant\" problems\n", 85 | "\n", 86 | "DCP rules are useful not just for proving convexity, but also for *solving* the problems.\n", 87 | "\n", 88 | "For example, we (should) know that the following problem\n", 89 | "\\begin{align}\n", 90 | "\\min \\quad& {||}x||_1\\\\\n", 91 | "\\text{s.t.} \\quad& Ax = b, \\\\\n", 92 | "& x \\geq 0,\n", 93 | "\\end{align}\n", 94 | "\n", 95 | "where $||x||_1 = \\sum_i |x_i|$ can be solved by using linear programming.\n", 96 | "\n", 97 | "Just introduce auxiliary variables $z$ and solve\n", 98 | "\\begin{align}\n", 99 | "\\min \\quad& \\sum_i z_i\\\\\n", 100 | "\\text{s.t.} \\quad&z_i \\geq x_i, \\forall i\\\\\n", 101 | "& z_i \\geq -x_i, \\forall i\\\\\n", 102 | "& Ax = b, \\\\\n", 103 | "& x \\geq 0,\n", 104 | "\\end{align}\n", 105 | "\n", 106 | "Similarly\n", 107 | "\\begin{align}\n", 108 | "\\min \\quad& {||}x||_\\infty\\\\\n", 109 | "\\text{s.t.} \\quad& Ax = b, \\\\\n", 110 | "& x \\geq 0,\n", 111 | "\\end{align}\n", 112 | "\n", 113 | "where $||x||_\\infty = \\max\\{|x_1|,\\cdots,|x_n|\\}$ can be formulated as\n", 114 | "\n", 115 | "\\begin{align}\n", 116 | "\\min \\quad& z\\\\\n", 117 | "\\text{s.t.} \\quad&z \\geq x_i, \\forall i\\\\\n", 118 | "& z \\geq -x_i, \\forall i\\\\\n", 119 | "& Ax = b, \\\\\n", 120 | "& x \\geq 0,\n", 121 | "\\end{align}\n", 122 | "\n", 123 | "(What do we do when $||\\cdot||_1$ and $||\\cdot||_\\infty$ appear in convex constraints?)\n", 124 | "\n", 125 | "Given these results, we might say that $||\\cdot||_1$ and $||\\cdot||_\\infty$ are *LP-representable*, in a sense that can be made rigorous.\n", 126 | "\n", 127 | "What about $||\\cdot||_2$? It's SOCP (second-order conic programming) representable, since\n", 128 | "$$\n", 129 | "||x||_2 \\leq t\n", 130 | "$$\n", 131 | "is precisely a second-order conic constraint that's already supported by Gurobi, CPLEX, MOSEK, ECOS, SCS, ...\n", 132 | "\n", 133 | "What about $1/x$? It's also SOCP representable since\n", 134 | "$$\n", 135 | "1/x \\leq t\n", 136 | "$$\n", 137 | "iff\n", 138 | "$$\n", 139 | "||(2,x-t)||_2 \\leq x+t.\n", 140 | "$$\n", 141 | "\n", 142 | "It turns out that [A LOT](http://docs.mosek.com/generic/modeling-letter.pdf) of common convex functions are SOCP-representable.\n", 143 | "\n", 144 | "Once we know how to represent basic operations using LPs or SOCPs, we can easily compose them. For example, we would represent\n", 145 | "\n", 146 | "\\begin{align}\n", 147 | "\\min \\quad& \\max\\{||Cx-d||,1/x_1\\}\\\\\n", 148 | "\\text{s.t.} \\quad& Ax = b, \\\\\n", 149 | "& x \\geq 0,\n", 150 | "\\end{align}\n", 151 | "\n", 152 | "as\n", 153 | "\n", 154 | "\\begin{align}\n", 155 | "\\min \\quad& t\\\\\n", 156 | "\\text{s.t.} \\quad& t \\geq z_1 \\\\\n", 157 | "&t \\geq z_2\\\\\n", 158 | "&{||}Cx-d|| \\leq z_1\\\\\n", 159 | "&{||}(2,x_1-z_2)|| \\leq x_1+z_2\\\\\n", 160 | "& Ax = b, \\\\\n", 161 | "& x \\geq 0,\n", 162 | "\\end{align}\n", 163 | "\n", 164 | "and hand the problem off to Gurobi as an SOCP." 
165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "### DCP in summary\n", 172 | "\n", 173 | "- Represent the model in a way that makes it easy to use DCP rules to prove convexity.\n", 174 | "- Break down the individual pieces into parts that are representable using LP, SOCP, semidefinite programming, or exponential cones\n", 175 | "- Use composition rules to *automatically* generate a complete formulation that can be given to existing solvers\n", 176 | "- Note that derivatives aren't used anywhere!\n", 177 | "\n", 178 | "The first implementation of DCP was [CVX](http://cvxr.com/cvx/) in MATLAB. More recently, it's been implemented in [cvxpy](https://github.com/cvxgrp/cvxpy) and [Convex.jl](https://github.com/JuliaOpt/Convex.jl)." 179 | ] 180 | }, 181 | { 182 | "cell_type": "heading", 183 | "level": 2, 184 | "metadata": {}, 185 | "source": [ 186 | "Support Vector Machines (SVM)" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "[Support vector machines](http://en.wikipedia.org/wiki/Support_vector_machine) are a popular model in machine learning for classification. We'll use this example to illustrate the basic use of Convex.jl.\n", 194 | "\n", 195 | "The basic problem is that we are given a set of N points $x_1,x_2,\\ldots, x_N \\in \\mathbb{R}^n$ and labels $y_1, y_2, \\ldots, y_N \\in \\{-1,+1\\}$. We want to find a hyperplane of the form $w^Tx-b = 0$ that *separates* the two classes, i.e. $w^Tx_i - b \\geq 1$ when $y_i = +1$ and $w^Tx_i - b \\leq -1$ when $y_i = -1$. This condition can be written as $y_i(w^Tx_i - b) \\geq 1, \\forall\\, i$.\n", 196 | "\n", 197 | "Such a hyperplane will not exist in general if the data overlap, so instead we'll just try to minimize violations of the constraint $y_i(w^Tx_i - b) \\geq 1, \\forall\\, i$ by adding a penalty when it is violated. The optimization problem can be stated as\n", 198 | "$$\n", 199 | "\\min_{w,b} \\sum_{i=1}^N \\left[\\max\\{0, 1 - y_i(w^Tx_i - b)\\}\\right] + \\gamma ||w||_2^2\n", 200 | "$$\n", 201 | "Note that we penalize the norm of $w$ in order to guarantee a unique solution.\n", 202 | "\n", 203 | "Now let's write our own SVM solver!" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "collapsed": false, 209 | "input": [ 210 | "using Distributions\n", 211 | "using PyPlot\n", 212 | "using Convex\n", 213 | "using ECOS" 214 | ], 215 | "language": "python", 216 | "metadata": {}, 217 | "outputs": [] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "collapsed": false, 222 | "input": [ 223 | "# Function to generate some random test data\n", 224 | "function gen_data(N)\n", 225 | " # for +1 data, symmetric multivariate normal with center at (1,2)\n", 226 | " pos = rand(MvNormal([1.0,2.0],1.0),N)\n", 227 | " # for -1 data, symmetric multivariate normal with center at (-1,1)\n", 228 | " neg = rand(MvNormal([-1.0,1.0],1.0),N)\n", 229 | " x = [pos neg]\n", 230 | " y = [fill(+1,N),fill(-1,N)]\n", 231 | " return x,y\n", 232 | "end" 233 | ], 234 | "language": "python", 235 | "metadata": {}, 236 | "outputs": [] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "Let's see what the data look like." 
243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "collapsed": false, 248 | "input": [ 249 | "x,y = gen_data(100)\n", 250 | "plot(x[1,1:100], x[2,1:100], \"ro\", x[1,101:200], x[2,101:200], \"bo\");" 251 | ], 252 | "language": "python", 253 | "metadata": {}, 254 | "outputs": [] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "Now we translate the optimization problem into Convex.jl form." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "collapsed": false, 266 | "input": [ 267 | "const \u03b3 = 0.005\n", 268 | "function svm_convex(x,y)\n", 269 | " n = size(x,1) # problem dimension\n", 270 | " N = size(x,2) # number of points\n", 271 | " w = Variable(n)\n", 272 | " b = Variable()\n", 273 | " \n", 274 | " problem = minimize( \u03b3*sum_squares(w) + sum(max(1-y.*(x'*w-b),0)))\n", 275 | " solve!(problem, ECOSSolver())\n", 276 | " return evaluate(w), evaluate(b)\n", 277 | "end" 278 | ], 279 | "language": "python", 280 | "metadata": {}, 281 | "outputs": [] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "And the solution?" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "collapsed": false, 293 | "input": [ 294 | "N = 1000\n", 295 | "x,y = gen_data(N)\n", 296 | "\n", 297 | "plot(x[1,1:N], x[2,1:N], \"ro\", x[1,(N+1):2N], x[2,(N+1):2N], \"bo\");\n", 298 | "w,b = svm_convex(x,y)\n", 299 | "\n", 300 | "@show w,b\n", 301 | "\n", 302 | "xmin, xmax = xlim()\n", 303 | "ymin, ymax = ylim()\n", 304 | "y1 = (1+b-w[1]*xmin)/w[2]\n", 305 | "y2 = (1+b-w[1]*xmax)/w[2]\n", 306 | "plot([xmin,xmax], [y1,y2], \"k-\");\n", 307 | "y1 = (-1+b-w[1]*xmin)/w[2]\n", 308 | "y2 = (-1+b-w[1]*xmax)/w[2]\n", 309 | "plot([xmin,xmax], [y1,y2], \"k-\");\n", 310 | "y1 = (b-w[1]*xmin)/w[2]\n", 311 | "y2 = (b-w[1]*xmax)/w[2]\n", 312 | "ylim(ymin,ymax)\n", 313 | "plot([xmin,xmax], [y1,y2], \"k-\");" 314 | ], 315 | "language": "python", 316 | "metadata": {}, 317 | "outputs": [] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | ">**\\[Exercise\\]**: Sensitivity\n", 324 | "\n", 325 | "> Increase the separation between the positive and negative data by modifying the means in ``gen_data``. How does the solution change?\n" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | ">**\\[Exercise\\]**: JuMP version\n", 333 | "\n", 334 | "> Translate the Convex.jl model into a JuMP model with linear constraints and a quadratic objective. For example, ``sum_squares(w)`` becomes ``sum{w[i]^2,i=1:n}``. Hint: the formulation is given on Wikipedia. (You may want to use ``IpoptSolver`` since ``ECOSSolver`` supports second-order conic constraints but won't directly accept quadratic objectives.)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "### Discussion\n", 342 | "\n", 343 | "- Convex.jl vs. JuMP\n", 344 | "- Derivative-based nonlinear vs. 
automatic transformation to LP/SOCP/conic form" 345 | ] 346 | } 347 | ], 348 | "metadata": {} 349 | } 350 | ] 351 | } -------------------------------------------------------------------------------- /6-nonlinear-opt/Nonlinear-DualNumbers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language": "Julia", 4 | "name": "", 5 | "signature": "sha256:7da62c4657919dcad2e91d6a7b71dd3c79a52667ac61d8a9cf06a1c2ce7230c2" 6 | }, 7 | "nbformat": 3, 8 | "nbformat_minor": 0, 9 | "worksheets": [ 10 | { 11 | "cells": [ 12 | { 13 | "cell_type": "heading", 14 | "level": 1, 15 | "metadata": {}, 16 | "source": [ 17 | "Computing derivatives for nonlinear optimization: Forward mode automatic differentiation" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "Consider a general constrained nonlinear optimization problem:\n", 25 | "$$\n", 26 | "\\begin{align}\n", 27 | "\\min \\quad&f(x)\\\\\n", 28 | "\\text{s.t.} \\quad& g(x) = 0, \\\\\n", 29 | "& h(x) \\leq 0.\n", 30 | "\\end{align}\n", 31 | "$$\n", 32 | "where $f : \\mathbb{R}^n \\to \\mathbb{R}, g : \\mathbb{R}^n \\to \\mathbb{R}^r$, and $h: \\mathbb{R}^n \\to \\mathbb{R}^s$.\n", 33 | "\n", 34 | "When $f$ and $h$ are convex and $g$ is affine, we can hope for a globally optimal solution, otherwise typically we can only ask for a locally optimal solution.\n", 35 | "\n", 36 | "What approaches can we use to solve this?\n", 37 | " - When $r=0$ and $s = 0$ (unconstrained), and $f$ differentiable, the most classical approach is [gradient descent](http://en.wikipedia.org/wiki/Gradient_descent), along with fancier methods like [Newton's method](http://en.wikipedia.org/wiki/Newton%27s_method) and quasi-Newton methods like [BFGS](http://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm).\n", 38 | " - When $f$ differentiable and $g$ and $h$ linear, [gradient projection](http://neos-guide.org/content/gradient-projection-methods)\n", 39 | " - When $f$, $g$, and $h$ differentiable, [sequential quadratic programming](http://www.neos-guide.org/content/sequential-quadratic-programming)\n", 40 | " - When $f$, $g$, and $h$ twice differentiable, [interior-point methods](http://en.wikipedia.org/wiki/Interior_point_method)\n", 41 | " - When derivatives \"not available\", [derivative-free optimization](http://rd.springer.com/article/10.1007/s10898-012-9951-y)\n", 42 | " \n", 43 | "This is not meant to be an exhaustive list; see http://plato.asu.edu/sub/nlores.html#general and http://www.neos-guide.org/content/nonlinear-programming for more details." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## How are derivatives computed?\n", 51 | "\n", 52 | "- Hand-written by applying chain rule\n", 53 | "- Finite difference approximation $\\frac{\\partial f}{\\partial x_i} = \\lim_{h\\to 0} \\frac{f(x+h e_i)-f(x)}{h}$\n", 54 | "- **Automatic differentiation**\n", 55 | " - Idea: Automatically transform the code that computes a function into code that computes its derivatives" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "## Dual Numbers\n", 63 | "\n", 64 | "Consider numbers of the form $x + y\\epsilon$ with $x,y \\in \\mathbb{R}$. We *define* $\\epsilon^2 = 0$, so\n", 65 | "$$\n", 66 | "(x_1 + y_1\\epsilon)(x_2+y_2\\epsilon) = x_1x_2 + (x_1y_2 + x_2y_1)\\epsilon.\n", 67 | "$$\n", 68 | "These are called the *dual numbers*. 
Think of $\epsilon$ as an infinitesimal perturbation (you've probably seen hand-wavy algebra using $(dx)^2 = 0$ when computing integrals - this is the same idea).\n", 69 | "\n", 70 | "If we are given an infinitely differentiable function in Taylor expanded form\n", 71 | "$$\n", 72 | "f(x) = \\sum_{k=0}^{\\infty} \\frac{f^{(k)}(a)}{k!} (x-a)^k\n", 73 | "$$\n", 74 | "it follows that \n", 75 | "$$\n", 76 | "f(x+y\\epsilon) = \\sum_{k=0}^{\\infty} \\frac{f^{(k)}(a)}{k!} (x-a+y\\epsilon)^k = \\sum_{k=0}^{\\infty} \\frac{f^{(k)}(a)}{k!} (x-a)^k + y\\epsilon\\sum_{k=0}^{\\infty} \\frac{f^{(k)}(a)}{k!}\\binom{k}{1} (x-a)^{k-1} = f(x) + yf'(x)\\epsilon\n", 77 | "$$\n", 78 | "\n", 79 | "Let's unpack what's going on here. We started with a function $f : \\mathbb{R} \\to \\mathbb{R}$. Dual numbers are *not* real numbers, so it doesn't even make sense to ask for the value $f(x+y\\epsilon)$ given $x+y\\epsilon \\in \\mathbb{D}$ (the set of dual numbers). But we plugged the dual number into the Taylor expansion anyway, and by using the algebra rule $\\epsilon^2 = 0$ we found that $f(x+y\\epsilon)$ must be equal to $f(x) + yf'(x)\\epsilon$ if we use the Taylor expansion as the definition of $f : \\mathbb{D} \\to \\mathbb{D}$." 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "Alternatively, for any once differentiable function $f : \\mathbb{R} \\to \\mathbb{R}$, we can *define* its extension to the dual numbers as\n", 87 | "$$\n", 88 | "f(x+y\\epsilon) = f(x) + yf'(x)\\epsilon.\n", 89 | "$$\n", 90 | "This is essentially equivalent to the previous definition.\n", 91 | "\n", 92 | "Let's verify a very basic property, the chain rule, using this definition.\n", 93 | "\n", 94 | "Suppose $h(x) = f(g(x))$. Then,\n", 95 | "$$\n", 96 | "h(x+y\\epsilon) = f(g(x+y\\epsilon)) = f(g(x) + yg'(x)\\epsilon) = f(g(x)) + yg'(x)f'(g(x))\\epsilon = h(x) + yh'(x)\\epsilon.\n", 97 | "$$\n", 98 | "\n", 99 | "Maybe that's not too surprising, but it's actually quite a useful observation." 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "### Implementation\n", 107 | "\n", 108 | "Dual numbers are implemented in the [DualNumbers](https://github.com/JuliaDiff/DualNumbers.jl) package in Julia." 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "collapsed": false, 114 | "input": [ 115 | "using DualNumbers" 116 | ], 117 | "language": "python", 118 | "metadata": {}, 119 | "outputs": [] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "You construct $x + y\\epsilon$ with ``Dual(x,y)``. 
The real and epsilon components are accessed as ``real(d)`` and ``epsilon(d)``:" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "collapsed": false, 131 | "input": [ 132 | "d = Dual(2.0,1.0)" 133 | ], 134 | "language": "python", 135 | "metadata": {}, 136 | "outputs": [] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "collapsed": false, 141 | "input": [ 142 | "typeof(d)" 143 | ], 144 | "language": "python", 145 | "metadata": {}, 146 | "outputs": [] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "collapsed": false, 151 | "input": [ 152 | "real(d)" 153 | ], 154 | "language": "python", 155 | "metadata": {}, 156 | "outputs": [] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "collapsed": false, 161 | "input": [ 162 | "epsilon(d)" 163 | ], 164 | "language": "python", 165 | "metadata": {}, 166 | "outputs": [] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "How is addition of dual numbers defined?" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "collapsed": false, 178 | "input": [ 179 | "@which d+Dual(3.0,4.0)" 180 | ], 181 | "language": "python", 182 | "metadata": {}, 183 | "outputs": [] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "Clicking on the link, we'll see:\n", 190 | "```julia\n", 191 | "+(z::Dual, w::Dual) = dual(real(z)+real(w), epsilon(z)+epsilon(w))\n", 192 | "```" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Multiplication?" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "collapsed": false, 205 | "input": [ 206 | "Dual(2.0,2.0)*Dual(3.0,4.0)" 207 | ], 208 | "language": "python", 209 | "metadata": {}, 210 | "outputs": [] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "collapsed": false, 215 | "input": [ 216 | "@which Dual(2.0,2.0)*Dual(3.0,4.0)" 217 | ], 218 | "language": "python", 219 | "metadata": {}, 220 | "outputs": [] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "The code is:\n", 227 | "```julia\n", 228 | "*(z::Dual, w::Dual) = dual(real(z)*real(w), epsilon(z)*real(w)+real(z)*epsilon(w))\n", 229 | "```" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "Basic univariate functions?" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "collapsed": false, 242 | "input": [ 243 | "log(Dual(2.0,1.0))" 244 | ], 245 | "language": "python", 246 | "metadata": {}, 247 | "outputs": [] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "collapsed": false, 252 | "input": [ 253 | "1/2.0" 254 | ], 255 | "language": "python", 256 | "metadata": {}, 257 | "outputs": [] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "How is this implemented?" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "collapsed": false, 269 | "input": [ 270 | "@code_lowered log(Dual(2.0,1.0))" 271 | ], 272 | "language": "python", 273 | "metadata": {}, 274 | "outputs": [] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "Trig functions?" 
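, "\n", "Before peeking, the chain rule tells us what to expect; a hedged guess at the definitions (the actual ``DualNumbers`` source may differ in details):\n", "\n", "```julia\n", "sin(z::Dual) = dual(sin(real(z)),  epsilon(z)*cos(real(z)))\n", "cos(z::Dual) = dual(cos(real(z)), -epsilon(z)*sin(real(z)))\n", "```"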
281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "collapsed": false, 286 | "input": [ 287 | "@code_lowered sin(Dual(2.0,1.0))" 288 | ], 289 | "language": "python", 290 | "metadata": {}, 291 | "outputs": [] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "## Computing derivatives of functions" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "We can define a function in Julia as:" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "collapsed": false, 310 | "input": [ 311 | "f(x) = x^2 - log(x)\n", 312 | "# Or equivalently\n", 313 | "function f(x)\n", 314 | " return x^2 - log(x)\n", 315 | "end" 316 | ], 317 | "language": "python", 318 | "metadata": {}, 319 | "outputs": [] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | ">**\\[Exercise\\]**: Differentiate it!\n", 326 | "\n", 327 | "> 1. Evaluate $f$ at $1 + \\epsilon$. What are $f(1)$ and $f'(1)$?\n", 328 | "> 2. Evaluate $f$ at $\\frac{1}{\\sqrt{2}} + \\epsilon$. What are $f(\\frac{1}{\\sqrt{2}})$ and $f'(\\frac{1}{\\sqrt{2}})$?\n", 329 | "> 3. Define a new function ``fprime`` which returns the derivative of ``f`` by using ``DualNumbers``.\n", 330 | "> 4. Use the finite difference formula $$\n", 331 | "f'(x) \\approx \\frac{f(x+h)-f(x)}{h}\n", 332 | "$$\n", 333 | "to evaluate $f'(\\frac{1}{\\sqrt{2}})$ approximately using a range of values of $h$. Visualize the approximation error using ``@manipulate``, plots, or both!" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "### How general is it?\n", 341 | "\n", 342 | "Recall [Newton's iterative method](http://en.wikipedia.org/wiki/Newton%27s_method) for finding zeros:\n", 343 | "$$\n", 344 | "x \\leftarrow x - \\frac{f(x)}{f'(x)}\n", 345 | "$$\n", 346 | "until $f(x) \\approx 0$." 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "Let's use this method to compute $\\sqrt{x}$ by solving $f(z) = 0$ where $f(z) = z^2-x$.\n", 354 | "So $f'(z) = 2z$, and we can implement the algorithm as:" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "collapsed": false, 360 | "input": [ 361 | "function squareroot(x)\n", 362 | " z = x # Initial starting point\n", 363 | " while abs(z*z - x) > 1e-13\n", 364 | " z = z - (z*z-x)/(2z)\n", 365 | " end\n", 366 | " return z\n", 367 | "end" 368 | ], 369 | "language": "python", 370 | "metadata": {}, 371 | "outputs": [] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "collapsed": false, 376 | "input": [ 377 | "squareroot(100)" 378 | ], 379 | "language": "python", 380 | "metadata": {}, 381 | "outputs": [] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "Can we differentiate this code? 
**Yes!**" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "collapsed": false, 393 | "input": [ 394 | "d = squareroot(Dual(100.0,1.0))" 395 | ], 396 | "language": "python", 397 | "metadata": {}, 398 | "outputs": [] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "collapsed": false, 403 | "input": [ 404 | "epsilon(d) # Computed derivative" 405 | ], 406 | "language": "python", 407 | "metadata": {}, 408 | "outputs": [] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "collapsed": false, 413 | "input": [ 414 | "1/(2*sqrt(100)) # The exact derivative" 415 | ], 416 | "language": "python", 417 | "metadata": {}, 418 | "outputs": [] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "collapsed": false, 423 | "input": [ 424 | "abs(epsilon(d)-1/(2*sqrt(100)))" 425 | ], 426 | "language": "python", 427 | "metadata": {}, 428 | "outputs": [] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "### Multivariate functions?\n", 435 | "\n", 436 | "Dual numbers can be used to compute the gradient of a function $f: \\mathbb{R}^n \\to \\mathbb{R}$. This requires $n$ evaluations of $f$ with dual number input, essentially computing the partial derivative in each of the $n$ dimensions. We won't get into the details, but this procedure is [implemented](https://github.com/JuliaOpt/Optim.jl/blob/583907676b5b99cdb2d4cba37f6026a3fe620a49/src/autodiff.jl) in [Optim](https://github.com/JuliaOpt/Optim.jl) with the ``autodiff=true`` keyword." 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "collapsed": false, 442 | "input": [ 443 | "using Optim\n", 444 | "rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", 445 | "optimize(rosenbrock, [0.0, 0.0], method = :l_bfgs, autodiff = true)" 446 | ], 447 | "language": "python", 448 | "metadata": {}, 449 | "outputs": [] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "When $n$ is large, there's an alternative procedure called [reverse-mode automatic differentiation](http://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation) which requires only $O(1)$ evaluations of $f$ to compute its gradient. This is the method used internally by JuMP (implemented in [ReverseDiffSparse](https://github.com/mlubin/ReverseDiffSparse.jl))." 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | "## Conclusions\n", 463 | "\n", 464 | "- We can compute numerically exact derivatives of any differentiable function that is implemented as a sequence of basic operations.\n", 465 | "- In Julia it's very easy to use dual numbers for this!\n", 466 | "- Reconsider when derivatives are \"not available.\"\n", 467 | "\n", 468 | "This was just an introduction to one technique from the area of automatic differentiation. For more references, see [autodiff.org](http://www.autodiff.org/?module=Introduction&submenu=Selected%20Books)." 
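, "\n", "As a minimal sketch tying it together (essentially a solution to the ``fprime`` exercise above, so consider it a spoiler):\n", "\n", "```julia\n", "using DualNumbers\n", "f(x) = x^2 - log(x)\n", "fprime(x) = epsilon(f(Dual(x, 1.0)))   # derivative = epsilon component at seed 1.0\n", "fprime(1.0)   # returns 1.0, matching f'(x) = 2x - 1/x at x = 1\n", "```"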
469 | ] 470 | } 471 | ], 472 | "metadata": {} 473 | } 474 | ] 475 | } -------------------------------------------------------------------------------- /7-adv-optimization/Callbacks.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language": "Julia", 4 | "name": "", 5 | "signature": "sha256:65da469e9272a8bf11bfe753e257d8533eb871dfe5aa21aab4e553fa067b5292" 6 | }, 7 | "nbformat": 3, 8 | "nbformat_minor": 0, 9 | "worksheets": [ 10 | { 11 | "cells": [ 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Callbacks in Integer Programming\n", 17 | "\n", 18 | "As we discussed in the beginning, MIP solvers are complicated combinations of many techniques: cutting planes, heuristics, branching rules, etc.\n", 19 | "\n", 20 | "Some solvers allow you to customize aspects of the solve process in a deeper way than just setting options for these parameters. You can provide code to be run when certain events happen, and the solver **calls back** to these functions to ask what action(s) should be taken. Why might you want to do this?\n", 21 | "\n", 22 | "* The solver is struggling to find an integer solution. You know an efficient way to take a fractional solution and convert it to a good, if not optimal, integer solution. You can put this algorithm inside a **heuristic callback** that is called whenever a new fractional solution is found.\n", 23 | "* You have done an analysis of the structure of your MIP and have realized that you can find constraints that will cut off fractional solutions so that your LP relaxation is closer to integer points. You can write this as a **cut callback**.\n", 24 | "\n", 25 | "The particular example we will look at today is in some ways even more critical than these two types, because it enables whole types of problems to be solved that would be very difficult otherwise. In particular, consider a problem that has a very large number of constraints, **most of which will not be binding at the optimal solution**. This suggests that we probably don't need all those constraints to be provided explicitly to the solver - instead, we can provide them implicitly with a **lazy constraint/cut callback**.\n", 26 | "\n", 27 | "*On board: flow chart*" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Application: Robust Portfolio Optimization\n", 35 | "\n", 36 | "Portfolio optimization is the problem of constructing a portfolio of assets to maximize returns, but usually with some consideration towards the risk of the portfolio. If we maximize return, we will usually also have the highest chance of losing money. On the other hand, there is often a (very) low risk option that has minimal returns (e.g. US government bonds). 
We seek to construct optimization models that let us explore this spectrum of options.\n", 37 | "\n", 38 | "The \"stochastic programming\" approach would estimate a probability distribution from data for each asset we are considering purchasing, and then we can do things like\n", 39 | "\n", 40 | "- minimize $StdDev[Profit]$, subject to $E[Profit] \\geq P_{min}$\n", 41 | "- maximize $E[Profit]$, subject to $StdDev[Profit] \\leq S_{max}$\n", 42 | "\n", 43 | "**Robust optimization** is an alternative method that, instead of saying that the uncertain returns of the assets come from probability distributions, says the returns are drawn from a bounded set of outcomes: an **uncertainty set**.\n", 44 | "\n", 45 | "### Setting up the Problem\n", 46 | "\n", 47 | "We will consider the following robust portfolio problem.\n", 48 | "\n", 49 | "- Let $0 \\leq x_i \\leq 1$ be the share of our money we put into asset $i$.\n", 50 | " - We need the additional constraint then that $\\mathbf{e}^\\prime \\mathbf{x} = 1$\n", 51 | " - We'll also impose a restriction that we can use no more than a quarter of the assets available.\n", 52 | " - Let $y_i \\in \\{0,1\\}$, $y_i = 1 \\iff x_i > 0$, and $\\mathbf{e}^\\prime \\mathbf{y} \\leq \\frac{N}{4}$\n", 53 | "\n", 54 | "- Let $p_i$ be the uncertain profit for asset $i$, with $\\mathbf{p}\\in U$, where...\n", 55 | "\n", 56 | "- $U$ is our uncertainty set. By varying the size and shape of the uncertainty set $U$ we can trade off between expected return and the worst-case return. We will assume we have (as data)\n", 57 | " - $\\bar{p}_i$, the expected return of each asset\n", 58 | " - $\\sigma_i$, the standard deviation of return for each asset\n", 59 | "\n", 60 | "We will use the **ellipsoidal uncertainty set**\n", 61 | "\n", 62 | "$$ U^\\Gamma = \\left\\{ \\mathbf{p} \\mid p_i = \\bar{p}_i + \\sigma_i d_i, \\|\\mathbf{d}\\|\\leq \\Gamma \\right\\}$$\n", 63 | "\n", 64 | "*on board: diagram*\n", 65 | "\n", 66 | "So we can write out our problem now as\n", 67 | "\n", 68 | "$$\n", 69 | "\\max_{z, \\mathbf{x}\\geq \\mathbf{0}} z \\quad \\text{subject to}\\\\\n", 70 | "z \\leq \\mathbf{p}^\\prime \\mathbf{x} \\quad \\forall \\mathbf{p} \\in U \\\\\n", 71 | "\\mathbf{e}^\\prime \\mathbf{x} = 1 \\\\\n", 72 | "y_i \\geq x_i \\\\\n", 73 | "\\mathbf{e}^\\prime \\mathbf{y} \\leq \\frac{N}{4}\n", 74 | "$$\n", 75 | "\n", 76 | "The problem with the first constraint is that it is actually an **infinite** number of constraints - one for every possible value of $\\mathbf{p}$. We conjecture though that only a small number of them are needed to get a solution that \"mostly\" satisfies that constraint. We'll add them lazily using **lazy constraints** in JuMP with Gurobi. \n", 77 | "\n", 78 | "Whenever Gurobi finds a new integer-feasible solution $\\left( \\mathbf{x}^\\ast, \\mathbf{y}^\\ast, z^\\ast \\right)$, we will try to generate a new constraint. We do that by solving an **embedded** optimization problem:\n", 79 | "\n", 80 | "$$CUT(\\mathbf{x}^\\ast) = {\\arg \\min}_{\\mathbf{p}\\in U} \\mathbf{p}^\\prime \\mathbf{x}^\\ast$$\n", 81 | "\n", 82 | "*on board: diagram*\n", 83 | "\n", 84 | "We'll only add this new constraint if it would be violated by the current solution by more than a tolerance. Today we'll actually solve this embedded problem using Gurobi, but as an **exercise** you can solve it in closed form - see how much of an improvement in solve times you get!" 
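, "\n", "(For reference, a hedged sketch of that closed-form separation - it is just the formula derived in the exercise at the end of this notebook, with a minus sign because the cut problem *minimizes* over the ellipsoid; ``worst_case_p`` is a hypothetical helper name, not part of the lecture code.)\n", "\n", "```julia\n", "# worst-case p for a fixed allocation xval, given data p\u0304, \u03c3 and budget \u0393\n", "function worst_case_p(xval, p\u0304, \u03c3, \u0393)\n", "    scaled = \u03c3 .* xval\n", "    return p\u0304 - \u0393 * (\u03c3.^2 .* xval) / norm(scaled)\n", "end\n", "```"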
85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "collapsed": false, 90 | "input": [ 91 | "using JuMP, Gurobi\n", 92 | "\n", 93 | "# Generate data\n", 94 | "n = 20\n", 95 | "p\u0304 = [1.15 + i*0.05/150 for i in 1:n]\n", 96 | "\u03c3 = [0.05/450*\u221a(2*i*n*(n+1)) for i in 1:n]\n", 97 | "\n", 98 | "function solve_portfolio()\n", 99 | " port = Model(solver=GurobiSolver())\n", 100 | " \n", 101 | " @defVar(port, z \u2264 maximum(p\u0304))\n", 102 | " @setObjective(port, Max, z)\n", 103 | " @defVar(port, 0 \u2264 x[1:n] \u2264 1)\n", 104 | " @addConstraint(port, sum(x) == 1)\n", 105 | " \n", 106 | " @defVar(port, y[1:n], Bin)\n", 107 | " for i in 1:n\n", 108 | " @addConstraint(port, y[i] \u2265 x[i])\n", 109 | " end\n", 110 | " @addConstraint(port, sum(y) \u2264 div(n,4))\n", 111 | " \n", 112 | " # Link z to x\n", 113 | " function portobj(cb)\n", 114 | " # Get values of z and x\n", 115 | " zval = getValue(z)\n", 116 | " xval = getValue(x)[:]\n", 117 | " \n", 118 | " # Find most pessimistic value of p'x\n", 119 | " # over all p in the uncertainty set\n", 120 | " rob = Model(solver=GurobiSolver(OutputFlag=0))\n", 121 | " @defVar(rob, p[i=1:n])\n", 122 | " @defVar(rob, d[i=1:n])\n", 123 | " @setObjective(rob, Min, dot(xval,p))\n", 124 | " \u0393 = sqrt(10)\n", 125 | " @addConstraint(rob, sum{d[i]^2,i=1:n} \u2264 \u0393^2)\n", 126 | " for i in 1:n\n", 127 | " @addConstraint(rob, p[i] == p\u0304[i] + \u03c3[i]*d[i])\n", 128 | " end\n", 129 | " solve(rob)\n", 130 | " worst_z = getObjectiveValue(rob)\n", 131 | " @show (zval, worst_z)\n", 132 | " worst_p = getValue(p)[:]\n", 133 | " \n", 134 | " # Is this worst_p going to change the objective\n", 135 | " # because worst_z is worse than the current z?\n", 136 | " if worst_z < zval - 1e-2\n", 137 | " # Yep, we've made things worse!\n", 138 | " # Gurobi should try to find a better portfolio now\n", 139 | " @addLazyConstraint(cb, z \u2264 dot(worst_p,x))\n", 140 | " end\n", 141 | " end\n", 142 | " setLazyCallback(port, portobj)\n", 143 | " \n", 144 | " solve(port)\n", 145 | " \n", 146 | " return getValue(x)[:]\n", 147 | "end\n", 148 | "\n", 149 | "solve_portfolio()" 150 | ], 151 | "language": "python", 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "output_type": "stream", 156 | "stream": "stdout", 157 | "text": [ 158 | "Optimize a model with 22 rows, 41 columns and 80 nonzeros\n", 159 | "(zval,worst_z) => (1.1566666666666665,1.1119444893019914)" 160 | ] 161 | }, 162 | { 163 | "output_type": "stream", 164 | "stream": "stdout", 165 | "text": [ 166 | "\n", 167 | "Presolve time: 0.00s\n", 168 | "Presolved: 22 rows, 41 columns, 80 nonzeros\n", 169 | "Variable types: 21 continuous, 20 integer (20 binary)\n", 170 | "(zval,worst_z) => (1.1566666666666665,1.1111246725600388)\n", 171 | "\n", 172 | "Root relaxation: objective 1.156333e+00, 6 iterations, 0.00 seconds\n", 173 | "(zval,worst_z) => (1.1563333333333332,1.1119444893019914)\n", 174 | "(zval,worst_z) => (1.1559999999999997,1.1127950730240155)\n", 175 | "(zval,worst_z) => (1.1556666666666666,1.1136790262540286)\n", 176 | "(zval,worst_z) => (1.1553333333333333,1.1145993405294434)\n", 177 | "(zval,worst_z) => (1.1549999999999998,1.1155594867652385)\n", 178 | "(zval,worst_z) => (1.1546666666666665,1.1165635147016044)\n", 179 | "(zval,worst_z) => (1.154333333333333,1.1176162164470733)\n", 180 | "(zval,worst_z) => (1.154,1.1187233395982223)\n", 181 | "(zval,worst_z) => (1.1536666666666666,1.1198918413411922)\n", 182 | "(zval,worst_z) => (1.153333333333333,1.1211303081530426)\n", 183 | 
"(zval,worst_z) => (1.1529999999999998,1.1224495373588008)\n", 184 | "(zval,worst_z) => (1.1526666666666663,1.1238634270548151)\n", 185 | "(zval,worst_z) => (1.1523333333333332,1.1253903875468847)\n", 186 | "(zval,worst_z) => (1.152,1.1270557049613767)\n", 187 | "\n", 188 | " Nodes | Current Node | Objective Bounds | Work\n", 189 | " Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time\n", 190 | "\n", 191 | " 0 0 1.15174 0 11 - 1.15174 - - 0s\n", 192 | "(zval,worst_z) => (1.1470227328301037,1.1396039397398352)\n", 193 | "H 0 0 1.1470227 1.15174 0.41% - 0s\n", 194 | " 0 0 1.15174 0 11 1.14702 1.15174 0.41% - 0s\n", 195 | " 0 0 1.15174 0 11 1.14702 1.15174 0.41% - 0s\n", 196 | "(zval,worst_z) => (1.1516666666666666,1.1288956656137035)\n", 197 | "(zval,worst_z) => (1.1513333333333569,1.1309663308247797)\n", 198 | "(zval,worst_z) => (1.151000000000038,1.133361658942341)\n", 199 | "(zval,worst_z) => (1.1470300811632392,1.1395542704160788)\n", 200 | "* 44 2 10 1.1470301 1.15168 0.41% 3.8 0s\n", 201 | "(zval,worst_z) => (1.147047478737521,1.139475366906823)\n", 202 | "* 68 5 10 1.1470475 1.15167 0.40% 3.6 0s\n", 203 | "(zval,worst_z) => (1.1470682530994947,1.1404573657558754)\n", 204 | "* 94 15 9 1.1470683 1.15167 0.40% 3.6 0s\n", 205 | "(zval,worst_z) => (1.1506666666666323,1.1362650225868365)\n", 206 | "(zval,worst_z) => (1.1471399470276338,1.1397880697939513)\n", 207 | "* 180 2 12 1.1471399 1.15166 0.39% 3.6 0s\n", 208 | "(zval,worst_z) => (1.1471438839894306,1.140552062533531)\n", 209 | "* 206 5 12 1.1471439 1.15166 0.39% 3.5 0s\n", 210 | "(zval,worst_z) => (1.1471459914930557,1.1407606097364476)\n", 211 | "* 323 47 11 1.1471460 1.15166 0.39% 3.5 0s\n", 212 | "(zval,worst_z) => (1.1471684136484515,1.141024846239318)\n", 213 | "* 340 58 11 1.1471684 1.15166 0.39% 3.5 0s\n", 214 | "(zval,worst_z) => " 215 | ] 216 | }, 217 | { 218 | "output_type": "stream", 219 | "stream": "stdout", 220 | "text": [ 221 | "(1.147173218445699,1.1404116811811653)\n", 222 | "H 394 68 1.1471732 1.15166 0.39% 3.5 0s\n", 223 | "(zval,worst_z) => (1.1471839603170648,1.141472852545052)\n", 224 | "* 527 118 11 1.1471840 1.15166 0.39% 3.5 0s\n", 225 | "(zval,worst_z) => (1.1503333333333334,1.140149865710194)\n", 226 | "(zval,worst_z) => (1.147273659552563,1.14110125789888)\n", 227 | "* 544 20 14 1.1472737 1.15161 0.38% 3.5 0s\n", 228 | "(zval,worst_z) => (1.147288763313607,1.1410423565132275)\n", 229 | "* 569 22 14 1.1472888 1.15161 0.38% 3.5 0s\n", 230 | "(zval,worst_z) => (1.147316766144706,1.141406256858547)\n", 231 | "* 594 22 13 1.1473168 1.15161 0.37% 3.5 0s\n", 232 | "(zval,worst_z) => (1.1473939997497258,1.1423996455925112)\n", 233 | "* 629 26 12 1.1473940 1.15161 0.37% 3.5 0s\n", 234 | "(zval,worst_z) => (1.1476045644702562,1.1416959456972795)\n", 235 | "* 1328 137 16 1.1476046 1.15161 0.35% 3.4 0s\n", 236 | "(zval,worst_z) => (1.1477106005691295,1.1428023401764515)\n", 237 | "* 1407 137 15 1.1477106 1.15152 0.33% 3.3 0s\n", 238 | "(zval,worst_z) => (1.1477847196402158,1.1433278504310909)\n", 239 | "* 3060 394 30 1.1477847 1.15141 0.32% 2.7 0s\n", 240 | "(zval,worst_z) => " 241 | ] 242 | }, 243 | { 244 | "output_type": "stream", 245 | "stream": "stdout", 246 | "text": [ 247 | "(1.147798868429544,1.1432980916291928)\n", 248 | "H 5386 979 1.1477989 1.15141 0.31% 2.3 0s\n", 249 | "(zval,worst_z) => (1.1478152209575423,1.143279716904396)\n", 250 | "* 7213 1220 30 1.1478152 1.15127 0.30% 2.1 0s\n", 251 | "(zval,worst_z) => (1.14785301792211,1.1426888719571178)\n", 252 | "*12139 1794 30 1.1478530 1.15070 
0.25% 1.9 0s\n", 253 | "(zval,worst_z) => " 254 | ] 255 | }, 256 | { 257 | "output_type": "stream", 258 | "stream": "stdout", 259 | "text": [ 260 | "(1.1478601755755777,1.1432619688471866)\n", 261 | "*15951 1738 30 1.1478602 1.15043 0.22% 1.9 0s\n", 262 | "(zval,worst_z) => (1.1478937108549316,1.1426547380907053)\n", 263 | "H20488 1199 1.1478937 1.15017 0.20% 1.8 0s\n", 264 | "(zval,worst_z) => " 265 | ] 266 | }, 267 | { 268 | "output_type": "stream", 269 | "stream": "stdout", 270 | "text": [ 271 | "(1.1479043342581046,1.142650419192153)\n", 272 | "*28539 40 30 1.1479043 1.14956 0.14% 1.7 1s\n", 273 | "\n", 274 | "Explored 29166 nodes (49935 simplex iterations) in 1.00 seconds\n", 275 | "Thread count was 8 (of 8 available processors)\n", 276 | "\n", 277 | "Optimal solution found (tolerance 1.00e-04)\n", 278 | "Best objective 1.147904334258e+00, best bound 1.147904334258e+00, gap 0.0%\n" 279 | ] 280 | }, 281 | { 282 | "metadata": {}, 283 | "output_type": "pyout", 284 | "prompt_number": 1, 285 | "text": [ 286 | "20-element Array{Float64,1}:\n", 287 | " 0.417392 \n", 288 | " 0.29514 \n", 289 | " 0.0 \n", 290 | " 0.0 \n", 291 | " 0.0 \n", 292 | " -2.35922e-15\n", 293 | " 0.0 \n", 294 | " 0.0 \n", 295 | " 0.0 \n", 296 | " 0.0 \n", 297 | " 0.0 \n", 298 | " 0.0 \n", 299 | " 0.0 \n", 300 | " 0.0 \n", 301 | " 0.0 \n", 302 | " 0.0 \n", 303 | " 0.0 \n", 304 | " 0.09838 \n", 305 | " 0.0957561 \n", 306 | " 0.0933315 " 307 | ] 308 | } 309 | ], 310 | "prompt_number": 1 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "### Exercise: Replace inner model with closed form expression\n", 317 | "\n", 318 | "The cutting plane problem was:\n", 319 | "\n", 320 | "$${\\min}_{\\mathbf{p}\\in U} \\mathbf{p}^\\prime \\mathbf{x}^\\ast$$\n", 321 | "\n", 322 | "$$ U^\\Gamma = \\left\\{ \\mathbf{p} \\mid p_i = \\bar{p}_i + \\sigma_i d_i, \\|\\mathbf{d}\\|\\leq \\Gamma \\right\\}$$\n", 323 | "\n", 324 | "Let's do a little rearrangement, so instead it is\n", 325 | "\n", 326 | "$$ U^\\Gamma = \\left\\{ \\mathbf{p} \\mid \\sqrt{\\sum_{i=1}^n \\left( \\frac{p_i - \\bar{p}_i}{\\sigma_i} \\right)^2} \\leq \\Gamma \\right\\}$$\n", 327 | "\n", 328 | "So the problem is minimizing a linear function over an ellipsoid, which if you go through the KKT conditions you'll find has a nice closed form solution (the minus sign reflects that we take the worst case):\n", 329 | "\n", 330 | "$$ p^\\ast_i = \\bar{p}_i - \\frac{\\Gamma}{\\| diag(\\sigma) \\mathbf{x}^\\ast \\|} \\sigma^2_i x^\\ast_i$$" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "\n", 338 | "## Application: Travelling Salesman\n", 339 | "\n", 340 | "The most famous application of this is the **Travelling Salesman Problem**. TSP is the problem of finding a tour of shortest length that visits all the nodes in a graph. The decision variables in the MIP formulation correspond to whether we use an arc or not. If there are $N$ nodes, we have $N^2$ variables. We will need $N$ constraints to make sure that each city is visited once. However, if you solve this you will find it is not sufficient:\n", 341 | "\n", 342 | "![subtours](http://i.imgur.com/rX9EYAr.png)\n", 343 | "\n", 344 | "To make sure these subtours don't occur, we need **subtour elimination constraints**. Unfortunately, there are $2^N$ possible subtour elimination constraints, which grows very very fast. The solution is to only add these constraints **lazily**: whenever the MIP solver finds an integer solution, we check for subtours. 
If we find them, we return a new constraint that will \"break\" the subtours. We then keep solving from there and repeat until no subtours are found. In practice we need far fewer than $2^N$ constraints, which is why it is possible to solve TSPs with 1000s of variables to optimality.\n", 345 | "\n" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "collapsed": false, 351 | "input": [], 352 | "language": "python", 353 | "metadata": {}, 354 | "outputs": [] 355 | } 356 | ], 357 | "metadata": {} 358 | } 359 | ] 360 | } --------------------------------------------------------------------------------