├── README.md
├── papers
│   ├── baggy.pdf
│   ├── brop.pdf
│   ├── klee.pdf
│   ├── nacl.pdf
│   ├── okws.pdf
│   ├── urweb.pdf
│   ├── android.pdf
│   ├── capsicum.pdf
│   ├── kerberos.pdf
│   ├── forcehttps.pdf
│   ├── medical-sw.pdf
│   ├── passwords.pdf
│   ├── taintdroid.pdf
│   ├── tor-design.pdf
│   ├── owasp-top-10.pdf
│   ├── trajectories.pdf
│   ├── brumley-timing.pdf
│   ├── confused-deputy.pdf
│   ├── lookback-tcpip.pdf
│   ├── private-browsing.pdf
│   ├── passwords-extended.pdf
│   └── .htaccess
├── Makefile
├── .htaccess
├── old-quizzes.md
├── old-quizzes.html
├── quiz2-tor.md
├── quiz2-tor.html
├── README.html
├── quiz2-medical-dev.md
├── index.md
├── previous-years
│   ├── l12-resin.txt
│   ├── l14-resin.txt
│   ├── l22-usability-2.txt
│   ├── l21-captcha.txt
│   ├── l20-bots.txt
│   ├── l23-voting.txt
│   ├── l21-dropbox.txt
│   ├── l18-dealloc.txt
│   ├── l19-backtracker.txt
│   ├── l20-traceback.txt
│   ├── l17-vanish.txt
│   ├── l07-xfi.txt
│   ├── l11-spins.html
│   ├── l08-browser-security.txt
│   ├── l22-usability.txt
│   ├── l19-cryptdb.txt
│   ├── l06-java.txt
│   └── l10-memauth.html
├── quiz2-medical-dev.html
├── index.html
└── l08-my-web-security.md
/README.md:
--------------------------------------------------------------------------------
1 | index.md
--------------------------------------------------------------------------------
/papers/baggy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/baggy.pdf
--------------------------------------------------------------------------------
/papers/brop.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/brop.pdf
--------------------------------------------------------------------------------
/papers/klee.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/klee.pdf
--------------------------------------------------------------------------------
/papers/nacl.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/nacl.pdf
--------------------------------------------------------------------------------
/papers/okws.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/okws.pdf
--------------------------------------------------------------------------------
/papers/urweb.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/urweb.pdf
--------------------------------------------------------------------------------
/papers/android.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/android.pdf
--------------------------------------------------------------------------------
/papers/capsicum.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/capsicum.pdf
--------------------------------------------------------------------------------
/papers/kerberos.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/kerberos.pdf
--------------------------------------------------------------------------------
/papers/forcehttps.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/forcehttps.pdf
--------------------------------------------------------------------------------
/papers/medical-sw.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/medical-sw.pdf
--------------------------------------------------------------------------------
/papers/passwords.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/passwords.pdf
--------------------------------------------------------------------------------
/papers/taintdroid.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/taintdroid.pdf
--------------------------------------------------------------------------------
/papers/tor-design.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/tor-design.pdf
--------------------------------------------------------------------------------
/papers/owasp-top-10.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/owasp-top-10.pdf
--------------------------------------------------------------------------------
/papers/trajectories.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/trajectories.pdf
--------------------------------------------------------------------------------
/papers/brumley-timing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/brumley-timing.pdf
--------------------------------------------------------------------------------
/papers/confused-deputy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/confused-deputy.pdf
--------------------------------------------------------------------------------
/papers/lookback-tcpip.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/lookback-tcpip.pdf
--------------------------------------------------------------------------------
/papers/private-browsing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/private-browsing.pdf
--------------------------------------------------------------------------------
/papers/passwords-extended.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/passwords-extended.pdf
--------------------------------------------------------------------------------
/papers/.htaccess:
--------------------------------------------------------------------------------
1 | # Protect the htaccess file
2 | <Files .htaccess>
3 | Order Allow,Deny
4 | Deny from all
5 | </Files>
6 |
7 |
8 | # Enable directory browsing
9 | Options All Indexes
10 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | SRCS=$(wildcard *.md)
2 |
3 | HTMLS=$(SRCS:.md=.html)
4 |
5 | %.html: %.md
6 | @echo "Compiling $< -> $@"
7 | markdown $< > $@
8 |
9 | all: $(HTMLS)
10 | @echo "HTMLs: $(HTMLS)"
11 | @echo "MDs: $(SRCS)"
12 |
--------------------------------------------------------------------------------
/.htaccess:
--------------------------------------------------------------------------------
1 | # Protect the htaccess file
2 | <Files .htaccess>
3 | Order Allow,Deny
4 | Deny from all
5 | </Files>
6 |
7 |
8 | # Protect .git/
9 | <FilesMatch "^\.git">
10 | Order Allow,Deny
11 | Deny from all
12 | </FilesMatch>
13 |
14 |
15 | # Disable directory browsing
16 | Options -Indexes
17 |
--------------------------------------------------------------------------------
/old-quizzes.md:
--------------------------------------------------------------------------------
1 | Some questions may already be [here](http://css.csail.mit.edu/6.858/2014/quiz.html)
2 |
3 | Quiz 2 2011
4 | -----------
5 |
6 | Q8: An "Occupy Northbridge" protestor has set up a Twitter
7 | account to broadcast messages under an assumed name. In
8 | order to remain anonymous, he decides to use Tor to log into
9 | the account. He installs Tor on his computer (from a
10 | trusted source) and enables it, launches Firefox, types in
11 | www.twitter.com into his browser, and proceeds to log in.
12 | What adversaries may be able to now compromise the protestor
13 | in some way as a result of him using Tor? Ignore security
14 | bugs in the Tor client itself.
15 |
16 | A8: The protestor is vulnerable to a malicious exit node
17 | intercepting his non-HTTPS-protected connection. (Since Tor
18 | involves explicitly proxying through an exit node, this is
19 | easier than intercepting HTTP over the public internet.)
20 |
21 |
22 | Q9: The protestor now uses the same Firefox browser to
23 | connect to another web site that hosts a discussion forum,
24 | also via Tor (but only after building a fresh Tor circuit).
25 | His goal is to ensure that Twitter and the forum cannot
26 | collude to determine that the same person accessed Twitter
27 | and the forum. To avoid third-party tracking, he deletes all
28 | cookies, HTML5 client-side storage, history, etc. from his
29 | browser between visits to different sites. How could an
30 | adversary correlate his original visit to Twitter and his
31 | visit to the forum, assuming no software bugs, and a large
32 | volume of other traffic to both sites?
33 |
34 | A9: An adversary can fingerprint the protestor's browser,
35 | using the user-agent string, the plug-ins installed on that
36 | browser, window dimensions, etc., which may be enough to
37 | strongly correlate the two visits.
38 |
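A rough sketch of why A9 works (toy Python with made-up attribute values;
real fingerprinting draws on many more signals): a handful of browser
attributes already hash to an identifier that stays stable across both
visits, even after cookies and client-side storage are cleared.

    import hashlib

    def fingerprint(user_agent, plugins, window_size):
        # combine a few browser attributes into one stable identifier
        parts = [user_agent, ",".join(sorted(plugins)),
                 "x".join(map(str, window_size))]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]

    ua = "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0"
    plugins = ["Flash 11.2", "Java 1.7"]
    # same browser => same value at Twitter and at the forum
    assert fingerprint(ua, plugins, (1280, 744)) == fingerprint(ua, plugins, (1280, 744))
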
39 | ---
40 |
41 | Quiz 2, 2012
42 | ------------
43 |
44 | Q2: Alyssa wants to learn the identity of a hidden service
45 | running on Tor. She plans to set up a malicious Tor OR, set
46 | up a rendezvous point on that malicious Tor OR, and send
47 | this rendezvous point's address to the introduction point of
48 | the hidden service. Then, when the hidden service connects
49 | to the malicious rendezvous point, the malicious Tor OR will
50 | record where the connection is coming from.
51 |
52 | Will Alyssa's plan work? Why or why not?
53 |
54 | A2: Will not work. A new Tor circuit is constructed between
55 | the hidden service and the rendezvous point, so the malicious
56 | OR only sees the final hop of that circuit, not the hidden
57 | service's own address.
58 |
--------------------------------------------------------------------------------
/old-quizzes.html:
--------------------------------------------------------------------------------
1 | Some questions may already be here
2 |
3 | Quiz 2 2011
4 |
5 | Q8: An "Occupy Northbridge" protestor has set up a Twitter
6 | account to broadcast messages under an assumed name. In
7 | order to remain anonymous, he decides to use Tor to log into
8 | the account. He installs Tor on his computer (from a
9 | trusted source) and enables it, launches Firefox, types in
10 | www.twitter.com into his browser, and proceeds to log in.
11 | What adversaries may be able to now compromise the protestor
12 | in some way as a result of him using Tor? Ignore security
13 | bugs in the Tor client itself.
14 |
15 | A8: The protestor is vulnerable to a malicious exit node
16 | intercepting his non-HTTPS-protected connection. (Since Tor
17 | involves explicitly proxying through an exit node, this is
18 | easier than intercepting HTTP over the public internet.)
19 |
20 | Q9: The protestor now uses the same Firefox browser to
21 | connect to another web site that hosts a discussion forum,
22 | also via Tor (but only after building a fresh Tor circuit).
23 | His goal is to ensure that Twitter and the forum cannot
24 | collude to determine that the same person accessed Twitter
25 | and the forum. To avoid third-party tracking, he deletes all
26 | cookies, HTML5 client-side storage, history, etc. from his
27 | browser between visits to different sites. How could an
28 | adversary correlate his original visit to Twitter and his
29 | visit to the forum, assuming no software bugs, and a large
30 | volume of other traffic to both sites?
31 |
32 | A9: An adversary can fingerprint the protestor's browser,
33 | using the user-agent string, the plug-ins installed on that
34 | browser, window dimensions, etc., which may be enough to
35 | strongly correlate the two visits.
36 |
37 |
38 |
39 | Quiz 2, 2012
40 |
41 | Q2: Alyssa wants to learn the identity of a hidden service
42 | running on Tor. She plans to set up a malicious Tor OR, set
43 | up a rendezvous point on that malicious Tor OR, and send
44 | this rendezvous point's address to the introduction point of
45 | the hidden service. Then, when the hidden service connects
46 | to the malicious rendezvous point, the malicious Tor OR will
47 | record where the connection is coming from.
48 |
49 | Will Alyssa's plan work? Why or why not?
50 |
51 | A2: Will not work. A new Tor circuit is constructed between
52 | the hidden service and the rendezvous point, so the malicious
53 | OR only sees the final hop of that circuit, not the hidden
54 | service's own address.
55 |
--------------------------------------------------------------------------------
/quiz2-tor.md:
--------------------------------------------------------------------------------
1 | Tor
2 | ===
3 | ---
4 | ## Resources
5 |
6 | * [Paper](http://css.csail.mit.edu/6.858/2014/readings/tor-design.pdf)
7 | * Blog posts: [1](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-1), [2](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-2), [3](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-3)
8 | * [Lecture note from 2012](http://css.csail.mit.edu/6.858/2012/lec/l16-tor.txt)
9 | * [Old quizzes](http://css.csail.mit.edu/6.858/2014/quiz.html)
10 |
11 | ---
12 |
13 | ## Overview
14 |
15 | - Goals
16 | - Mechanisms
17 | * Streams/Circuits
18 | * Rendezvous Points & Hidden services
19 | - Directory Servers
20 | - Attacks & Defenses
21 | - Practice Problems
22 |
23 | ---
24 |
25 | ## Goals
26 |
27 | - Anonymous communication
28 | - Responder anonymity
29 | * If I run a service like "mylittleponey.com" I don't want anyone
30 | associating me with that service
31 | - Deployability / usability
32 | * Why a security goal?
33 | + Because it increases the # of people using Tor, i.e. the _anonymity set_
34 | - ...which in turn increases security
35 | * (adversary has more people to distinguish you amongst)
36 | - TCP layer (Why? See explanations in lecture notes above)
37 | - **NOT** P2P (because more vulnerable?)
38 |
39 | ---
40 |
41 | ## Circuit creation
42 |
43 | TODO: Define circuit
44 |
45 | Alice multiplexes many TCP streams onto a few _circuits_. Why? Low-latency system, expensive to make new circuit.
46 |
47 | TODO: Define Onion Router (OR)
48 |
49 | _Directory server_: State of network, OR public keys, OR IPs
50 |
51 | ORs:
52 |
53 | - All connected to one another with TLS
54 | - See blog post 1: Authorities vote on consensus directory document
55 |
56 | Example:
57 |
58 | [ Draw example of Alice building a new circuit ]
59 | [ and connecting to Twitter. ]
60 |
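A minimal sketch of the layered ("onion") encryption behind a 3-hop
circuit (toy `enc`/`dec` functions stand in for Tor's real cells, TLS
links, and key negotiation; names like `OR1` are made up):

    def enc(key, payload):            # toy stand-in for symmetric encryption
        return ("enc", key, payload)

    def dec(key, cell):
        kind, k, inner = cell
        assert kind == "enc" and k == key
        return inner

    # Alice negotiates one key per OR while extending the circuit hop by hop.
    keys = {"OR1": "k1", "OR2": "k2", "OR3": "k3"}

    # Innermost layer is for the exit node; each OR can peel exactly one layer.
    cell = enc(keys["OR1"], enc(keys["OR2"], enc(keys["OR3"], "GET twitter.com")))
    for hop in ["OR1", "OR2", "OR3"]:
        cell = dec(keys[hop], cell)
    print(cell)   # request leaves the exit node; only OR1 ever saw Alice's address
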
61 | ---
62 |
63 | ## Rendezvous Points & Hidden services
64 |
65 | Example:
66 |
67 | [ Add an example of Alice connecting to Bob's ]
68 | [ hidden service on Tor ]
69 |
70 | Bob runs hidden service (HS):
71 |
72 | - Decides on long term PK/SK pair
73 | - Publish introduction points, advertises on lookup service
74 | - Builds a circuit to _Intro Points_, waits for messages
75 |
76 | Alice wants to connect to Bob's HS:
77 |
78 | - Build circuit to new _Rendezvous Point (RP)_ (any OR)
79 | * Gives _cookie_ to RP
80 | - Builds circuit to one of Bob's intro points and sends message
81 | * with `{RP, Cookie, g^x}_PK(Bob)`
82 | - Bob builds circuit to RP, sends `{ cookie, g^y, H(K)}`
83 | - RP connects Alice and Bob
84 |
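A compressed sketch of the exchange above (Python dictionaries stand in
for encrypted Tor cells; `g^x`/`g^y` are the Diffie-Hellman halves from
the notes, and the encryption to `PK(Bob)` is only implied):

    # Alice -> RP, over a fresh circuit: register a one-time rendezvous cookie
    alice_to_rp = {"cookie": "c1"}

    # Alice -> one of Bob's intro points, encrypted to Bob's public key PK(Bob)
    intro_msg = {"RP": "address_of_RP", "cookie": "c1", "g^x": "alice_dh_half"}

    # Bob -> RP, over his own fresh circuit: same cookie, his DH half, and H(K)
    bob_to_rp = {"cookie": "c1", "g^y": "bob_dh_half", "H(K)": "handshake_hash"}

    # The RP splices the two circuits once the cookies match; Alice and Bob now
    # share K without either side learning the other's network location.
    assert alice_to_rp["cookie"] == bob_to_rp["cookie"]
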
--------------------------------------------------------------------------------
/quiz2-tor.html:
--------------------------------------------------------------------------------
1 | Tor
2 |
3 |
4 |
5 | Resources
6 |
7 |
13 |
14 |
15 |
16 | Overview
17 |
18 |
19 | - Goals
20 | - Mechanisms
21 |
22 | - Streams/Circuits
23 | - Rendezvous Points & Hidden services
24 |
25 | - Directory Servers
26 | - Attacks & Defenses
27 | - Practice Problems
28 |
29 |
30 |
31 |
32 | Goals
33 |
34 |
35 | - Anonymous communication
36 | - Responder anonymity
37 |
38 | - If I run a service like "mylittleponey.com" I don't want anyone
39 | associating me with that service
40 |
41 | - Deployability / usability
42 |
43 | - Why a security goal?
44 |
45 | - Because it increases the # of people using Tor, i.e. the anonymity set
46 | - ...which in turn increases security
47 |
48 | - (adversary has more people to distinguish you amongst)
49 |
50 |
51 |
52 | - TCP layer (Why? See explanations in lecture notes above)
53 | - NOT P2P (because more vulnerable?)
54 |
55 |
56 |
57 |
58 | Circuit creation
59 |
60 | TODO: Define circuit
61 |
62 | Alice multiplexes many TCP streams onto a few circuits. Why? Low-latency system, expensive to make new circuit.
63 |
64 | TODO: Define Onion Router (OR)
65 |
66 | Directory server: State of network, OR public keys, OR IPs
67 |
68 | ORs:
69 |
70 |
71 | - All connected to one another with TLS
72 | - See blog post 1: Authorities vote on consensus directory document
73 |
74 |
75 | Example:
76 |
77 | [ Draw example of Alice building a new circuit ]
78 | [ and connecting to Twitter. ]
79 |
80 |
81 |
82 |
83 | Rendezvous Points & Hidden services
84 |
85 | Example:
86 |
87 | [ Add an example of Alice connecting to Bob's ]
88 | [ hidden service on Tor ]
89 |
90 |
91 | Bob runs hidden service (HS):
92 |
93 |
94 | - Decides on long term PK/SK pair
95 | - Publish introduction points, advertises on lookup service
96 | - Builds a circuit to Intro Points, waits for messages
97 |
98 |
99 | Alice wants to connect to Bob's HS:
100 |
101 |
102 | - Build circuit to new Rendezvous Point (RP) (any OR)
103 |
104 | - Gives cookie to RP
105 |
106 | - Builds circuit to one of Bob's intro points and sends message
107 |
108 | - with {RP, Cookie, g^x}_PK(Bob)
109 |
110 | - Bob builds circuit to RP, sends { cookie, g^y, H(K)}
111 | - RP connects Alice and Bob
112 |
113 |
--------------------------------------------------------------------------------
/README.html:
--------------------------------------------------------------------------------
1 | Computer systems security notes (6.858, Fall 2014)
2 |
3 | Lecture notes from 6.858, taught by Prof. Nickolai Zeldovich and Prof. James Mickens in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 course website.
4 |
5 |
6 | - Lecture 1: Introduction: what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows
7 | - Lecture 2: Control hijacking attacks: buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones & Kelly, baggy bounds checking
8 | - Lecture 3: More baggy bounds and return oriented programming: costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets
9 | - Lecture 4: OKWS: privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC)
10 | - Lecture 5: Penetration testing guest lecture by Paul Youn, iSEC Partners
11 | - Lecture 6: Capsicum: confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum
12 | - Lecture 7: Native Client (NaCl): sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation
13 | - Lecture 8: Web Security, Part I: modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins
14 | - Lecture 9: Web Security, Part II: cross-site scripting (XSS) attacks, XSS defenses, SQL injection attacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels
15 | - Lecture 10: Symbolic execution guest lecture by Prof. Armando Solar-Lezama, MIT CSAIL
16 | - Lecture 11: Ur/Web guest lecture by Prof. Adam Chlipala, MIT, CSAIL
17 | - Lecture 12: TCP/IP security: threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing
18 | - Lecture 13: Kerberos: Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy
19 | - Lecture 14: ForceHTTPS: certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS
20 | - Lecture 15: Medical software guest lecture by Prof. Kevin Fu, U. Michigan
21 | - Lecture 16: Timing attacks: side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks
22 | - Lecture 17: User authentication: what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader
23 | - Lecture 18: Private browsing: private browsing mode, local and web attackers, VM-level privacy, OS-level privacy, what browsers implement, browser extensions
24 | - Lecture 19: Tor guest lecture by Nick Mathewson, Tor Project
25 |
26 | - 6.858 notes from 2012 on Anonymous communication: onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets)
27 |
28 | - Lecture 20: Mobile phone security: Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents
29 | - Lecture 21: Information flow tracking: TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip
30 | - Lecture 22: MIT's IS&T guest lecture by Mark Silis and David LaPorte, MIT IS&T
31 | - Lecture 23: Security economics: economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics
32 |
33 |
--------------------------------------------------------------------------------
/quiz2-medical-dev.md:
--------------------------------------------------------------------------------
1 | 6.858 Quiz 2 Review
2 | ===================
3 |
4 | Medical Device Security
5 | -----------------------
6 |
7 | FDA standards: Semmelweis e.g. `=>` Should wash hands
8 |
9 | Defibrillator:
10 |
11 | - 2003: Implanted defibrillators use WiFi. What could
12 | possibly go wrong?
13 | - Inside: battery, radio, hermetically sealed
14 |
15 | Why wireless?
16 |
17 | - Old way: Inject a needle into arm to twist dial, risk of infection :(
18 |
19 | **Q:** What are security risks of wireless?
20 |
21 | - Unsafe practices - implementation errors.
22 | - Manufacturer and User Facility Device Experience (MAUDE) database
23 | * Cause of death: buffer overflow in infusion pump.
24 | * Error detected, so the pump was brought into safe mode and turned off.
25 | * Patient died after an increase in brain pressure because the pump
26 | was off, all because of the buffer overflow.
27 |
28 | #### Human factors and software
29 |
30 | Why unique?
31 |
32 | 500+ deaths
33 |
34 | E.g. User interface for delivering dosage to patients did not properly indicate
35 | whether it expected hours or minutes as input (hh:mm:ss). Led to order of
36 | magnitude error: 20 min vs. the intended 20 hrs.
37 |
38 | #### Managerial issues
39 |
40 | Medical devices also need to take software updates.
41 |
42 | E.g. McAfee classified a DLL as malicious and quarantined it,
43 | which messed up hospital services.
44 |
45 | E.g. hospitals using Windows XP:
46 | - There are no more security updates from Microsoft for XP, but new medical products still ship with Windows XP.
47 |
48 |
49 | #### FDA Cybersecurity Guidance
50 |
51 | What does the FDA expect to see from manufacturers? Evidence that they
52 | have thought through the security problems / risks /
53 | mitigation strategies / residual risks.
54 |
55 |
56 | #### Adversary stuff
57 |
58 | Defibrillator & Implants
59 |
60 | This section of the notes refers to the discussion of attacks on implanted defibrillators from Kevin Fu's lecture. In one example he gave, the implanted devices are wirelessly programmed with another device called a "wand", which uses a proprietary (non-public, non-standardized) protocol. Also, the wand transmits (and the device listens) on specially licensed EM spectrum (e.g. not WiFI or bluetooth). The next two lines describe the surgical process by which the defibrillator is implanted in the patient.
61 |
62 | - Device programmed w/ wand, speaking proprietary protocol
63 | over specially licensed spectrum. (good idea w.r.t.
64 | security?)
65 | - Patient awake but numbed and sedated
66 | - Six people weave electrodes through blood vessel....
67 |
68 | - Patient given a base station, looks like AP, speaks proprietary RF to implant,
69 | data sent via Internet to healthcare company
70 |
71 | - Communication between device and programmer: no crypto / auth, data sent in plaintext
72 | - Device stores: Patient name, DOB, make & model, serial no., more...
73 |
74 | - ???????? Use a software radio (USRP/GNU Radio Software)
75 |
76 | **Q:** Can you wirelessly induce a fatal heart rhythm?
77 | **A:** Yes. Device emitted a 500V shock in 1 msec, like getting kicked in the chest by a horse.
78 |
79 | Devices fixed through software updates?
80 |
81 | #### Healthcare Providers
82 |
83 | Screenshot of "Hospitals Stuck with Windows XP": 600 Service Pack 0 Windows XP devices in the hospital!
84 |
85 | Average time to infection for healthcare devices:
86 | - 12 days w/o protection
87 | - 1 year w/ antivirus
88 |
89 | #### Vendors are a common source of infection
90 |
91 | USB drive is a common vector for infection.
92 |
93 | #### Medical device signatures over download
94 |
95 | "Click here to download software update"
96 |
97 | - Website appears to contain malware
98 | - Chrome: Safe web browsing service detected "ventilator" malware
99 |
100 | "Drug Compounder" example:
101 |
102 | - Runs Windows XP embedded
103 | - **FDA expects manufacturers to keep SW up to date**
104 | - **Manufacturers claim cannot update because of FDA**
105 | * _double you tea f?_
106 |
107 | #### How significant are intentional, malicious SW malfunctions?
108 |
109 | E.g. 1: Chicago, 1982: somebody inserted cyanide into Tylenol.
110 | E.g. 2: Somebody posted flashing images on an epilepsy support group website.
111 |
112 |
113 | #### Why do you trust sensors?
114 |
115 | E.g. smartphones. Batteryless sensors demo, running on an MSP430. The uC believes
116 | anything coming from the ADC. Possible to do something related to the resonant
117 | freq. of the wire there?
118 |
119 | Inject interference into the baseband
120 |
121 | - Hard to filter in the analog domain
122 | - `=>` The injected interference comes through at higher quality than the real microphone audio
123 |
124 | Send a signal that matches resonant frequency of the wire.
125 |
126 | Treat circuit as unintentional demodulator
127 |
128 | - Can use a high-frequency signal to trick the uC into thinking
129 | there is a low-frequency signal, by knowing the interrupt
130 | frequency of the uC and related properties.
131 |
132 | Cardiac devices vulnerable to baseband EMI
133 |
134 | - Insert intentional EM interference in baseband
135 |
136 | Send a pulsed sine wave to trick the defibrillator into thinking the heart is beating correctly
137 |
138 | - ????? Works in vitro
139 | - Hard to replicate in a body or saline solution
140 |
141 | Any defenses?
142 |
143 | - Send an extra pacing pulse right after a beat
144 | * a real heart shouldn't send a response
145 |
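An illustrative sketch of that probe-and-check idea (hypothetical Python
pseudocode, not how any real device firmware is written): real cardiac
tissue is refractory right after a beat and should not answer a test
pacing pulse, so a "beat" that keeps responding is treated as injected
interference.

    def classify_sensed_beat(responds_to_probe_pulse):
        # a genuine beat is followed by a refractory period with no response;
        # spoofed EMI keeps "responding" because there is no heart behind it
        return "likely EMI" if responds_to_probe_pulse else "genuine beat"

    assert classify_sensed_beat(False) == "genuine beat"
    assert classify_sensed_beat(True) == "likely EMI"
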
146 | #### Detecting malware at power outlets
147 |
148 | Embedded system `<-->` WattsUpDoc `<-->` Power outlet
149 |
150 | #### Bigger problems than security?
151 |
152 | **Q:** True or false: Hackers breaking into medical devices is
153 | the biggest risk at the moment.
154 |
155 | **A:** False. Wide scale unavailability of patient care and integrity of
156 | medical sensors are more important.
157 |
158 | Security cannot be bolted on
159 |
160 | - E.g. MRI on windows 95
161 | - E.g. Pacemaker programmer running on OS/2
162 |
163 | Check gmail on medical devices, etc.
164 |
165 | Run pandora on medical machine.
166 |
167 | Keep clinical workflow predictable.
168 |
169 |
--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
1 | Computer systems security notes (6.858, Fall 2014)
2 | ==================================================
3 |
4 | Lecture notes from 6.858, taught by [Prof. Nickolai Zeldovich](http://people.csail.mit.edu/nickolai/) and [Prof. James Mickens](http://research.microsoft.com/en-us/people/mickens/) in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 [course website](http://css.csail.mit.edu/6.858/2014/schedule.html).
5 |
6 | * Lecture **1**: [Introduction](l01-intro.html): what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows
7 | * Lecture **2**: [Control hijacking attacks](l02-baggy.html): buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones & Kelly, baggy bounds checking
8 | * Lecture **3**: [More baggy bounds and return oriented programming](l03-brop.html): costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets
9 | * Lecture **4**: [OKWS](l04-okws.html): privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC)
10 | * Lecture **5**: **Penetration testing** _guest lecture_ by Paul Youn, iSEC Partners
11 | * Lecture **6**: [Capsicum](l06-capsicum.html): confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum
12 | * Lecture **7**: [Native Client (NaCl)](l07-nacl.html): sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation
13 | * Lecture **8**: [Web Security, Part I](l08-web-security.html): modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins
14 | * Lecture **9**: [Web Security, Part II](l09-web-defenses.html): cross-site scripting (XSS) attacks, XSS defenses, SQL injection attacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels
15 | * Lecture **10**: **Symbolic execution** _guest lecture_ by Prof. Armando Solar-Lezama, MIT CSAIL
16 | * Lecture **11**: **Ur/Web** _guest lecture_ by Prof. Adam Chlipala, MIT CSAIL
17 | * Lecture **12**: [TCP/IP security](l12-tcpip.html): threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing
18 | * Lecture **13**: [Kerberos](l13-kerberos.html): Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy
19 | * Lecture **14**: [ForceHTTPS](l14-forcehttps.html): certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS
20 | * Lecture **15**: **Medical software** _guest lecture_ by Prof. Kevin Fu, U. Michigan
21 | * Lecture **16**: [Timing attacks](l16-timing-attacks.html): side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks
22 | * Lecture **17**: [User authentication](l17-authentication.html): what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader
23 | * Lecture **18**: [Private browsing](l18-priv-browsing.html): private browsing mode, local and web attackers, VM-level privacy, OS-level privacy, what browsers implement, browser extensions
24 | * Lecture **19**: **Tor** _guest lecture_ by Nick Mathewson, Tor Project
25 | + 6.858 notes from 2012 on [Anonymous communication](l19-tor.html): onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets)
26 | * Lecture **20**: [Mobile phone security](l20-android.html): Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents
27 | * Lecture **21**: [Information flow tracking](l21-taintdroid.html): TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip
28 | * Lecture **22**: **MIT's IS&T** _guest lecture_ by Mark Silis and David LaPorte, MIT IS&T
29 | * Lecture **23**: [Security economics](l23-click-trajectories.html): economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics
30 |
31 | Papers
32 | ------
33 |
34 | List of papers we read ([papers/](papers/)):
35 |
36 | - [Baggy bounds checking](papers/baggy.pdf)
37 | - [Hacking blind](papers/brop.pdf)
38 | - [OKWS](papers/okws.pdf)
39 | - [The confused deputy](papers/confused-deputy.pdf) (or why capabilities might have been invented)
40 | - [Capsicum](papers/capsicum.pdf) (capabilities)
41 | - [Native Client](papers/nacl.pdf) (sandboxing x86 code)
42 | - [OWASP Top 10](papers/owasp-top-10.pdf), the most critical web application security risks
43 | - [KLEE](papers/klee.pdf) (symbolic execution)
44 | - [Ur/Web](papers/urweb.pdf) (functional programming for the web)
45 | - [A look back at "Security problems in the TCP/IP protocol suite"](papers/lookback-tcpip.pdf)
46 | - [Kerberos](papers/kerberos.pdf): An authentication service for open network systems
47 | - [ForceHTTPS](papers/forcehttps.pdf)
48 | - [Trustworthy Medical Device Software](papers/medical-sw.pdf)
49 | - [Remote timing attacks are practical](papers/brumley-timing.pdf)
50 | - [The quest to replace passwords](papers/passwords.pdf)
51 | - [Private browsing modes](papers/private-browsing.pdf)
52 | - [Tor](papers/tor-design.pdf): the second-generation onion router
53 | - [Understanding android security](papers/android.pdf)
54 | - [TaintDroid](papers/taintdroid.pdf): an information-flow tracking system for realtime privacy monitoring on smartphones
55 | - [Click trajectories](papers/trajectories.pdf): End-to-end analysis of the spam value chain
56 |
--------------------------------------------------------------------------------
/previous-years/l12-resin.txt:
--------------------------------------------------------------------------------
1 | Resin
2 | =====
3 |
4 | administrivia:
5 | quiz 1 on Wednesday
6 | Xi: office hours for quiz review questions?
7 | lab 3 out today, first part due in ~1.5 weeks
8 |
9 | what kinds of problems is this paper trying to address?
10 | missing security checks in application code
11 | sanitizing user inputs for SQL injection or cross-site scripting
12 | calling access control functions for sensitive data
13 | protected wiki page; user's password
14 | checking where code came from before running it
15 |
16 | one such problem: cross-site scripting
17 | setting: one web server, multiple users
18 | users interact with each other (e.g. get a list of online users)
19 | attacker's plan: inject JS code in a script tag as part of user name
20 | victim's browser sees this code in the HTML page, runs it
21 | what kind of code could attacker inject?
22 | maybe steal the user's HTTP cookie
23 | how? create an image tag containing document.cookie
24 | why doesn't the browser's same-origin policy protect the cookie?
25 | as far as the browser is concerned, code came from server's origin
26 | lab 1's web server was vulnerable, as it turns out!
27 | http://.../
28 | returns: File not found: /
29 |
30 | a similar problem: SQL injection
31 | saw examples in previous lectures
32 | problems arise if programmer forgets to quote user inputs
33 |
34 | different kind of a problem: access control checks
35 | might have protected pages in a wiki, forget to call ACL function
36 | concrete example: hotcrp's password disclosure
37 | typical web site, sends password reminders
38 | email preview mode displays emails instead of sending
39 | turns out to display pw reminders in the requesting user's browser
40 | kind-of like the confused deputy prob: no module is really at fault?
41 |
42 | why are the checks missing?
43 | lots of places in the code where they need to be performed
44 | think of application as a black box; lots of inputs and outputs
45 | suppose that for a given output, only some inputs were OK
46 | e.g. sanitize user inputs in a SQL query, but not app's own data
47 | hard to tell where the output's data came from
48 | so, programmers try to do checks on all possible paths
49 | programmer forgets them on some paths from input to output
50 |
51 | what's the plan to prevent these?
52 | think of the checks as being associated with data flows input->output
53 | associate checks with data objects like user input or password strings
54 | perform checks whenever data gets used in some interesting way
55 |
56 | what does resin provide?
57 | [ diagram from figure 1 ]
58 | data tracking
59 | how does this work? assumes a language runtime
60 | python, php have a byte code representation, sort-of like java
61 | resin tags strings, integers with a policy object
62 | changes the implementation of operations that manipulate data
63 | why only tag strings and integers? what about other things?
64 | what kinds of operations propagate?
65 | why not propagate across "covert" or "implicit" channels?
66 | why byte-level tracking?
67 | what happens when data items are combined?
68 | concat two strings
69 | add two integers
70 | take a substring
71 | what happens for sha1sum() or touppercase() [which uses array lookups]?
72 | policy objects
73 | contains code to implement policy for its data
74 | what methods does the programmer have to implement in a policy object?
75 | export_check
76 | merge [optional]
77 | filter objects
78 | provided by default by resin for most external channels
79 | context information: combination of resin- and programmer-supplied
80 | how much synchronization does there need to be between filters & policies?
81 |
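a toy Python sketch of the policy-object idea (hypothetical class and
method names; real Resin patches the PHP/Python runtimes and propagates
policies automatically):

    class PasswordPolicy:
        def __init__(self, owner_email):
            self.owner_email = owner_email
        def export_check(self, context):
            # a filter object calls this when tagged data crosses a boundary
            if context.get("channel") == "email" and \
               context.get("rcpt") != self.owner_email:
                raise PermissionError("password may only be mailed to its owner")

    class TaintedStr(str):
        policy = None   # Resin's runtime would carry this across concat/substring

    def email_filter(data, rcpt):
        # stand-in for the default filter on the outgoing email channel
        if getattr(data, "policy", None):
            data.policy.export_check({"channel": "email", "rcpt": rcpt})

    pw = TaintedStr("s3cr3t")
    pw.policy = PasswordPolicy("alice@example.com")
    email_filter(pw, "alice@example.com")     # allowed
    # email_filter(pw, "mallory@example.com") # raises PermissionError
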
82 | what are all of the uses for filter objects?
83 | default filters for external boundaries
84 | persistent serialization
85 | files: extended attributes
86 | database: extra columns for policies, SQL rewriting
87 | code imports
88 | interpreter's input is yet another kind of channel
89 | write access control
90 | persistent filters on FS objects like files, directories
91 | almost a different kind of check: tied to an external object, not data
92 | propagation rules for functions
93 | sha1sum(), touppercase(), ..
94 |
95 | how would you use resin to prevent missing checks?
96 | hotcrp
97 | cross-site scripting
98 |
99 | does this system actually work?
100 | two versions of resin, one for python and one for php
101 | prevented known bugs in real apps
102 | prevented unknown bugs in real apps too
103 | few different kinds of bugs (ACL, XSS, SQL inj, directory traversal, ..)
104 | is it possible to forget checks with resin?
105 | what does resin provide/guarantee?
106 | are there potential pitfalls with resin's assertions?
107 | how much code is required to write these assertions? why?
108 | how specific are the assertions to the bug you want to prevent? why?
109 | how did they prevent the myphpscripts login library bug?
110 |
111 | what's the cost?
112 | need to deploy a new php/python interpreter
113 | need to write some assertions (policy objects?)
114 | runtime overheads: memory to store policies, CPU time to track them
115 | major cost: serializing policies to SQL, file system
116 | could that be less?
117 |
118 | how else can you avoid these missing check problems?
119 | IFC does data tracking in some logical sense
120 | trade-off: redesign/rewrite your app around some checks
121 | hard to redesign around multiple checks or to add a check later
122 | java stack inspection
123 | can't automatically perform checks for things that are off the stack
124 | can check if file is being read through a sanitizing/ACL-check function
125 | crimps programmer's style, but in theory possible
126 | express some of these checks in the type system
127 | maybe have a special kind of UntrustedString vs SafeString
128 | and conversely SQLString and HTMLString which get used for output
129 | special conversion rules for them
130 | could even do static checks for these data flows
131 | for password disclosure, ACL checks: maybe a delayed-check string?
132 | when about to send out the string, tell it where you're sending it
133 | almost like resin design
134 | problem with using the type system:
135 | policies intertwined with code throughout the app
136 | to add a new check, need to change types everywhere
137 | resin is almost like a shadow type system
138 |
139 | could you apply resin to other applications, or other environments?
140 | different languages?
141 | different machines (cluster of web servers)?
142 | no language runtime?
143 | untrusted/malicious code?
144 |
145 |
--------------------------------------------------------------------------------
/quiz2-medical-dev.html:
--------------------------------------------------------------------------------
1 | 6.858 Quiz 2 Review
2 |
3 | Medical Device Security
4 |
5 | FDA standards: Semmelweis e.g. => Should wash hands
6 |
7 | Defibrillator:
8 |
9 |
10 | - 2003: Implanted defibrillator use WiFi. What could
11 | possibly go wrong?
12 | - Inside: battery, radio, hermetically sealed
13 |
14 |
15 | Why wireless?
16 |
17 |
18 | - Old way: Inject a needle into arm to twist dial, risk of infection :(
19 |
20 |
21 | Q: What are security risks of wireless?
22 |
23 |
24 | - Unsafe practices - implementation errors.
25 | - Manufacturer and User Facility Device Experience (MAUDE) database
26 |
27 | - Cause of death: buffer overflow in infusion pump.
28 | - Error detected, but brought to safe mode, turn off pump.
29 | - Patient died after increase in brain pressure because
30 | no pump, because of buffer overflow.
31 |
32 |
33 |
34 | Human factors and software
35 |
36 | Why unique?
37 |
38 | 500+ deaths
39 |
40 | E.g. User interface for delivering dosage to patients did not properly indicate
41 | whether it expected hours or minutes as input (hh:mm:ss). Led to order of
42 | magnitude error: 20 min vs. the intended 20 hrs.
43 |
44 | Managerial issues
45 |
46 | Medical devices also need to take software updates.
47 |
48 | E.g. McAfee classified DLL as malicious, quarantines,
49 | messed up hospital services.
50 |
51 | E.g. hospitals using Windows XP:
52 | - There are no more security updates from Microsoft for XP, but still new medical products shipping Windows XP.
53 |
54 | FDA Cybersecurity Guidance
55 |
56 | What is expected to be seen from manufacturers? How they
57 | have thought through the security problems / risks /
58 | mitigation strategies / residual risks?
59 |
60 | Adversary stuff
61 |
62 | Defibrillator & Implants
63 |
64 | This section of the notes refers to the discussion of attacks on implanted defibrillators from Kevin Fu's lecture. In one example he gave, the implanted devices are wirelessly programmed with another device called a "wand", which uses a proprietary (non-public, non-standardized) protocol. Also, the wand transmits (and the device listens) on specially licensed EM spectrum (e.g. not WiFI or bluetooth). The next two lines describe the surgical process by which the defibrillator is implanted in the patient.
65 |
66 |
67 | - Device programmed w/ wand, speaking proprietary protocol
68 | over specially licensed spectrum. (good idea w.r.t.
69 | security?)
70 | - Patient awake but numbed and sedated
71 | Six people weave electrodes through blood vessel....
72 | Patient given a base station, looks like AP, speaks proprietary RF to implant,
73 | data sent via Internet to healthcare company
74 | Communication between device and programmer: no crypto / auth, data sent in plaintext
75 | Device stores: Patient name, DOB, make & model, serial no., more...
76 | ???????? Use a software radio (USRP/GNU Radio Software)
77 |
78 |
79 | Q: Can you wirelessly induce a fatal heart rhythm
80 | A: Yes. Device emitted 500V shock in 1 msec. E.g. get kicked in chest by horse.
81 |
82 | Devices fixed through software updates?
83 |
84 | Healthcare Providers
85 |
86 | Screenshot of "Hospitals Stuck with Windows XP": 600 Service Pack 0 Windows XP devices in the hospital!
87 |
88 | Average time to infection for healthcare devices:
89 | - 12 days w/o protection
90 | - 1 year w/ antivirus
91 |
92 | Vendors are a common source of infection
93 |
94 | USB drive is a common vector for infection.
95 |
96 | Medical device signatures over download
97 |
98 | "Click here to download software update"
99 |
100 |
101 | - Website appears to contain malware
102 | - Chrome: Safe web browsing service detected "ventilator" malware
103 |
104 |
105 | "Drug Compounder" example:
106 |
107 |
108 | - Runs Windows XP embedded
109 | - FDA expects manufacturers to keep SW up to date
110 | - Manufacturers claim cannot update because of FDA
111 |
112 | - double you tea f?
113 |
114 |
115 |
116 | How significant intentional malicious SW malfunctions?
117 |
118 | E.g. 1: Chicago 1982: Somebody inserts cyanide into Tylenol
119 | E.g. 2: Somebody posted flashing images on epilepsy support group website.
120 |
121 | Why do you trust sensors?
122 |
123 | E.g. smartphones. Batteryless sensors demo. Running on an MSP430. uC believes
124 | anything coming from ADC to uC. Possible to do something related to resonant
125 | freq. of wire there?
126 |
127 | Inject interference into the baseband
128 |
129 |
130 | - Hard to filter in the analog
131 | => Higher quality audio w/ interference than microphone
132 |
133 |
134 | Send a signal that matches resonant frequency of the wire.
135 |
136 | Treat circuit as unintentional demodulator
137 |
138 |
139 | - Can use high frequency signal to trick uC into thinking
140 | - there is a low frequency signal due to knowing interrupt
141 | frequency of uC and related properties.
142 |
143 |
144 | Cardiac devices vulnerable to baseband EMI
145 |
146 |
147 | - Insert intentional EM interference in baseband
148 |
149 |
150 | Send pulsed sinewave to trick defibrillator into thinking heart beating correctly
151 |
152 |
153 | - ????? Works in vitro
154 | - Hard to replicate in a body or saline solution
155 |
156 |
157 | Any defenses?
158 |
159 |
160 | - Send an extra pacing pulse right after a beat
161 |
162 | - a real heart shouldn't send a response
163 |
164 |
165 |
166 | Detecting malware at power outlets
167 |
168 | Embedded system <--> WattsUpDoc <--> Power outlet
169 |
170 | Bigger problems than security?
171 |
172 | Q: True or false: Hackers breaking into medical devices is
173 | the biggest risk at the moment.
174 |
175 | A: False. Wide scale unavailability of patient care and integrity of
176 | medical sensors are more important.
177 |
178 | Security cannot be bolted on
179 |
180 |
181 | - E.g. MRI on windows 95
182 | - E.g. Pacemaker programmer running on OS/2
183 |
184 |
185 | Check gmail on medical devices, etc.
186 |
187 | Run pandora on medical machine.
188 |
189 | Keep clinical workflow predictable.
190 |
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
1 | Computer systems security notes (6.858, Fall 2014)
2 |
3 | Lecture notes from 6.858, taught by Prof. Nickolai Zeldovich and Prof. James Mickens in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 course website.
4 |
5 |
6 | - Lecture 1: Introduction: what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows
7 | - Lecture 2: Control hijacking attacks: buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones & Kelly, baggy bounds checking
8 | - Lecture 3: More baggy bounds and return oriented programming: costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets
9 | - Lecture 4: OKWS: privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC)
10 | - Lecture 5: Penetration testing guest lecture by Paul Youn, iSEC Partners
11 | - Lecture 6: Capsicum: confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum
12 | - Lecture 7: Native Client (NaCl): sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation
13 | - Lecture 8: Web Security, Part I: modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins
14 | - Lecture 9: Web Security, Part II: cross-site scripting (XSS) attacks, XSS defenses, SQL injection attacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels
15 | - Lecture 10: Symbolic execution guest lecture by Prof. Armando Solar-Lezama, MIT CSAIL
16 | - Lecture 11: Ur/Web guest lecture by Prof. Adam Chlipala, MIT, CSAIL
17 | - Lecture 12: TCP/IP security: threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing
18 | - Lecture 13: Kerberos: Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy
19 | - Lecture 14: ForceHTTPS: certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS
20 | - Lecture 15: Medical software guest lecture by Prof. Kevin Fu, U. Michigan
21 | - Lecture 16: Timing attacks: side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks
22 | - Lecture 17: User authentication: what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader
23 | - Lecture 18: Private browsing: private browsing mode, local and web attackers, VM-level privacy, OS-level privacy, what browsers implement, browser extensions
24 | - Lecture 19: Tor guest lecture by Nick Mathewson, Tor Project
25 |
26 | - 6.858 notes from 2012 on Anonymous communication: onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets)
27 |
28 | - Lecture 20: Mobile phone security: Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents
29 | - Lecture 21: Information flow tracking: TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip
30 | - Lecture 22: MIT's IS&T guest lecture by Mark Silis and David LaPorte, MIT IS&T
31 | - Lecture 23: Security economics: economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics
32 |
33 |
34 | Papers
35 |
36 | List of papers we read (papers/):
37 |
38 |
60 |
--------------------------------------------------------------------------------
/previous-years/l14-resin.txt:
--------------------------------------------------------------------------------
1 | Resin
2 | =====
3 |
4 | what kinds of problems is this paper trying to address?
5 | threat model
6 | trusted: hardware/os/language runtime/db/app code
7 | untrusted: external inputs (users/whois servers)
8 | non-goals: buffer overflows, malicious apps
9 | programming errors: missing security checks in application code
10 | sanitizing user inputs for code injection
11 | calling access control functions for sensitive data
12 | protected wiki page; user's password
13 |
14 | Example: one web server, multiple users
15 | users interact with each other
16 | reading posts in a web forum
17 | avatar url / upload
18 | post content
19 | profile / signature
20 | attacker's plan: inject JS code / forge requests
21 | victim's browser sees this code in the HTML page, runs it
22 | what kind of code could attacker inject?
23 | steal the cookie
24 | transfer credits
25 | acl
26 | privileged operations (for admin)
27 | why doesn't the browser's same-origin policy protect the cookie?
28 | as far as the browser is concerned, code came from server's origin
29 | lower level: the zookws web server was vulnerable
30 | http://.../
31 | returns: File not found: /
32 |
33 | a similar problem: whois injection
34 | admin views logs: user, ip, domain
35 | malicious whois server
36 | problems arise if programmer forgets to quote external inputs
37 |
38 | different kind of a problem: access control checks
39 | might have protected pages in a wiki, forget to call ACL function
40 | example: hotcrp's password disclosure
41 | typical web site, sends password reminders
42 | email preview mode displays emails instead of sending
43 | turns out to display pw reminders in the requesting user's browser
44 | kind-of like the confused deputy prob: no module is really at fault?
45 |
46 | why are the checks missing?
47 | lots of places in the code where they need to be performed
48 | think of application as a black box; lots of inputs and outputs
49 | suppose that for a given output, only some inputs were OK
50 | e.g. sanitize user inputs in a SQL query, but not app's own data
51 | hard to tell where the output's data came from
52 | so, programmers try to do checks on all possible paths
53 | programmer forgets them on some paths from input to output
54 | plug-in developers may be unaware of security plan
55 |
56 | what's the plan to prevent these?
57 | think of the checks as being associated with data flows input->output
58 | associate checks with data objects like user input or password strings
59 | perform checks whenever data gets used in some interesting way
60 |
61 | what does resin provide?
62 | hotcrp data: password
63 | [ diagram from figure 1 ]
64 | policy objects
65 | contains code to implement policy for its data
66 | hotcrp: only email password to the user or the pc chair
67 | what methods does the programmer have to implement in a policy object?
68 | export_check(context)
69 | merge [optional]
70 | filter objects
71 | data flow boundaries
72 | channels with contexts: http, email, ...
73 | provided by default by resin for most external channels
74 | invoke export_check if possible
75 | data tracking
76 | how does this work? assumes a language runtime
77 | python, php have a byte code representation, sort-of like java
78 | resin tags strings, integers with a policy object
79 | changes the implementation of operations that manipulate data
80 | why only tag strings and integers? what about other things?
81 | what kinds of operations propagate?
82 | why not propagate across "covert" or "implicit" channels?
83 | why byte-level tracking?
84 | what happens when data items are combined?
85 | common: concat strings (automatic via byte-level tracking)
86 | rare: add integers
87 |
88 | what are all of the uses for filter objects?
89 | default filters for external boundaries: sockets, pipes, http, email
90 | persistent serialization
91 | files: extended attributes
92 | database: extra columns for policies, SQL rewriting
93 | example: write password to file/db
94 | code imports
95 | interpreter's input is yet another kind of channel
96 | write access control
97 | persistent filters on FS objects like files, directories
98 | almost a different kind of check: tied to an external object, not data
99 | propagation rules for functions
100 | sha1(), strtoupper(), ..
101 |
102 | how would you use resin to prevent missing checks?
103 | hotcrp
104 | cross-site scripting: profile
105 | UntrustedData & XFilter calls strip and removes the policy?
106 |       define UntrustedData and JSSanitized, empty export_check
107 |       input tagged UntrustedData
108 |       strip function attaches JSSanitized
109 |       output filter checks that UntrustedData strings also carry JSSanitized
110 | alternative: UntrustedData policy only; filter parses and sanitizes strings
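
A toy sketch of the UntrustedData/JSSanitized plan above. This is not Resin's real API: Resin attaches policies to strings inside the interpreter and invokes export_check at channel boundaries; the dict-based tagging and the helper names here are stand-ins, just to show where the tag, the sanitizer, and the boundary check would sit.

    # Toy stand-in for interpreter-level policy tags (NOT Resin's real mechanism).
    _policies = {}

    def policy_add(obj, policy):
        _policies.setdefault(id(obj), set()).add(policy)

    def has_policy(obj, policy):
        return policy in _policies.get(id(obj), set())

    class UntrustedData: pass     # real Resin policies carry export_check(context)
    class JSSanitized: pass

    def strip(s):
        clean = s.replace('<', '&lt;').replace('>', '&gt;')   # stand-in sanitizer
        if has_policy(s, UntrustedData):
            policy_add(clean, UntrustedData)                  # propagate the tag
        policy_add(clean, JSSanitized)                        # record sanitization
        return clean

    def html_output_filter(s):
        # default HTTP boundary filter: untrusted bytes may leave only if sanitized
        if has_policy(s, UntrustedData) and not has_policy(s, JSSanitized):
            raise RuntimeError("unsanitized user input reaching HTML output")
        return s

    profile = "<script>steal()</script>"
    policy_add(profile, UntrustedData)     # tag data as it enters the app
    html_output_filter(strip(profile))     # ok
    # html_output_filter(profile)          # would raise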
111 |
112 | does this system actually work?
113 | two versions of resin, one for python and one for php
114 | prevented known bugs in real apps
115 | prevented unknown bugs in real apps too
116 | few different kinds of bugs (ACL, XSS, SQL inj, directory traversal, ..)
117 | is it possible to forget checks with resin?
118 | what does resin provide/guarantee?
119 | are there potential pitfalls with resin's assertions?
120 | how much code is required to write these assertions? why?
121 | how specific are the assertions to the bug you want to prevent? why?
122 | how did they prevent the myphpscripts login library bug?
123 |
124 | what's the cost?
125 | need to deploy a new php/python interpreter
126 | need to write some assertions (policy objects?)
127 | runtime overheads: memory to store policies, CPU time to track them
128 | major cost: serializing policies to SQL, file system
129 | could that be less? e.g. avoid storing email twice in hotcrp?
130 |
131 | how else can you avoid these missing check problems?
132 | IFC does data tracking in some logical sense
133 | trade-off: redesign/rewrite your app around some checks
134 | hard to redesign around multiple checks or to add a check later
135 | java stack inspection
136 | can't automatically perform checks for things that are off the stack
137 | can check if file is being read through a sanitizing/ACL-check function
138 | crimps programmer's style, but in theory possible
139 | express some of these checks in the type system
140 | maybe have a special kind of UntrustedString vs SafeString
141 | and conversely SQLString and HTMLString which get used for output
142 | special conversion rules for them
143 | could even do static checks for these data flows
144 | for password disclosure, ACL checks: maybe a delayed-check string?
145 | when about to send out the string, tell it where you're sending it
146 | almost like resin design
147 | problem with using the type system:
148 | policies intertwined with code throughout the app
149 | to add a new check, need to change types everywhere
150 | resin is almost like a shadow type system
151 |
152 | could you apply resin to other applications, or other environments?
153 | different languages?
154 | different machines (cluster of web servers)?
155 | no language runtime?
156 | untrusted/malicious code?
157 |
158 |
--------------------------------------------------------------------------------
/previous-years/l22-usability-2.txt:
--------------------------------------------------------------------------------
1 | Security Usability
2 | ==================
3 |
4 | is this problem real? concrete examples of things that go wrong?
5 |
6 | why is usable security a big problem?
7 | secondary tasks: users concerned with something other than security
8 | negative goal / weakest link: must consider entire system
9 | abstract, hard to reason about; little feedback: security often not tangible
10 | users don't fully understand threats, mechanisms they're using
11 |
12 | why do we need users in the loop?
13 | good reasons: users should be ultimately in control of their security
14 | bad reasons: programmers didn't know what to do, so they asked the user
15 | backwards compatibility
16 |
17 | what does the paper think constitutes usability for PGP?
18 | encrypt/decrypt, sign/verify signatures
19 | generate and distribute public key for encryption
20 | generate and publish public key for signing
21 | obtain other users' keys for verifying signatures
22 | obtain other users' keys for encrypting
23 | avoid errors (trusting wrong keys, accidentally not encrypting, ..)
24 |
25 | how do they evaluate it?
26 |
27 | cognitive walkthrough
28 | inspection by a developer trying to simulate a user's mindset
29 | overly-simplistic metaphors
30 | physical keys are similar to symmetric crypto, not public-key crypto
31 | quill pens lack the idea of a key being involved; key vs signature
32 | leads to faulty intuition
33 | not exposing key type information more explicitly
34 | good principle: if user needs to worry about something, expose it well
35 | users had to decide how to encrypt and sign a particular message
36 | old vs new key type icons not well documented
37 | figure 3: recipient dialog box talks about users, not keys
38 | implicit trust policy that might not be obvious to users
39 | web-of-trust model, keys can be trusted through multiple marginal sigs
40 | user might not realize what's going on
41 | not making key server operations explicit? unclear what's the precise risk
42 | failing to upload revocations to the key server
43 | publicizing or revoking keys unintentionally
44 | irreversible operations not well described
45 | deleting private key: should tell user they won't be able to decrypt, ..
46 | publicizing/revoking keys: warn the user it's a permanent change
47 | too much info
48 | UI focused on exposing what's technically hard: key trust management
49 | maybe a good model would be to ask the user to specify a threat model
50 | beginner: worried about opportunistic attackers stealing plaintext
51 | medium: worried about attacker injecting malicious keys?
52 | advanced: worried about attacker compromising some friends?
53 | more advanced: worried about cryptographic attack on small key sizes
54 | worry: users not good at estimating risk
55 | e.g. a worm might easily compromise friends' machines and sign keys
56 |
57 | lab experiment
58 | users confused about how the keys fit into the security model
59 | is something a key or a message?
60 | maybe extract as much info as possible from supplied data?
61 | could tell the user it's a key vs message based on headers etc
62 | where do keys come from? who generates them?
63 | need to use recipient's key rather than my own (sender's)
64 | key icons confusing because they don't differentiate public vs private
65 |     no one managed to handle mixed key types in a single message
66 | practical solution was to send separate messages to each recipient
67 | perhaps sacrifice generality for usability?
68 | key trust questions were not prominent
69 | some users concerned about why they should trust keys
70 | one user assumed keys were OK because signed by campaign manager
71 |       (but is the campaign manager's key OK?)
72 |     no one used PGP's key trust model
73 | overall results
74 | 4/12 managed to send an encrypted, signed email
75 | 3/12 disclosed the secret message in plaintext
76 | what does this mean?
77 | how effective is PGP in practice?
78 | maybe not so dismal for users that learn to use it over time
79 | on the other hand, easy to make dangerous mistakes
80 | all users disinclined to use PGP further
81 | what other experiments would be valuable?
82 | no attackers in the experiment
83 | would users notice a bad signature?
84 |
85 | phishing attacks
86 | look-alike domains
87 |     visually similar (bankofthevvest.com); see sketch below
88 | exploit incorrect user intuition (ebay-security.com)
89 | unfortunately even legitimate companies often outsource some services!
90 | e.g. URLs like "ebay.somesurveysite.com"
91 | visual deception
92 | copy logos, site layout
93 | inject look-alike security indicators
94 | create new windows that look like other dialog boxes
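
A naive illustration (mine, not from the paper) of why look-alike domains are cheap to mint and awkward to catch mechanically: fold a few visually-confusable substrings and compare against domains the user actually trusts; tricks like ebay-security.com sail right through.

    # Naive look-alike check, purely illustrative; real phishing defenses
    # (e.g. browser blacklists) are far more involved.
    CONFUSABLES = [("vv", "w"), ("rn", "m"), ("0", "o"), ("1", "l")]

    def fold(domain):
        d = domain.lower()
        for bad, good in CONFUSABLES:
            d = d.replace(bad, good)
        return d

    def looks_like(candidate, trusted_domains):
        return [t for t in trusted_domains
                if fold(candidate) == fold(t) and candidate != t]

    print(looks_like("bankofthevvest.com", ["bankofthewest.com"]))
    # -> ['bankofthewest.com']; but "ebay-security.com" is not caught at all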
95 |
96 | why is phishing such a big problem? what UI security problems contribute to it?
97 | novice users don't understand the threats they are facing
98 | users don't have a clear mental model of the browser's security policy
99 | users don't understand technical details of what constitutes an origin
100 | users don't understand what to look for in an SSL certificate / EV certs
101 | users don't understand implications of security decisions
102 | allow cookie? allow non-SSL content?
103 | java security model: grant code from developer X access to FS/net?
104 | browsers have complex security indicators
105 | need to look at origin in URL bar, SSL certificate
106 | security indicators can be absent instead of indicating a warning/error
107 | e.g. if site is non-SSL, nothing out-of-the-ordinary appears to the user
108 |
109 | techniques to combat phishing?
110 | most common: maintain a database of known phishing sites
111 | why isn't this fully effective?
112 | active vs passive warnings
113 | habituation: users accustomed to warnings/errors
114 | users focused on getting their work done
115 | if the warning gives an option to continue, users may think it's OK
116 |
117 | more intrusive measures are often more effective here
118 | replace passwords with some other form of auth (smartcard, PAKE, etc)
119 | only works for credentials; attackers might still steal DOB, SSN, ..
120 | turn phishing into online attack
121 | site must display an agreed-upon image before user enters password
122 |      can be hard for users to comprehend what this defends against, and how
123 |
124 | other human factors in system security?
125 | social engineering attacks
126 | least privilege can conflict with allowing users to do their work
127 | differentiating between trust in users vs trust in users' machines
128 |
129 | principles for designing usable secure systems?
130 | avoid false positives in security warnings (can make them errors then?)
131 | active security warnings to force user to make a choice (cannot ignore)
132 | present users with useful choices when possible
133 | users want to perform their task, don't want to choose "stop" option
134 | e.g. try to look up the correct key in a PGP key server?
135 | search google for an authentic web site vs phishing attack?
136 | secure defaults; secure by design; "invisible security"
137 | when does this work?
138 | when is this insufficient?
139 | intuitive security mechanisms that make sense to the user
140 | some of the windows "privacy" knobs or wizards that give a few options
141 | train users
142 | users unlikely to spend time to learn on their own
143 | interesting idea: try to train users as part of normal workflow
144 | try to mount phishing attacks on user by sending spam to them
145 | if they fall for an attack, tell them what they should've looked for
146 | can get tiresome after a while, if not done properly..
147 | security training games
148 |
149 |
--------------------------------------------------------------------------------
/previous-years/l21-captcha.txt:
--------------------------------------------------------------------------------
1 | CAPTCHAs
2 | ========
3 |
4 | Administrivia.
5 | This week, Wed: in-lecture quiz.
6 | Next week, Mon + Wed: in-lecture final project presentations.
7 | 10 minutes per group.
8 | We will have a projector set up if you want to use one.
9 | Feel free to do a demo (e.g., 5 minute talk + 5 minute demo).
10 | Volunteers for Monday? If not, we will just pick at random.
11 | Turn in code + writeup by Friday next week (i.e., Dec 10th).
12 |
13 | Goal of this paper: better understand the economics of security.
14 | Context: earlier paper, "Spamalytics", studied economics of botnets, spam.
15 | Adversaries profitably send spam, mount denial-of-service attacks, etc.
16 | The bulk of botnet activity is work like this (spam, DoS).
17 | Botnet operators sell access to botnets, so there's a real market for this.
18 |
19 | What web sites would use CAPTCHAs?
20 | Open services that allow any user to interact with their site.
21 | Applications that have user accounts but allow anyone to sign up.
22 |
23 | Why would a web site want to use a CAPTCHA?
24 | Prevent adversary from causing DoS (e.g., too many Google searches).
25 | Prevent adversary from spamming users.
26 | Many examples: email spam, social network spam, blog comments.
27 | Prevent adversary from signing up for many accounts?
28 | Harness humans for some task.
29 | reCAPTCHA: OCR books.
30 | Solve CAPTCHAs from other sites? Interesting but probably not worth it.
31 | What if a user legitimately signs up for an account and sends spam?
32 | What if adversary bypasses CAPTCHA and signs up for account?
33 | Can probably detect an adversary sending spam relatively fast.
34 | Still want CAPTCHA to prevent those first few messages before detection.
35 |
36 | Why do sites care if users are humans or software?
37 | Maintain some form of per-person fairness, + hope good users outnumber bad.
38 | Advertising revenue.
39 | What about ad-blocking software?
40 |
41 | If a site doesn't want to implement CAPTCHAs, what are the alternatives?
42 | Track based on IPs.
43 | IPs are cheap for botnet operators.
44 | False positives due to large NATs.
45 | Implement stronger authentication.
46 | Rely on some other authentication mechanism.
47 | Email address, Google account.
48 | At extreme end, bank account, even if no money is charged.
49 | How does Wikipedia work with no CAPTCHAs?
50 | Strong logging, auditing, recovery.
51 | Selective mechanisms to require long-lived accounts.
52 | Measure account life in time, or in number of un-reverted edits?
53 |
54 | Bypassing CAPTCHAs.
55 | Plan 1: write software to recognize characters / challenges in images.
56 | Plan 2: use humans to solve CAPTCHAs.
57 |
58 | Why does the paper argue the technical approach (plan 1) is not effective?
59 | Up-front cost: about $10k to implement solver for CAPTCHA.
60 | CPU cost: a few seconds of CPU time per CAPTCHA solved.
61 | Amazon EC2 prices, order-of-magnitude: $0.10 for an hour of CPU.
62 | CPU cost for solving a CAPTCHA is ~$10^-4 ($0.0001), could be less.
63 | Using humans: $1 for 1,000 CAPTCHA solutions, or $0.001 per CAPTCHA.
64 | Break-even point: solve order-of-magnitude 10M CAPTCHAs.
65 | Worse yet, accuracy rate of automated solver is poor (e.g., 30%).
66 | Thus, break-even point for plan 1 might be higher by 3x.
67 | How do we tell if this break-even point is too high?
68 | Can CAPTCHA developers switch algorithms faster than this?
69 | Experimentally, paper says reCAPTCHA can change fast enough.
70 | Thus, investment not worth it.
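
The break-even arithmetic above, written out with the order-of-magnitude numbers from the discussion (illustrative only, not exact figures from the paper):

    # Break-even estimate: build an automated solver vs. keep paying humans.
    upfront         = 10_000    # $ to build an automated solver
    cpu_per_solve   = 1e-4      # $ of CPU time per automated attempt (EC2 pricing)
    human_per_solve = 1e-3      # $ per human-solved CAPTCHA ($1 per 1,000)

    break_even = upfront / (human_per_solve - cpu_per_solve)
    print(f"{break_even/1e6:.1f}M CAPTCHAs to break even")   # ~11.1M
    # a ~30% solver accuracy rate wastes most attempts, pushing the break-even
    # volume up further (roughly 3x, per the notes above)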
71 |
72 | Human-based CAPTCHA solving: Figure 3.
73 | Well-defined API between application and CAPTCHA-solving site.
74 | Back-end site for workers, with a web-based UI.
75 | Some internal protocol between the front- and back-end sites.
76 | How do the authors find out these things?
77 | Looks like a lot of manual work finding these sites.
78 | Interviewed an operator of one such site.
79 | How reliable are these sites?
80 | 80-90% availability (Table 1).
81 | 10-20% error rate (Fig. 4).
82 | What's the cost range?
83 | $0.50 -- $20.00 per 1,000 CAPTCHAs solved.
84 | Wide variance in adaptability, accuracy, latency, capacity.
85 |
86 | Does low accuracy rate matter?
87 | Service provider could detect many incorrect CAPTCHAs.
88 | What would a service provider do in this case?
89 | Can blacklist an IP address after several incorrect answers.
90 | If overall rate across IPs goes down, deploy new CAPTCHA scheme?
91 | Even humans have a 75-90% accuracy rate, depending on the CAPTCHA.
92 | Assuming the humans are similar, service shouldn't blacklist.
93 |
94 | Does latency matter?
95 | CAPTCHA solver cannot be significantly slower than human.
96 | Service would be able to tell the real human & adversary apart.
97 | Regular humans can solve CAPTCHAs in ~10 seconds.
98 | Software can solve CAPTCHAs in several seconds: fast enough.
99 | CAPTCHA-solving services seem to add little latency (Fig. 7).
100 |
101 | How scalable is this?
102 | One service appears to have 400+ workers.
103 | Measured much like network analysis: watch for queueing.
104 |
105 | How much are the workers getting paid?
106 | Quite little: $2-4 per day!
107 | Workers get ~quarter of front-end cost.
108 | Many workers seem to be in China, India, Russia.
109 | Cute tricks for identifying workers:
110 | Ask to decode 3-digit numbers in specific language.
111 | Ask to write down the current time, to find timezone.
112 |
113 | How much profit does an adversary get from abusing an open service?
114 | Email spam: relatively little, but non-zero.
115 | Earlier work suggests a rough estimate of $0.00001 (10^-5) per msg.
116 | How do we measure the profit from sending spam?
117 | Comment spam: not known, might be higher?
118 | Is it possible to quantify or estimate?
119 | Possibly look at the ad costs for the page hosting the comments.
120 | Vandalism, DoS attacks: hard to quantify, externalities.
121 |
122 | Are CAPTCHAs still useful, worthwhile?
123 | An easy way to impose some non-zero cost on potential adversaries.
124 | Why do adversaries sign up for Gmail accounts to send spam?
125 | Gmail's servers unlikely to be marked as spam senders.
126 | Botnet IP addresses are, on the other hand, likely marked as spam.
127 |     At $0.001, 1 CAPTCHA is worth 100 emails (at $0.00001 profit per msg).
128 | Borderline-profitable.
129 | Bad place to be in terms of security parameters.
130 |
131 | Users seem to have become more-or-less OK with solving CAPTCHAs.
132 | Can we provide better forms of CAPTCHAs?
133 | Example in paper: Microsoft's Asirra, solvers adapted within days.
134 | Can sites make the cost of solving a CAPTCHA high?
135 |
136 | How to protect more valuable services?
137 | Gmail: SMS-based verification after a few signups from an IP address.
138 | Interesting: gmail accounts went from $8 per 1,000 to unavailable!
139 | Trade-off between defense mechanism usability and security.
140 | Apparently, users do go away from a site if they must solve CAPTCHAs.
141 | Do computational puzzles help? Micropayments?
142 | Can TPMs help, perhaps on the client machines?
143 |
144 | Is it ethical to do the kind of research in this paper?
145 | Authors argue they don't significantly change what's going on.
146 | They don't solve any additional CAPTCHAs by hand.
147 | Instead, they re-submit CAPTCHAs back into the system to be solved.
148 | They don't use the solutions they purchased for any adversarial activity.
149 | They do inject money into the market, but perhaps not significant.
150 |
151 | Other courses, if you're interested in security.
152 | 6.857: Computer and Network Security, in the spring.
153 | 6.875: Cryptography and Cryptanalysis, in the spring.
154 |
155 |
--------------------------------------------------------------------------------
/previous-years/l20-bots.txt:
--------------------------------------------------------------------------------
1 | Botnets
2 | =======
3 |
4 | botnet: network of many machines under someone's control
5 |
6 | what are botnets good for?
7 | using the resources of bot nodes:
8 | IP addrs (spam, click fraud), bandwidth (DoS), maybe CPU (??)
9 | steal sensitive user data (bank account info, credit cards, etc)
10 | impersonate user (inject requests to transfer money on user's behalf)
11 | extortion (encrypt user's data, demand payment for decryption)
12 | attackers might be able to extract a lot of benefit from high-value machines
13 | one botnet had control of machines of officials of diff governments
14 | could enable audio, video and stream it out of important meetings?
15 |     other candidates: stealing secret designs from competitor company?
16 | what sorts of attacks are counter-productive for attacker?
17 | making the machine unusable for end-user (unless trying extortion)
18 |
19 | how does the botnet grow? (largely orthogonal to botnet operation)
20 | this particular botnet (Torpig): drive-by downloads
21 | user's browser loads a malicious page (e.g. attacker purchased adspace)
22 | malicious page looks for vulnerabilities in browser or plug-ins
23 | if it finds a way to execute native code, downloads bot code
24 | can we prevent or detect this? maybe look for unusual new processes?
25 | botnet in paper: injects DLLs into existing processes
26 | can use a debugging interface to modify existing process
27 | some processes support plugins/modules (IE, windows explorer)
28 | once DLL running in some other process, looks less suspicious?
29 |
30 | other schemes: worms (self-replicating attack malware)
31 | why worms?
32 | harder to detect (no single attack source)
33 | compromise more machines (attacker now behind firewalls)
34 | faster (much less than an hour for every internet-connected machine)
35 | usually exploit a few wide-spread vulnerabilities
36 | simple worms: exploit some vulnerability in network-facing service
37 | easy strategy: try to spread to other machines at random
38 | e.g. guessing random IPs works (but inefficient)
39 | use user's machine as source of other victims
40 | for worms that spread via email, try user's email address book
41 | try other victims in the same network as the current machine
42 | try machines in user's ssh known_hosts file
43 | use other databases to find candidate victims
44 | google for "powered by phpBB"
45 | try to propagate to any servers that the user connects to
46 | hides communication patterns!
47 | more complex worms possible (from web server to browser and back)
48 | requires finding wide-spread bugs in multiple apps at once
49 | less common as a result?
50 | can we prevent or detect this?
51 | prevent: could try to isolate machines after you've detected it
52 | worm fingerprinting in the network (traffic patterns)
53 | monitor unused machines, email addresses, etc for suspicious traffic
54 | in theory shouldn't be getting anything legitimate
55 | what would show up if you monitored traffic to unused subnet?
56 | network mapping by researchers?
57 | random probes by worms poking at IP addresses
58 | "backscatter" from source-spoofing
59 | could use these to infer what's happening "out there"
60 | detect by planting honeypots
61 | if machine starts generating traffic, probably infected
62 |
63 | once some machine is infected, how does the botnet operate?
64 | bot master, command and control (C&C) server(s), bots talk to C&C servers
65 | bots receive commands from C&C servers
66 | some bots accept commands from the network (e.g. run an open proxy server)
67 | upload stolen data either to the same C&C servers or some other server
68 |
69 | how do bot masters try to avoid being taken down?
70 | change the C&C server's IP address ("fast flux")
71 | can move from one ISP to another after getting abuse complaints
72 | how to inform your bots that your IP address changed? DNS
73 | domain name is a single point of failure for bot master
74 | dynamic domain names ("domain flux")
75 |       how does this work? (sketch below)
76 | how do you take down access to a botnet using this?
77 | is there still a single point of failure here?
78 | currently many different domain registrars, little cooperation
79 | conficker generated many more dynamic domain names than torpig
80 | makes it impractical to register all of these names ahead of time
81 | peer-to-peer control networks (Storm botnet)
82 | harder for someone else to take down: no single server
83 | harder for botmaster to hide botnet internals: no protected central srvr
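
A toy sketch of the domain-flux idea (my own, not Torpig's or Conficker's actual algorithm): bot and botmaster derive the same candidate domains from the current date, the botmaster registers one ahead of time, and defenders must pre-register or block the whole list to cut off C&C.

    # Toy date-seeded domain-generation algorithm (illustrative; not a real DGA).
    import hashlib
    from datetime import date

    def candidate_domains(day=None, count=3, tld=".com"):
        day = day or date.today()
        seed = day.strftime("%Y-%m-%d")
        # twelve hex characters of a date-derived hash, plus a TLD
        return [hashlib.sha256(f"{seed}/{i}".encode()).hexdigest()[:12] + tld
                for i in range(count)]

    # bot and botmaster compute the same list; the botmaster registers one entry,
    # the bot tries each in turn until a C&C server answers
    print(candidate_domains())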
84 |
85 | how did torpig work?
86 | mebroot installs itself into the MBR, so gets to inject itself early on
87 | loads modules from mebroot C&C server
88 | mebroot C&C server responds with torpig DLL to inject into various apps
89 | torpig DLL collects any data that matches pre-defined patterns
90 | usernames and passwords; credit card numbers; ...
91 | torpig DLL contacts torpig's C&C server for info about what sites to target
92 | torpig's C&C server using domain flux: weekly and daily domains
93 | "injection server" responsible for stealing credentials for a specific site
94 | redirects visits to bank login page to fake login page
95 | in-browser DLL subverts any browser protections (SSL, lock icon)
96 | lots of "outsourcing" going on: mebroot, torpig, torpig build customers?
97 |
98 | all traffic encrypted
99 | but these bots implement their own crypto: bad plan, can get broken
100 | conficker used well-known crypto, and was thus much harder to break
101 |
102 | how did these guys take over the botnet?
103 | attackers did not register every torpig dynamic domain name ahead of time
104 | bots did not properly authenticate responses from C&C server
105 | (torpig "owners" eventually took back control through mebroot's C&C)
106 |
107 | how big is the torpig botnet?
108 | 1.2 million IPs
109 | each bot has a "nid" that reflects its hardware config (disk serial number)
110 | ~180k unique nid's
111 | ~182k unique (nid+os+...)'s
112 | 40 VMs (nid's match a standard configuration of vmware or qemu)
113 | lots of IP reuse
114 | aggregate bandwidth is likely over 17 Gbps
115 |
116 | how effective is torpig?
117 | authors collected all data during the 10 days they had control of torpig
118 | collected lots of account information: millions of passwords
119 | many users reuse passwords across sites
120 | 8310 accounts at financial institutions
121 | 1660 credit/debit card numbers
122 | 30 came from a single compromised at-home call center node
123 | pattern-matching works well: don't have to know app ahead of time
124 | kept producing a steady stream of new financial data throughout the 10 days
125 | what's going on?
126 | probably users don't enter their CC#, bank password every day
127 |
128 | how effective is spam?
129 | separate paper looked at the economics of sending spam
130 | about 0.005% users visit URLs in spam messages (1 out of 20,000)
131 | less than 10% of those users "bought" whatever the site was selling
132 | so send ~200,000 spam messages for one real customer
133 | unclear if it's cost-effective (esp. if bots are nearly-free)
134 |
135 | how to defend against bots?
136 | are TPMs of any help?
137 | maybe a way to keep your credentials safe (and avoid simple passwords)
138 | resource abuse: annoying because it gets your machine blacklisted
139 | VMM-level scheme to track user activity?
140 | make their operation not cost-effective
141 | need to get a good idea of what's most profitable for botmasters
142 |
143 | did these guys make it more difficult to mount similar attacks in the future?
144 | probably torpig will get fixed
145 | other papers written about takeovers on different bot nets
146 | other bots employ much stronger security measures to prevent takeover
147 |
148 |
--------------------------------------------------------------------------------
/previous-years/l23-voting.txt:
--------------------------------------------------------------------------------
1 | Electronic voting
2 | =================
3 |
4 | final projects reminder
5 | 10-minute presentations about your projects on Wednesday
6 | we will have a projector that you can use
7 | code and write-up describing your project due on Friday
8 | will only start grading on monday morning, if you need extension..
9 |
10 | quiz solutions posted on course web site
11 | HKN course eval link posted on course web site
12 |
13 | ---
14 |
15 | what are the security goals in elections?
16 | availability: voters can vote
17 | integrity: votes cannot be changed; results reflect all votes
18 | registration: voters should vote at most once
19 | privacy: voters should not be able to prove how they voted (eg to sell vote)
20 |
21 | what's the threat model?
22 | lots of potential attackers
23 | officials, vendors, candidates themselves, activists, governments, ..
24 | may be interested in obtaining a particular outcome
25 | voters may want to sell votes
26 | real world: anything is fair game
27 | intimidation
28 | impersonation (incl. dead people)
29 | denial of service
30 | ballot box stuffing, miscounting, ..
31 | electronic voting machine attacks
32 | buffer overflows
33 | logic bugs
34 | insider attacks
35 | physical attacks
36 | crashing / corrupting
37 | ..
38 | ideal designs focus on making the attack cost high
39 | auditing with penalties if detected
40 |
41 | what are the alternatives?
42 | vote in public: raise hands, ..
43 | written paper ballots
44 | optical-scan paper ballots
45 | punched paper ballots
46 | DRE (what this paper is about): direct-recording electronic machine
47 | absentee voting by mail
48 | vote-selling potential
49 | internet voting
50 | greater voter turnout
51 | vote-selling potential
52 | more practical problem: worms/viruses voting?
53 |
54 | why DRE?
55 | partly a response to voting problems in florida in the 2000 election
56 | hoped: easier-to-use UI, faster results, more accurate counting, ..
57 | interesting set of constraints from a research point of view
58 | high integrity, ideally verifiable
59 | most of the process should be transparent and auditable
60 | cannot expose individual voter's choices
61 | cannot allow individual voters to prove their vote
62 |
63 | how does the machine work?
64 | 133MHz CPU
65 | 32MB RAM
66 |
67 | on-board flash memory
68 | EPROM socket
69 | "ext flash" socket
70 |     boot selector switches determine which of the above 3 devices is used to boot
71 |
72 | internal speaker
73 |
74 | external devices:
75 | touch-sensitive LCD panel, keypad, headphones
76 | printer -- why?
77 | smart card reader/writer -- why?
78 | irda transmitter/receiver -- why?
79 |
80 | power switch, keyboard port, PC-card slots (behind locked metal door)
81 |
82 | what does the boot sequence look like?
83 | bootloader runs from selected source
84 | internal flash contains a gzip'ed OS image that gets loaded into RAM
85 | includes image for root file system
86 | internal flash contains file system that stores votes, among other things
87 |
88 | what's on the memory card?
89 | machine state configured via election.brs file
90 | votes stored on memory card (and in the built-in flash) in election.brs
91 | data encrypted using a fixed DES key (hard-coded in the software)
92 |
93 | machine states: pre-download, pre-election testing, election, post-election
94 | what's the point of L&A testing?
95 | want to distinguish test votes from real votes
96 | want to make it difficult to erase existing votes
97 | also tips off the software that it's being tested!
98 |
99 | why smartcards?
100 | contain a secure token from sign-in desk to the voting machine
101 | ideal property: cannot fake a token, cannot duplicate token
102 |     how to implement? preventing faking is easy; preventing duplication is harder
103 |       can give each token a unique ID, store used tokens on machine (sketch below)
104 | potentially vulnerable to multiple votes on different machines
105 | can have smartcard destroy the token after use, no read/write API
106 | in practice, turned out the machine was not using any smartcard crypto
107 | attacker can easily manufacture fake smartcards and vote many times
108 | (attacker can also manufacture an "admin" smartcard and manage the machine)
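
A small sketch of the token scheme mentioned above: sign each token so it cannot be faked, and remember used IDs so it cannot be replayed. This is hypothetical; per the notes, the real machines authenticated nothing.

    # Illustrative voter-token scheme: HMAC-signed token IDs plus a used-token set.
    import hmac, hashlib, os

    MACHINE_KEY = os.urandom(32)          # shared between sign-in desk and machine
    used_tokens = set()

    def issue_token():
        token_id = os.urandom(16)
        tag = hmac.new(MACHINE_KEY, token_id, hashlib.sha256).digest()
        return token_id, tag              # written to the smartcard at sign-in

    def accept_token(token_id, tag):
        expected = hmac.new(MACHINE_KEY, token_id, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False                  # forged card
        if token_id in used_tokens:
            return False                  # duplicated / replayed card
        used_tokens.add(token_id)         # still replayable on *other* machines
        return True

    tid, tag = issue_token()
    print(accept_token(tid, tag), accept_token(tid, tag))   # True False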
109 |
110 | what's the point of printing out receipt tapes post-election?
111 | in theory can do a recount based on these tapes; compare with check-in data
112 | assumes the attack is mounted after the election happens
113 | corrupt or lost memory cards, compromised tabulation, ..
114 |
115 | what attacks did the authors explore?
116 | exploiting physical access to inject malicious code
117 | vote stealing
118 | denial of service
119 | viruses/worms
120 |
121 | specific bugs
122 | unauthenticated smartcards
123 | unauthenticated firmware updates (fboot.nb0)
124 | unauthenticated OS updates (nk.bin)
125 | unauthenticated debug mode flag (explorer.glb)
126 | unauthenticated wipe command (EraseFFX.bsq)
127 | unauthenticated code injection (.ins files with buffer overflows)
128 | poor physical security (cheap lock)
129 | easy to change boot source
130 | easy to change components like EPROM
131 | insufficient audit logs (no integrity; election.adt just has "Ballot cast")
132 | sound when machine reboots, but can be prevented with headphones
133 |
134 | what to do when an audit shows an error?
135 | with this machine: denial of service attack, effectively
136 | ideally would be able to reconstruct what happened or recount manually?
137 |
138 | how to scrub a machine after a potential compromise?
139 | can't trust anything: all memory/code easily changed by attacker
140 | need to install a known-good EPROM, use that to overwrite bootloader, OS
141 | can take a long time, esp. if problem spread to many machines
142 |
143 | how to prevent these attacks?
144 | TPM / secure boot?
145 | would signed files be enough?
146 | attacker can get a hold of signed "debug mode" file and he's done?
147 | signed software updates might not be the latest version
148 | attacker installs old version, exploits bug
149 | might want to prevent rollbacks (but may want to allow, too?)
150 | read-only memory for software?
151 | physical switches to allow updates
152 | could make it more difficult to write a fast-spreading virus/worm
153 | physical access control
154 | probably a good idea, to some extent
155 | auditing physical access leads to easy DoS attacks
156 | need a strong audit mechanism to prevent DoS (i.e., can recount)
157 | append-only memory for auditing?
158 | disable the "flash" (rewrite) circuitry from flash memory?
159 | or just have a dedicated "audit" controller
160 | system already has a separate battery-management PIC
161 | OS-level protection?
162 | language security?
163 | operating / setup procedures?
164 | who has access to the machine, chain of custody, ...
165 | parallel testing?
166 |
167 | what is software-independence?
168 | malicious software alone cannot change election results (undetectably)
169 | e.g. software helps print out ballot, voter makes sure ballot is OK
170 | or prints out a paper tape with all votes, which is counted by hand
171 |
172 | usability for voters?
173 | paper doesn't describe the UI, unfortunately..
174 | "machine ate my vote"?
175 | could invalidate smartcard and crash?
176 |
177 | usability for officials?
178 | potentially same problems as PGP
179 | do officials have the right mental model to worry about potential attacks?
180 |
181 | end-to-end integrity
182 | voting integrity has 3 parts:
183 | cast-as-intended
184 | collected-as-cast
185 | counted-as-collected
186 | above techniques only help ensure cast-as-intended
187 | need more end-to-end security to ensure other 2 properties
188 | Twin scheme by Rivest and Smith
189 |
190 |
--------------------------------------------------------------------------------
/previous-years/l21-dropbox.txt:
--------------------------------------------------------------------------------
1 | Looking inside the (drop)box
2 | ============================
3 |
4 | why are we reading this paper?
5 | code obfuscation is a common goal in the real world
6 | skype, dropbox
7 | gmail
8 | malware
9 | closed versus open design
10 | contrast bitlocker and dropbox client
11 |
12 | this paper has several aspects
13 | code obfuscation weaknesses
14 | focus of this lecture
15 | user authentication weaknesses
16 | not our focus, technically less interesting
17 | automatic login without user credentials fixed (i think)
18 | aside: etiquette with finding security flaws
19 | report before you publish
20 |
21 | what is Dropbox's goal for obfuscation?
22 | don't know, but ...
23 | no open-source client
24 |    dropbox can, e.g., change the wire protocol at will
25 |    make it difficult for competitors to develop a client
26 | portable fs client is tricky
27 |
28 | what is the threat model?
29 | adversary has access to obfuscated code and can run it
30 |  adversary reverse-engineers the client to defeat the above goals
31 | sidenote: malware may have additional threats to protect against
32 |     e.g., make it difficult to fingerprint so that anti-virus applications cannot remove the malware
33 |
34 | challenging threat, because:
35 | code must run correctly on adversary's processor
36 | code may have to make systems calls
37 | code may have to be linked dynamically with host libraries
38 | adversary can observe processor and systems calls
39 |
40 | general approach: code obfuscation
41 | Given a program P, produce O(P)
42 |  O(P) has the same functionality as P, but behaves as a black box
43 | there is nothing substantial one can learn from O(P)
44 | O(P) isn't much slower than P
45 |
46 | minimum requirement: adversary cannot reconstruct P
47 |  ignore programs that are trivially learnable from executing w. different inputs
48 | easy to avoid complete failure
49 | execute only if an input matches some SHA hash
50 | hash is embedded in program, but difficult to compute inverse
51 | difficult to succeed completely
52 | program prints itself
53 | in general: impossible (see references)
54 | there is a family of interesting programs for which O(P) will fail [see references]
55 | but, perhaps you could do well on a particular program
56 |  difficult to state precise requirements for an obfuscator
57 | should be skeptical that it can work in practice against skilled adversary
58 |
59 | code obfuscation in practice
60 |   write C programs for which it is difficult to tell what they do
61 | down-side: hard on developer
62 |     but makes for great contests (e.g., the International Obfuscated C Code Contest)
63 | use an obfuscator
64 |     Takes a program as input and produces intermediate code
65 | You don't want to ship the original source code
66 |     Ship the program in intermediate form, along with an interpreter, to the computer
67 | You don't want to ship the actual assembly
68 | Can cook up your own intermediate language that nobody knows
69 | Computer runs interpreter, which interprets intermediate code
70 | Interpreter reads input and outputs values
71 |     The interpreter can try to hide what it is actually computing
72 | Fake instructions, fake control
73 |       Use inputs as an index into a finite state machine and spit out values
74 | Etc.
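
A toy of the ship-intermediate-code-plus-interpreter idea above (purely illustrative; Dropbox's scheme reuses CPython with remapped opcodes and encrypted .pyc files rather than anything this simple):

    # Toy stack-machine interpreter with made-up opcode numbering.  The point is
    # only that someone reading the shipped bytes sees opaque numbers, not source.
    PROGRAM = bytes([0x91, 6, 0x91, 7, 0x5c, 0xe0])   # push 6, push 7, mul, print

    def run(code):
        stack, pc = [], 0
        while pc < len(code):
            op = code[pc]; pc += 1
            if op == 0x91:                 # PUSH <byte>
                stack.append(code[pc]); pc += 1
            elif op == 0x5c:               # MUL
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == 0xe0:               # PRINT
                print(stack.pop())
            else:
                raise ValueError(f"bad opcode {op:#x}")

    run(PROGRAM)                           # prints 42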
75 |
76 | dropbox's approach
77 | all code is written in python
78 | compiles programs to bytecode
79 | interpreter executes bytecode
80 | dropbox application
81 | contains encrypted python byte code
82 | encryption method is changed often
83 | byte code opcodes are different than Python
84 | contains a special interpreter
85 | application is built/packaged in non-standard way
86 | special "linker"
87 |
88 | dynamic linking
89 | what are the .so files in the downloaded dropbox directory?
90 | dynamically-linkable libraries
91 | modern applications are not a single file
92 |   when the application runs, unresolved references are resolved at runtime
93 | e.g., application makes a system call
94 | dynamic linker links the application with the library with system call stubs
95 |   adv: library is in memory only once
96 | with static linking: library would be N times in memory
97 | once with each application
98 | LD_PRELOAD: insert your own library in front of others
99 | dropbox ships its app with several libraries that are dynamically linked
100 | but interpreter and SSL are statically linked
101 |
102 | goal of paper: *automatically* break obfuscation (de-drop)
103 | another goal: break user authentication
104 | demo:
105 | look at dropbox binary
106 | ls: no pyc files
107 | gdb binary
108 | nm binary
109 | objdump -S binary
110 |
111 |   run dropboxd with LD_PRELOAD
112 | extracts pyc_decrypted
113 | cd pyc_decrypted/client_api
114 | python
115 | import hashing
116 | dir (hashing)
117 | run uncompyle2 hashing.pyc
118 |
119 | Paper: how to de-crypt pyc files?
120 | study modified python interpreter
121 |     diffed Python27.dll from dropbox with standard
122 | r_object is patched
123 | decrypt decrypts bytecode
124 | how to extract encrypted bytecode?
125 | inject code into dropbox binary using LD_PRELOAD
126 | injected code overwrites strlen
127 | when strlen is called by dropbox, injected code runs
128 | inject Python code using PyRun_SimpleString
129 | not patched
130 | can run arbitrary python code in dropbox context
131 | GIL must be acquired by injected code
132 | call PyMarshal_ReadLastObjectFromFile()
133 | reads encrypted pyc into memory
134 | but, co_code is not exposed to Python!
135 | linear memory search to find co_code
136 | serialize it back to a file
137 |       but, marshal.dumps is a NOP
138 | inject PyPy's _marshal.py
139 | written in python!
140 |
141 | How to remap opcodes?
142 |     manually reconstruct the opcode mapping
143 |       time-intensive, but the opcode mapping hasn't changed since 1.6.0
144 | frequency analysis for common modules
145 | decrypted dropbox bytecode
146 | standard bytecode
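
A rough sketch of the frequency-analysis idea (a simplification, not dedrop's code): compile a module you also have in remapped form, rank opcode byte frequencies in both, and pair equally-ranked bytes as a first guess at the mapping.

    # Toy opcode frequency analysis: pair up equally-ranked byte values from a
    # standard compile and from the remapped bytecode of the same module.
    from collections import Counter

    def ranked(code_bytes):
        return [b for b, _ in Counter(code_bytes).most_common()]

    def guess_mapping(remapped_co_code, standard_co_code):
        # ties and argument bytes make this only a starting point, fixed up by hand
        return dict(zip(ranked(remapped_co_code), ranked(standard_co_code)))

    # usage sketch:
    #   standard = compile(open("hashing.py").read(), "hashing.py", "exec").co_code
    #   mapping  = guess_mapping(dropbox_hashing_co_code, bytes(standard))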
147 |
148 | How to get user credentials?
149 |   host_id is used for authentication
150 | established during registration
151 | not affected by changing password!
152 | stored in encrypted sql database
153 | components of decryption key are stored on device
154 | linux: custom obfuscator
155 | except host_int comes from server
156 | Can also be extracted from dropbox client logs
157 |       enable logging based on MD5 checksum of "DBDEV"
158 | md5("a2y6shaya") = "c3da6009e4"
159 | patched now.
160 | Snooping on objects, looking for host_id and host_int
161 | Login to web site for logintray is based only on host_id and host_int
162 | Dropbox uses now "better" logintray ...
163 | Dropbox should probably use SRP (or something else good)
164 |
165 | How to learn what dropbox internal APIs are?
166 | Patch all SSL objects, every second
167 | "monkey patch" == dynamic modifications of a class at runtime without
168 |        modifying the original source code (sketch below)
169 |     maybe derived from guerrilla (as in a sneaky attack) patch?
170 | No two-factor authentication for access to drop-box account
171 | One use: open-source client
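
A minimal monkey-patching sketch (mine, not the paper's injected code): wrap a method on a live class so every call can be logged, without touching its source. The class and method names in the usage comment are placeholders.

    # Monkey patching: replace a method on a live class at runtime, no source changes.
    import functools

    def log_calls(cls, name):
        original = getattr(cls, name)
        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            print(f"{cls.__name__}.{name} args={args!r}")   # or dump to a file
            return original(self, *args, **kwargs)
        setattr(cls, name, wrapper)

    # usage sketch against whatever wrapper class the client uses for its SSL
    # connections (names are placeholders):
    #   log_calls(SSLConnection, "send")
    #   log_calls(SSLConnection, "recv")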
172 |
173 | Is the dropbox obfuscation the best you can do?
174 | No.
175 | How could you do better?
176 | Hide instructions much better
177 | Obscure control flow
178 | But, is it worth it?
179 |
180 | Closed versus open design
181 | Downside of closed designs
182 |     easy to miss assumptions because the right eyes don't look at it
183 | Downside of open design
184 |     your competitor has access to it too
185 | Ideal case: minimal secret, make most of design open
186 | maybe not always possible to make the secret small?
187 |
188 | References
189 | http://www.math.ias.edu/~boaz/Papers/obfuscate.ps
190 | http://www.math.ias.edu/~boaz/Papers/obf_informal.html
191 | https://github.com/kholia/dedrop
192 | uncompyle2 https://github.com/wibiti/uncompyle2
193 |
--------------------------------------------------------------------------------
/previous-years/l18-dealloc.txt:
--------------------------------------------------------------------------------
1 | Secure deallocation
2 | ===================
3 |
4 | Aside: some recent reverse-engineering of Stuxnet by Symantec.
5 | http://www.symantec.com/connect/blogs/stuxnet-breakthrough
6 | Stuxnet targets specific frequency converters.
7 | Manufactured by companies headquartered in either Finland or Tehran.
8 | Used to drive motors at high speeds.
9 | Stuxnet watches for a specific frequency band.
10 | When detected, changes frequencies to low or high for short periods.
11 |
12 | Problem: disclosure of sensitive data.
13 | 1. Many kinds of sensitive data in applications.
14 | 2. Copies of sensitive data exist for a long time in running system.
15 | 3. Many ways for data to be disclosed (often unintentionally).
16 |
17 | What kinds of sensitive data are these authors concerned about?
18 | Passwords, crypto keys, etc.
19 | Small amounts of data that can be devastating if disclosed.
20 | Bulk data, such as files in a file system.
21 | Sensitive, but not as acute.
22 | Hard to reduce data lifetime (the only knob this paper is using).
23 | Small leaks might not be a disaster (unlike with a private key).
24 |
25 | Where could copies of sensitive data exist in a running system?
26 | Example applications: typing password into Firefox; Zoobar web server.
27 | Process memory: heap, stack.
28 | IO buffers, X event queues, string processing libraries.
29 | Language runtime makes copies (immutable strings, Lisp objects, ..)
30 | Thread registers.
31 | Files, backups of files, ...
32 | Swapped memory, hibernate for laptops.
33 | Kernel memory.
34 | IO buffers: keyboard, mouse inputs.
35 | Kernel stack, freed pages, saved thread registers.
36 | Network packet buffers.
37 | Pipe buffers contain data sent between processes.
38 | Random number generator inputs.
39 |
40 | How does data get disclosed?
41 | Any vulnerability that allows code execution.
42 | Logging / debugging statements.
43 | Core dumps.
44 | DRAM cold-boot attacks.
45 | Stolen disks, or just disposing of old disks.
46 | Revealing uninitialized memory.
47 | Applications with memory management bugs.
48 | Linux kernel didn't zero net buffers, sent "garbage" data in packets.
49 | Same with directories, "garbage" data was written to disk upon mkdir.
50 | MS Word (used to?) contain "garbage" in saved files, such as old text.
51 |
52 | How serious is it?
53 | What data copies might persist for a long time?
54 | Process memory: Looks like yes.
55 | How do they figure this out?
56 | Use valgrind -- could do something similar in DynamoRIO.
57 | Track all memory allocs, reads, writes, frees.
58 | Process registers: Maybe floating-point? Still, probably not that bad.
59 | Files, backups: lives on disk, long-term.
60 | Swap: lives on disk, possibly long-term, expensive to erase.
61 | Kernel memory.
62 | Experiments in paper show live data after many weeks (Sec 3.2).
63 | How do they figure this out?
64 | Place many random 20-byte "stamps" in memory.
65 |             Periodically read all phys. memory in kernel, look for stamps (sketch below).
66 | How can data continue to persist for so long?
67 | Memory should be getting reused?
68 | To some extent, depends on the workload.
69 | Even with an expensive workload, may not eliminate all stamps.
70 | Holes in long-lived kernel data structures, slab allocators.
71 | Persistence across reboots, even.
72 | Are there really that many data disclosure bugs?
73 | Some examples of past bugs.
74 | Worse yet: data disclosure bugs not treated with much urgency?
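
A toy version of the stamp methodology (mine, not the paper's tooling): plant random 20-byte stamps, let the system run, then scan a raw memory image for stamps that survived.

    # Stamp experiment sketch: random 20-byte markers are easy to search for later
    # and essentially never occur by accident.
    import os

    def make_stamps(n):
        return [os.urandom(20) for _ in range(n)]

    def surviving_stamps(memory_image_path, stamps):
        image = open(memory_image_path, "rb").read()   # e.g. a physical-memory dump
        return [s for s in stamps if s in image]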
75 |
76 | Paper's goal:
77 | Try to minimize the amount of time that sensitive data exists.
78 | Not focusing on fixing data disclosure mechanisms (hard to generalize).
79 |
80 | How do we reduce/avoid data copies?
81 | Process memory: need application's help. Mostly what this paper is about.
82 | Process registers: not really needed.
83 | Swap: mlock(), mlockall() on Unix. Encrypted swap.
84 | File system: Bitlocker. Vanish, if the application is involved.
85 | Kernel memory: need to modify the kernel. Partly discussed in paper.
86 |
87 | Paper's model for thinking about data lifetime in memory.
88 | Interesting operations: allocation, write, read, free.
89 | Conceptually applies to any memory.
90 | malloc(), stack allocation on function call, global variables, ..
91 | Ideal lifetime for data: from first write to last read (before write/free).
92 | Can't do any better: data must stay around.
93 | Natural lifetime: from first write to next write
94 | (potentially after free and re-alloc).
95 | Natural lifetime is what most systems do today.
96 | Data lives until overwritten by something else re-using that memory.
97 |
98 | Why is natural lifetime too long?
99 | Bursty memory allocation: memory freed, never allocated again.
100 | "Holes": not every byte of an allocation might be written to.
101 | Holes in the stack.
102 | Unused members in structs.
103 | Padding in structs.
104 | Variable-length data (e.g., packets or path names).
105 |
106 | How can we do better than natural lifetime?
107 | "Secure deallocation": erase data from memory when region is freed.
108 | Safe: programs should not rely on data living past free.
109 | How close to ideal is this?
110 | Depends on program, experiments show usually good (except for GUIs).
111 | Can we do better?
112 | Might be able to figure out last read through program analysis.
113 | Seems tricky to do in a general-purpose way.
114 | Programmers can manually annotate, or manually clear data.
115 |
116 | Secure deallocation in a process.
117 | Heap: zero out the memory in free().
118 | What about memory leaks? Rely on OS to clean up on process exit.
119 | Private allocators? Modify, or rely on reuse or returning memory to OS.
120 | Stack: two plans.
121 | 1. Augment the compiler to zero out stack frames on function return.
122 | 2. Periodically zero out memory below stack pointer, from the OS.
123 | Advantages / disadvantages:
124 | 1 is precise, but maybe expensive (CPU time, memory bandwidth).
125 | 2 is cheaper, but may not clear right away, or delete everything.
126 | 1 requires re-compiling code; 2 works with unmodified binaries.
127 | Static data in process memory: rely on OS to clean up on exit.
128 |
129 | Secure deallocation in the kernel.
130 | Can we apply the same plan as in the applications? Why or why not?
131 | Vague argument about kernel being performance-sensitive.
132 | Not clear exactly why this is (applications are also perf-sensitive?).
133 | What kinds of data do we want to clear in the kernel?
134 | Data that applications are processing: IO buffers, anon process memory.
135 | Not internal kernel data (e.g., pointers).
136 | Not application data that lives on disk (files, directories).
137 | Page allocation: track pages that contain sensitive data ("polluted").
138 | Three lists of free pages:
139 | - Zeroed pages.
140 | - Polluted non-zero pages.
141 |        - Unpolluted non-zero pages.
142 | How is the polluted bit updated?
143 | Manually set in kernel code when page is used for process memory.
144 | Cleared when polluted free page is zeroed or overwritten.
145 | Smaller kernel objects: caller of kfree() must say if object is polluted.
146 | Objects presumably include network buffers, pipes, user IO, ..
147 | Memory allocator then erases data just like free() in user-space.
148 | Circular queues: semi-static allocation / specialized allocator.
149 | E.g., terminal buffers, PRNG inputs.
150 | Erase data when elements removed from queue.
151 |
152 | More efficient clearing of kernel memory.
153 | No numbers to explain why optimizations are needed, or which ones matter..
154 | Page zeroing: return different pages depending on callers to alloc.
155 | Insight: zeroed pages are "expensive", polluted pages are "cheap".
156 | 1. Can return polluted page if caller will overwrite entire page.
157 | E.g., new page to be used to read an entire page from disk.
158 | 2. Avoid returning zeroed pages if caller doesn't care about contents.
159 | If not enough memory, return zeroed page, or zero a polluted page.
160 | Cannot simply return polluted page: sensitive data may persist.
161 | Batch page zeroing: why?
162 | Allows the optimization of caller overwriting page to take place.
163 | May improve interactive performance, by deferring the cost of zeroing.
164 | Specialized zeroing strategies.
165 | Variable-length buffers: packets (implemented), path names (not).
166 | Clear out just the used part (e.g., 64 byte pkt in 1500-byte buffer).
167 |
168 | Side-effects of secure deallocation.
169 | Might make some bugs more predictable, or make bugs go away.
170 | Periodic stack clearing may make uninitialized stack bugs less predictable.
171 |
172 | Performance impact?
173 | Seems to be low, but a bit hard to tell what's going on in the kernel.
174 |
175 | What happens in a higher-level language (PHP, Javascript, ..)?
176 | May need to modify language runtime to erase stack.
177 | If runtime uses own allocator (typical), need to modify that as well.
178 | Otherwise, free() may be sufficient.
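
One concrete Python illustration of the runtime problem above (my example, not the paper's): an immutable str holding a secret cannot be erased in place, and the runtime may copy it anyway; a mutable bytearray is at least one buffer the program can zero when done.

    # Immutable str objects cannot be erased in place (and may be interned/copied);
    # a bytearray is one buffer the program can at least zero when it is done.
    def use_secret(buf):                    # stand-in for whatever consumes the secret
        return len(buf) > 0

    password = bytearray(b"hunter2")
    try:
        use_secret(password)
    finally:
        for i in range(len(password)):      # best-effort erase of this one copy;
            password[i] = 0                 # copies made by the runtime still linger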
179 |
180 | How does garbage collection interact with secure deallocation?
181 | Reference-counting GC can free, erase objects fast in most cases.
182 | Periodic garbage collection may unnecessarily prolong data lifetime.
183 |
184 |
--------------------------------------------------------------------------------
/previous-years/l19-backtracker.txt:
--------------------------------------------------------------------------------
1 | Backtracking intrusions
2 | =======================
3 |
4 | Overall problem: intrusions are a fact of life.
5 | Will this ever change?
6 | Buggy code, weak passwords, wrong policies / permissions..
7 |
8 | What should an administrator do when the system is compromised?
9 | Detect the intrusion ("detection point").
10 | Result of this stage is a file, network conn, file name, or process.
11 | Find how the attacker got access ("entry point").
12 | This is what Backtracker helps with.
13 | Fix the problem that allowed the compromise
14 | (e.g., weak password, buggy program).
15 | Identify and revert any damage caused by intrusion
16 | (e.g., modified files, trojaned binaries, their side-effects, etc).
17 |
18 | How would an administrator detect the intrusion?
19 | Modified, missing, or unexpected file; unexpected or missing process.
20 | Could be manual (found extra process or corrupted file).
21 | Tripwire could point out unexpected changes to system files.
22 | Network traffic analysis could point out unexpected / suspicious packets.
23 | False positives are often a problem with intrusion detection.
24 |
25 | What good is finding the attacker's entry point?
26 | Curious administrator.
27 | In some cases, might be able to fix the problem that allowed compromise.
28 | User with a weak / compromised password.
29 | Bad permissions or missing firewall rules.
30 | Maybe remove or disable buggy program or service.
31 | Backtracker itself will not produce fix for buggy code.
32 | Can we tell what vulnerability the attacker exploited?
33 | Not necessarily: all we know is object name (process, socket, etc).
34 | Might not have binary for process, or data for packets.
35 | Probably a good first step if we want to figure out the extent of damage.
36 | Initial intrusion detection might only find a subset of changes.
37 | Might be able to track forward in the graph to find affected files.
38 |
39 | Do we need Backtracker to find out how the attacker gained access?
40 | Can look at disk state: files, system logs, network traffic logs, ..
41 | Files might not contain enough history to figure out what happened.
42 | System logs (e.g., Apache's log) might only contain network actions.
43 | System logs can be deleted, unless otherwise protected.
44 | Of course, this is also a problem for Backtracker.
45 | Network traffic logs may contain encrypted packets (SSL, SSH).
46 | If we have forward-secrecy, cannot decrypt packets after the fact.
47 |
48 | Backtracker objects
49 | Processes, files (including pipes and sockets), file names.
50 | How does Backtracker name objects?
51 | File name: pathname string.
52 | Canonical: no ".." or "." components.
53 | Unclear what happens to symlinks.
54 | File: device, inode, version#.
55 | Why track files and file names separately?
56 | Where does the version# come from?
57 | Why track pipes as an object, and not as dependency event?
58 | Process: pid, version#.
59 | Where does the version# come from?
60 | How long does Backtracker have to track the version# for?
61 |
62 | Backtracker events
63 | Process -> process: fork, exec, signals, debug.
64 | Process -> file: write, chmod, chown, utime, mmap'ed files, ..
65 | Process -> filename: create, unlink, rename, ..
66 | File -> process: read, exec, stat, open.
67 | Filename -> process: open, readdir, anything that takes a pathname.
68 | File -> filename, filename -> file: none.
69 | How does Backtracker name events?
70 | Not named explicitly.
71 | Event is a tuple (source-obj, sink-obj, time-start, time-end).
72 | What happens to memory-mapped files?
73 | Cannot intercept every memory read or write operation.
74 | Event for mmap starts at mmap time, ends at exit or exec.
75 | Implemented: process fork/exec, file read/write/mmap, network recv.
76 | In particular, none of the filename stuff.
77 |
78 | How does Backtracker avoid changing the system to record its log?
79 | Runs in a virtual machine monitor, intercept system calls.
80 | Extracts state from guest virtual machine:
81 | Event (look at system call registers).
82 | Currently running process (look at kernel memory for current PID).
83 | Object being accessed (look at syscall args, FD state, inode state).
84 | Logger has access to guest kernel's symbols for this purpose.
85 | How to track version# for inodes or pids?
86 | Might be able to use NFS generation numbers for inodes.
87 | Need to keep a shadow data structure for PIDs.
88 | Bump generation number when a PID is reused (exit, fork, clone).
89 |
90 | What do we have to trust?
91 | Virtual machine monitor trusted to keep the log safe.
92 | Kernel trusted to keep different objects isolated except for syscalls.
93 | What happens if kernel is compromised?
94 | Adversary gets to run arbitrary code in kernel.
95 | Might not know about some dependencies between objects.
96 | Can we detect kernel compromises?
97 | If accessed via certain routes (/dev/kmem, kernel module), then yes.
98 | More generally, kernel could have buffer overflow: hard to detect.
99 |
100 | Given the log, how does Backtracker find the entry point?
101 | Present the resulting dependency graph to the administrator.
102 | Ask administrator to find the entry point.
103 |
104 | Optimizations to make the graph manageable.
105 | Distinction: affecting vs. controlling an object.
106 | Many ways to affect execution (timing channels, etc).
107 | Adversary interested in controlling (causing specific code to execute).
108 | High-control vs. low-control events.
109 | Prototype does not track file names, file metadata, etc.
110 | Trim any events, objects that do not lead to detection point.
111 | Use event times to trim events that happened too late for detection point.
112 | Hide read-only files.
113 | Seems like an instance of a more general principle.
114 | Let's assume adversary came from the network.
115 | Then, can filter out any objects with no (transitive) socket deps (see the sketch after this list).
116 | Hide nodes that do not provide any additional sources.
117 | Ultimate goal of graph: help administrator track down entry point.
118 | Some nodes add no new sources to the graph.
119 | More general than read-only files (above):
120 | Can have socket sources, as long as they're not new socket sources.
121 | E.g., shell spawning a helper process.
122 | Could probably extend to temporary files created by shell.
123 | Use several detection points.
124 | Sounds promising, but not really evaluated.
125 | Potentially unsound heuristics:
126 | Filter out low-control events.
127 | Filter out well-known objects that cause false positives.
128 | E.g., /var/log/utmp, /etc/mtab, ..
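
A minimal sketch of the socket-dependency filter mentioned above: keep only
objects that have a (transitive) socket dependency, computed as forward
reachability from socket nodes. Node names and the edge list are made up.

    def nodes_with_socket_source(edges):
        """Keep nodes that (transitively) depend on some socket.

        edges: iterable of (source, sink) dependency pairs.
        """
        succ = {}
        for src, dst in edges:
            succ.setdefault(src, set()).add(dst)
        keep, work = set(), [n for n in succ if n.startswith("socket:")]
        while work:
            n = work.pop()
            if n in keep:
                continue
            keep.add(n)
            work.extend(succ.get(n, ()))
        return keep

    edges = [("socket:80", "proc:httpd"), ("proc:httpd", "proc:sh"),
             ("file:/etc/motd", "proc:login")]    # login has no socket dependency
    print(nodes_with_socket_source(edges))        # {'socket:80', 'proc:httpd', 'proc:sh'}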
129 |
130 | How can an adversary elude Backtracker?
131 | Avoid detection.
132 | Use low-control events.
133 | Use events not monitored by Backtracker (e.g., ptrace).
134 | Log in over the network a second time.
135 | If using a newly-created account or back door, will probably be found.
136 | If using a password stolen via first compromise, might not be found.
137 | Compromise OS kernel.
138 | Compromise the event logger (in VM monitor).
139 | Intertwine attack actions with other normal events.
140 | Exploit heuristics: write attack code to /var/log/utmp and exec it.
141 | Read many files that were recently modified by others.
142 | Other recent modifications become candidate entry points for admin.
143 | Prolong intrusion.
144 | Backtracker stores a fixed amount of log data (paper suggests months).
145 | Even before that, there may be changes that cause many dependencies.
146 | Legitimate software upgrades.
147 | Legitimate users being added to /etc/passwd.
148 | Much more difficult to track down intrusions across such changes.
149 |
150 | Can we fix file name handling?
151 | What to do with symbolic links?
152 | Is it sufficient to track file names?
153 | Renaming top-level directory loses deps for individual file names.
154 | More accurate model: file names in each directory; dir named by inode.
155 | Presumably not addressed in the paper because they don't implement it.
156 |
157 | How useful is Backtracker?
158 | Easy to use?
159 | Administrator needs to know a fair amount about system, Backtracker.
160 | After filtering, graphs look reasonably small.
161 | Reliable / secure?
162 | Probably works fine for current attacks.
163 | Determined attacker can likely bypass.
164 | Practical?
165 | Overheads probably low enough.
166 | Depends on VM monitor knowing specific OS version, symbols, ..
167 | Not clear what to do with kernel compromises.
168 | Probably still OK for current attacks / malware.
169 | Would a Backtracker-like system help with Stuxnet?
170 | Need to track back across a ~year of logs.
171 | Need to track back across many machines, USB devices, ..
172 | Within a single server, may be able to find source (USB drive or net).
173 | Stuxnet did compromise the kernel, so hard to rely on log.
174 |
175 | Do we really need a VM?
176 | Authors used VM to do deterministic replay of attacks.
177 | Didn't know exactly what to log yet, so tried different logging techniques.
178 | In the end, mostly need an append-only log.
179 | Once kernel compromised, no reliable events anyway.
180 | Can send log entries over the network.
181 | Can provide an append-only log storage service in VM (simpler).
182 |
183 |
--------------------------------------------------------------------------------
/previous-years/l20-traceback.txt:
--------------------------------------------------------------------------------
1 | Denial of service attacks
2 | =========================
3 |
4 | What kinds of DoS attacks can an adversary mount?
5 | Exhaust resources of some service.
6 | Network bandwidth.
7 | CPU time (e.g., image processing, text searching, etc).
8 | Disk bandwidth (e.g., complex SQL queries touching a lot of data).
9 | Disk space, memory.
10 | Deny service by exploiting some vulnerability in protocol, application.
11 | In TCP, if adversary can guess TCP sequence numbers, can send RST.
12 | Terminates TCP connection.
13 | In 802.11, deauthentication packets were (still?) not authenticated.
14 | Adversary can forge deauthentication packets, disconnecting the client.
15 | In BGP, routers perform little authentication on route announcements.
16 | A year or so ago, Pakistan announced a BGP route for YouTube.
17 | In April, China announced BGP routes for many addresses.
18 | Poorly-designed or poorly-implemented protocols or apps can be fixed.
19 | Resource exhaustion attacks are often harder to fix.
20 |
21 | Why do attackers mount DoS attacks?
22 | "Spite", but increasingly less so.
23 | Extortion. Force victim to incur cost of defense or downtime.
24 | Extortion (used to be?) relatively common for online gambling sites.
25 | High-value, time-sensitive, downtime is very costly.
26 |
27 | Network bandwidth DoS attacks.
28 | Adversary unlikely to directly have overwhelming network bandwidth.
29 | Thus, key goal for an adversary is amplification.
30 | One way to amplify bandwidth: reflection.
31 | Early trick: "smurf", send source-spoofed ICMP ping to broadcast addr.
32 | More likely today: source-spoofed UDP DNS queries.
33 | Why don't adversaries use TCP services for reflection?
34 | Higher-level amplification: compromise machines via malware, form botnet.
35 | Most prevalent today, can send well-formed TCP connections.
36 | Why are TCP connections more interesting for adversaries?
37 | Reflected ICMP, UDP packets much easier to filter out.
38 |
39 | CPU time attacks.
40 | Complex applications perform large amounts of computation for requests.
41 | SSL handshake, PDF generation, Google search, airline ticket searches.
42 | High-end DoS attackers target these routinely, to make the victim incur maximum cost per request.
43 |
44 | Disk bandwidth attacks.
45 | Disk is often the slowest part of the system (100 seeks per second?)
46 | Systems optimized to avoid disk whenever possible: use caches.
47 | Caches work due to statistical distributions.
48 | Adversary can construct an unlikely distribution, ask for unpopular data.
49 | Caches no longer effective, many queries hit disk, system grinds to a halt.
50 | Hard to control, predict, or even detect.
51 |
52 | Space exhaustion attacks (disk space, memory).
53 | Once a user is authenticated, relatively easy to enforce quotas.
54 | Many protocols require servers to store state on behalf of unknown clients.
55 |
56 | How to defend against DoS attacks in general?
57 | Accountability: track down the attacker.
58 | Becoming harder to do, at a conceptual level, with botnets, Tor, ..
59 | Require authentication to access services.
60 | Lowest level (IP) does not provide authentication by default.
61 | Require clients to prove they've spent some resources.
62 | Might be plausible if adversary's goal is to exhaust server resources.
63 | Captchas.
64 | Cryptographic puzzles (see the sketch after this list).
65 | Given challenge (C,n) find R so that low n bits of SHA1(C||R) are 0.
66 | Easy to synthesize challenge and verify answer.
67 | Easy to scale up the challenge, if under attack.
68 | Deliver/verify challenge over some protocol not susceptible to DoS.
69 | One slight problem: CPU speeds vary a lot.
70 | More memory-intensive puzzles also exist, might be more fair.
71 | Micropayments.
72 | Some "e-stamp" proposals tried, but micropayments are hard.
73 | Bandwidth (Speak-up by Mike Walfish).
74 | Big problem: adversary can get more resources through botnets.
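
A small sketch of the hash-puzzle idea above (SHA-1 to match the notes; the
difficulty parameter and byte sizes are arbitrary):

    import hashlib, itertools, os

    def make_challenge():
        return os.urandom(16)                    # random challenge C

    def solve(C, n):
        """Find R so that the low n bits of SHA1(C || R) are zero."""
        for r in itertools.count():
            R = r.to_bytes(8, "big")
            if int.from_bytes(hashlib.sha1(C + R).digest(), "big") % (1 << n) == 0:
                return R

    def verify(C, R, n):
        return int.from_bytes(hashlib.sha1(C + R).digest(), "big") % (1 << n) == 0

    C = make_challenge()
    R = solve(C, n=16)            # client burns ~2^16 hashes on average
    assert verify(C, R, n=16)     # server checks with a single hash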
75 |
76 | Specific problem: IP address spoofing.
77 | What's the precise problem?
78 | Adversary can put any IP address as source when sending packet.
79 | Not all networks perform sanity-checks on source IP addresses.
80 | Hard for victim to track down who is responsible for the traffic.
81 | What resources can adversary exhaust in this manner?
82 | Can send arbitrary packets, exhausting bandwidth.
83 | Can issue any queries to UDP services (e.g., DNS), exhausting CPU time.
84 | Cannot establish fully-open TCP connections (must guess sequence#).
85 | Can create half-open TCP conns, exhausting server memory (SYN flood).
86 | SYN flood problem: three-way TCP handshake (SYN, SYN-ACK, ACK).
87 | Server must keep state about the received SYN and sent SYN-ACK.
88 | Needed to figure out what connection the third ACK packet is for.
89 | One solution: use cryptography to off-load state onto the client.
90 | SYN cookies: encode server-side state into the sequence number (see the sketch after this list).
91 | seq = MAC(client & server IPs, ports, timestamp) || timestamp
92 | Server computes seq as above when sending SYN-ACK response.
93 | Server can verify state is intact by verifying hash (MAC) on ACK's seq.
94 | Not quite ideal: need to think about replay attacks within timestamp.
95 | Another problem: if the third packet is lost, no one retransmits.
96 | Maybe not a big deal in case of a DoS attack.
97 | Only a problem for protocols where server speaks first.
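
A rough sketch of the SYN-cookie construction above; field widths, the key,
and the MAC are simplified (a real implementation squeezes everything, plus
the client's initial sequence number, into 32 bits):

    import hashlib, hmac, time

    SECRET = b"server-secret-key"          # hypothetical server-side key

    def _mac(src, dst, sport, dport, epoch):
        msg = f"{src}|{dst}|{sport}|{dport}|{epoch}".encode()
        return hmac.new(SECRET, msg, hashlib.sha256).digest()[:3]   # truncated MAC

    def syn_cookie(src, dst, sport, dport, now=None):
        """seq = MAC(addresses, ports, timestamp) || timestamp."""
        epoch = (int(now or time.time()) // 64) & 0xFF               # coarse 8-bit timestamp
        return _mac(src, dst, sport, dport, epoch) + bytes([epoch])

    def check_cookie(cookie, src, dst, sport, dport, now=None):
        """Recompute the MAC from the echoed timestamp; accept only recent cookies."""
        epoch = cookie[3]
        current = (int(now or time.time()) // 64) & 0xFF
        fresh = (current - epoch) % 256 <= 1                          # at most ~2 minutes old
        return fresh and hmac.compare_digest(cookie[:3], _mac(src, dst, sport, dport, epoch))

    c = syn_cookie("10.0.0.1", "10.0.0.2", 12345, 80)
    assert check_cookie(c, "10.0.0.1", "10.0.0.2", 12345, 80)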
98 |
99 | What's the best we can hope for in an IP traceback scheme?
100 | No way to authenticate messages from any given router.
101 | Goal: suffix of the real attack path.
102 | Adversary is free to make up his or her own routers.
103 | In fact, this is realistic, since the adversary may be an actual ISP.
104 | Rely on fact that adversary's packets must repeatedly traverse suffix.
105 |
106 | Typical constraints for deploying IP traceback, in order of increasing hardness:
107 | Routers are hard to change.
108 | Routers cannot do a lot of processing per packet.
109 | End-hosts are hard to change.
110 | Packet formats are nearly impossible to change.
111 |
112 | Manual tracing through the network.
113 | 1. Find a pattern for the attack packets (e.g., destination address).
114 | 2. Call up your ISP, ask them to tcpdump and say where packets come from.
115 | 3. Repeat calling up the next ISP and asking them to do the same.
116 | Slow, tedious, non-scalable, hard to get cooperation from far-away ISPs.
117 |
118 | Controlled flooding.
119 | Clever idea: flood individual links you suspect might be used by attack.
120 | See how the flood affects the incoming DoS packets.
121 | Potentially works for a single source of attack, but causes DoS by itself.
122 |
123 | Ideal packet marking: record every link traversed by a packet.
124 | Problem: requires a lot of space in each packet.
125 |
126 | Trade-off: record individual links with some probability ("edge sampling").
127 | Each packet gets marked with two link endpoints and a distance counter.
128 | How do we reconstruct the path from the individual links?
129 | How do we decide when to mark a packet? Small probability.
130 | What if the packet is already marked? Why overwrite?
131 | Why do we need a distance counter?
132 | Why do we need the two endpoints to each mark the packet with their own IP?
133 | Could have one router write down its own IP and the next hop's IP.
134 | However, routers have many interfaces, with a separate IP for each.
135 | Makes it difficult for end-node machine to piece together route.
136 | Don't know when two IPs belong to the same router.
137 |
138 | Making edge sampling work in IP packets.
139 | Challenge: encoding edge information into IP packet.
140 | Ideally, want to store 2 IPs (2 x 32 bits) and distance (8 bits).
141 | Authors only found space for 16 bits in the rarely-used fragment ID.
142 | Trick 1: Edge IDs.
143 | XOR the IPs of neighboring nodes into a single 32-bit edge ID.
144 | How much does this save us?
145 | How can we reconstruct the path?
146 | Start with first hop, keep XORing with increasingly larger distances.
147 | Trick 2: Integrity checking scheme to know when we've XORed the right IDs.
148 | Potential problem: attack may come from many sources.
149 | As a result, XORing with edge-ID of some distance may not be right.
150 | Approach: make IPs easy to verify, by bit-interleaving hash of IP.
151 | Can validate candidate IP addresses by checking their hash.
152 | Doesn't save us space (yet), only increases edge IDs to 64 bits.
153 | Trick 3: Break up edge IDs into fragments (e.g., 8 bit chunks of 64 bits).
154 | Encoding in the IP header:
155 | 3-bit offset (which 8-bit chunk out of 64-bit edge ID).
156 | 5-bit distance (up to 32 traceback-enabled hops away).
157 | 8-bit data (i.e., a particular fragment of the 64-bit edge ID).
158 | How to reconstruct?
159 | Know the right offset for each chunk, and the right distance.
160 | Try all combinations of offsets for given distance to match hash.
161 | Once we know IP address for one hop, move on to the next distance.
162 | Trick 4: What happens if the fragment-ID field is in use?
163 | Drop fragmented packet with some prob., replace with entire edge info.
164 | Probability needed for fragmented packets is less: no matching needed.
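
A toy sketch of edge sampling with XORed edge IDs (trick 1 only; no hash
interleaving or fragmentation, and the router IDs and probabilities are
made up):

    import random

    def mark_packets(path, n_packets, p=0.2):
        # path: router IDs from attacker side to victim side.
        # A router marks with probability p (own ID, distance 0); otherwise,
        # if the distance is still 0 it XORs its ID into the edge field, and it
        # always increments the distance of an existing mark.
        marks = []
        for _ in range(n_packets):
            mark = None                                    # (edge_id, distance)
            for router in path:
                if random.random() < p:
                    mark = (router, 0)
                elif mark is not None:
                    edge, dist = mark
                    mark = (edge ^ router if dist == 0 else edge, dist + 1)
            if mark is not None:
                marks.append(mark)
        return marks

    def reconstruct(marks):
        # Distance-0 marks name the last router; XOR each longer-distance mark
        # with the previously recovered hop to peel off the next router.
        by_dist = {}
        for edge, dist in marks:
            by_dist.setdefault(dist, edge)
        path, prev = [], 0
        for d in sorted(by_dist):
            prev = by_dist[d] ^ prev
            path.append(prev)
        return list(reversed(path))

    random.seed(0)
    true_path = [0x0A000001, 0x0A000002, 0x0A000003, 0x0A000004]   # made-up router IDs
    marks = mark_packets(true_path, n_packets=5000)
    print(reconstruct(marks) == true_path)                          # expect True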
165 |
166 | How practical is the proposed IP traceback scheme?
167 | What happens if not all routers implement this scheme?
168 | How do we know when the traceback information stops being a legal suffix?
169 | How expensive is it to reconstruct edges from fragments?
170 |
171 |
--------------------------------------------------------------------------------
/previous-years/l17-vanish.txt:
--------------------------------------------------------------------------------
1 | Vanish
2 | ======
3 |
4 | Problem: sensitive data can be difficult to get rid of.
5 | Emails, shared documents, even files on a desktop computer.
6 | Adversary may get old data after they break in, gain access.
7 | Difficult to prevent certain kinds of "break-ins": legal subpoenas, etc.
8 | Would like to have data become inaccessible after some period of time.
9 |
10 | How serious of a problem is this?
11 | Seems like there are some interesting use cases that the paper discusses.
12 | Especially useful for ensuring email messages cannot be recovered later on.
13 |
14 | Strawman 1: why not attach metadata with expiration date (e.g., email header)?
15 | Copies of data may be stored on servers: backups, logs (e.g., email).
16 | Even with no copies, data may be stored on broken machine: hard to erase.
17 | Adversary may be able to obtain sensitive data from those copies.
18 | Goal: do not require any explicit data deletion.
19 |
20 | Strawman 2: why not encrypt email messages with recipient's public key?
21 | Adversary may steal the user's private key.
22 | Adversary may use a court order or subpoena to obtain private key.
23 | Goal: ensure data is inaccessible even if recipient's key compromised.
24 |
25 | Strawman 3: why not use an online service specifically for this purpose?
26 | Simple service, in principle:
27 | Encrypt messages with a specified expiration time.
28 | Decrypt only ciphertexts whose expiration time is in the future.
29 | Service is trusted (if compromised, can recover old "expired" data).
30 | Security services were targeted by law enforcement in the past.
31 | E.g., Hushmail incident.
32 | Hard to deploy service specifically for an unknown new application.
33 | Difficult to justify resources for a service that's not used yet.
34 | Goal: no new services.
35 |
36 | Strawman 4: why not use specialized hardware?
37 | Need a reliable source of time; TPM hardware does not provide one.
38 | In principle, smartcard could serve as distributed encrypt/decrypt service.
39 | If we can't use a standard TPM chip, difficult to deploy new hardware.
40 | Goal: no new hardware.
41 |
42 | Vanish design, step 1: reduce problem to limiting lifetime of random keys.
43 | To create a vanishing data object (VDO), create fresh data encryption key K.
44 | Encrypt the real data with this key: C = E_K(D).
45 | Strawman VDO is now (C, K).
46 | Next, we will make sure key K vanishes at the right time..
47 | Why is this step useful?
48 | 1. Need to worry about vanishing of a small, fixed-size object (key K).
49 | 2. The key K itself doesn't leak any information about data.
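
A sketch of step 1 only, with an off-the-shelf authenticated cipher standing
in for whatever encryption Vanish actually uses (requires the third-party
Python "cryptography" package; key handling is simplified):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def make_vdo(data):
        """Step 1: C = E_K(D) under a fresh random key K; strawman VDO is (C, K)."""
        key = AESGCM.generate_key(bit_length=128)     # fresh data key K
        nonce = os.urandom(12)
        ciphertext = AESGCM(key).encrypt(nonce, data, None)
        return (nonce + ciphertext), key              # later steps push K into the DHT

    def open_vdo(blob, key):
        return AESGCM(key).decrypt(blob[:12], blob[12:], None)

    C, K = make_vdo(b"self-destructing message")
    assert open_vdo(C, K) == b"self-destructing message"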
50 |
51 | Vanish design, step 2: store the secret key in a DHT.
52 | Quick aside on how DHTs work..
53 | Logical view:
54 | Many machines (e.g., ~1M for Vuze DHT) talk to each other.
55 | Store key-value pairs, where keys are 160-bit things called "indexes".
56 | Storage is distributed across the nodes in the DHT.
57 | (Thus, the name: distributed hash table.)
58 | API:
59 | lookup(index) -> set of nodes
60 | store(node, index, value) -> node stores the (index, value) entry
61 | get(node, index) -> value, if stored at that node
62 | The tricky function is lookup (others are just talking to one node).
63 | Vuze DHT works by constructing a single 160-bit address/name space.
64 | 160 bits works well, because it's large and fits a SHA-1 hash.
65 | Nodes get 160-bit identifiers (SHA-1 hash of, e.g., node's public key).
66 | Nodes are responsible for indexes near their own 160-bit ID.
67 | That is, lookup(index) returns nodes with IDs near index.
68 | Nodes talk to other nodes with nearby ID values, to replicate data.
69 | (Also need to talk to a few nodes far away, for lookup to work).
70 | Intermediate step (not quite Vanish):
71 | Choose random "access key" L.
72 | Store data key K at index L in the DHT.
73 | Strawman VDO is now (C, L).
74 | How to recover the VDO before it expires?
75 | Straightforward: fetch key K from index L in the DHT.
76 | What causes data to vanish?
77 | In the Vuze DHT, values expire after 8 hours (fixed timeout).
78 | More generally, DHTs experience churn (nodes join and leave the DHT).
79 | Once a node leaves DHT, it will re-join with a different ID.
80 | Difficult to track down nodes that used to store some index in the past.
81 | Why does Vanish choose an "access key" L instead of using, say, H(C)?
82 | Ensures that Vanish does not reduce security.
83 | The only things revealed to the DHT are random values (e.g., L and K).
84 | Not dependent on actual sensitive data (plaintext D or ciphertext C).
85 |
86 | Vanish design, step 3: split up the key into multiple pieces, store the pieces.
87 | Why does Vanish do this?
88 | 1. Individual nodes may go away prematurely, want reliability until timeout.
89 | 2. Individual nodes can be malicious, can be subpoenaed, can be buggy..
90 | Problem shown in Figure 4 (with N=1).
91 | Less than 100% availability before 8 hours. Why?
92 | More than 0% availability after 8 hours. Why?
93 |
94 | Secret sharing (by Adi Shamir).
95 | Given secret K, want to split it up into shares K_1, .., K_N.
96 | Given some threshold M of shares (<= N), should be able to reconstruct K.
97 | Construction: random polynomial of degree M-1, whose constant coeff is K.
98 | Assume we can operate in a large finite field (e.g., mod a prime near 2^128, for 128-bit AES keys).
99 | Polynomial is f(x) = z_{M-1} x^{M-1} + .. + z_1 x^1 + K (in that field).
100 | To generate N secret shares, compute f(1), f(2), .., f(N).
101 | To reconstruct secret given M shares, solve polynomial and compute f(0).
102 | With fewer than M shares, there is a unique solution for any f(0) value.
103 | This means the adversary doesn't know what f(0)=K is with fewer than M shares.
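
A minimal sketch of Shamir sharing over a prime field (a prime modulus is used
here so interpolation works; parameters are arbitrary):

    import random

    P = 2**127 - 1                     # a Mersenne prime; all arithmetic is mod P

    def make_shares(secret, n, m):
        """Split `secret` into n shares, any m of which reconstruct it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(m - 1)]
        f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    K = random.randrange(P)
    shares = make_shares(K, n=10, m=7)
    assert reconstruct(random.sample(shares, 7)) == K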
--------------------------------------------------------------------------------
/previous-years/l11-spins.html:
--------------------------------------------------------------------------------
5 | Wireless Sensor Networks (notes by Marten van Dijk)
6 |
7 | Read: A. Perrig, R. Szewczyk, J.D. Tygar, V. Wen, and D.E. Culler, "SPINS: Security Protocols for Sensor Networks", Wireless Networks 8, 521-534, 2002.
8 |
9 |
10 | Model (assumptions, security requirements, possible threats):
11 |
12 |
13 | What is a sensor network? Thousands to millions of small sensors form self-organizing wireless networks. Sensors have limited processing power, storage, bandwidth, and energy (this gives low production costs). For example, use TinyOS, a small, event-driven OS, see Table 1. Serious security and privacy questions arise if third parties can read or tamper with sensor data.
14 |
15 |
16 | Examples: emergency response information, energy management, medical monitoring, logistics and inventory management, battlefield management.
17 |
18 |
19 | What are the differences between wireless sensor networks (WSN) and mobile ad hoc networks (MANET)? The number of sensor nodes in a WSN can be several orders of magnitude larger than the nodes in a MANET. Sensor nodes are densely deployed. Sensor nodes are prone to failures. The topology of a WSN changes very frequently. Sensor nodes mainly use a broadcast communication paradigm, whereas most MANETs are based on point-to-point communication. Sensor nodes are limited in processing power, storage, bandwidth, and energy.
20 |
21 |
22 | What are the components of a sensor node? Sensing unit with a sensor and analog-to-digital converter (ADC). Processor with storage. Transceiver. Power unit.
23 |
24 |
25 | What are the capabilities of a base station? More battery power, sufficient memory, means for communicating with outside networks.
26 |
27 |
28 | What are the trust assumptions? Individual sensors are untrusted. There is a known upper bound on the fraction of all sensors that are compromised. Communication infrastructure is untrusted (except that messages are delivered to the destination with non-negligible probability). Sensor nodes trust their base station. Each node trusts itself.
29 |
30 |
31 | What is the protocol stack? Physical layer: simple but robust modulation, transmission, and receiving techniques; responsible for frequency selection, carrier frequency generation, signal detection, modulation. Data link layer: medium access control (MAC) protocol must be power-aware and able to minimize collision with neighbors' broadcasts, MAC protocol in a wireless multi-hop self-organizing network creates the network infrastructure (topology changes due to node mobility and failure, periodic transmission of beacons allows nodes to create a routing topology) and efficiently shares communication resources between sensor nodes (both fixed allocation and random access versions have been proposed), data link layer also implements error control and data encryption + security. Network layer: routing the data supplied by the transport layer, provide internetworking with external networks, design principles are power efficiency, data aggregation useful only when it does not hinder the collaborative effort of the sensor nodes, attribute-based addressing and location awareness. Transport layer: helps to maintain the flow of data if the application requires it, especially needed when the system is planned to be accessed through the Internet or other external networks. Application layer: largely unexplored.
32 |
33 |
34 | What are performance metrics? Fault tolerance or reliability: is the ability to sustain sensor network functionalities without interruption due to sensor node failures (non-adversarial such as lack of power, physical damage, environmental interference), it is modeled as a Poisson distribution e^{-lambda*t} to capture the probability of not having a failure within the time interval (0,t). Scalability: ability to support larger networks, flexible against increase in the size of the network even after deployment, ability to utilize more dense networks (density gives the number of nodes within the transmission radius of each node; it equals N*pi*R^2/A, where N is the number of scattered sensor nodes in region A, and R is the radio transmission range). Efficiency: storage complexity (amount of memory required to store certificates, credentials, keys), processing complexity (amount of processor cycles required by security primitives and protocols), communication complexity (overhead in number and size of messages exchanged in order to provide security). Network connectivity: probability that two neighboring sensors are able to share a key (enough key connectivity is required in order to provide intended functionality). Network resilience: resistance against node capture; for each c and s, what is the probability that c compromised sensors can break s links (by reconstructing the corresponding shared secret keys)?
35 |
36 |
37 | What are the security requirements? Availability: ensure that service offered by the whole WSN, by any part of it, or by a single node must be available whenever required. Degradation of security services: ability to change security level as resource availability changes. Survivability: ability to provide a minimum level of service in the presence of power loss, failures, or attacks (need to thwart denial of service attacks).
38 |
39 |
40 | Authentication: authenticate other nodes, cluster heads, and base stations before granting a limited resource, or revealing information. Integrity: ensure that the message or entity under consideration is not altered (data integrity is achieved by data authentication). Freshness: ensure that each message is fresh, most recent (detect replay attacks).
41 |
42 |
43 | Confidentiality: providing privacy of the wireless communication channels (prevent information leakage by eavesdropping or covert channels), need semantic security, which ensures that an eavesdropper has no information about the plaintext, even if it sees multiple encryptions of the same plaintext (e.g., concatenate plaintext with a random bit string, this however requires sending more data and costs more energy). Non-repudiation: preventing malicious nodes from hiding their activities (e.g., they cannot refute the validity of a statement they signed).
44 |
45 |
46 | Solutions (SNEP, micro TESLA, Key Distribution):
47 |
48 |
49 | What are the limitations in designing security? Security needs to limit the consumption of processing power. Limited power supply limits the lifetime of keys. Working memory cannot hold the variables for asymmetric cryptographic algorithms such as RSA. High overhead to create and verify signatures. Need to limit communication.
50 |
51 |
52 | SNEP: A and B share a master key, which they use to derive encryption keys K_AB and K_BA, and MAC keys K'_AB and K'_BA. A and B synchronize counter values C_A=C_B. Communication from A to B: {Data}_[K_AB,C_A] = Data XOR E_{K_AB}(C_A) together with MAC_{K'_AB}({Data}_[K_AB,C_A]||C_A), see Formula (1). The MAC computation is pictured in Figure 3 using CBC mode. This gives semantic security, data authentication, weak freshness (if the message verifies correctly, a receiver knows that the message must have been sent after the previous message it (the receiver) received correctly), low communication overhead (the counter value is not sent).
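
A rough sketch of the Formula (1) shape -- counter-based encryption plus a MAC over ciphertext and counter, with the counter itself not sent -- using HMAC-SHA256 in place of the paper's block-cipher constructions; key derivation from the master key is omitted and messages are assumed to fit in one hash block:

    import hashlib, hmac

    def snep_send(k_enc, k_mac, counter, data):
        # Toy keystream: one SHA-256 block keyed by (k_enc, counter); data <= 32 bytes here.
        pad = hashlib.sha256(k_enc + counter.to_bytes(8, "big")).digest()[:len(data)]
        ct = bytes(d ^ p for d, p in zip(data, pad))              # Data XOR E_{K_AB}(C_A)
        tag = hmac.new(k_mac, ct + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        return ct, tag                                            # counter is not transmitted

    def snep_recv(k_enc, k_mac, counter, ct, tag):
        expect = hmac.new(k_mac, ct + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expect):
            raise ValueError("bad MAC: forged, corrupted, or replayed message")
        pad = hashlib.sha256(k_enc + counter.to_bytes(8, "big")).digest()[:len(ct)]
        return bytes(c ^ p for c, p in zip(ct, pad))

    k_ab, kp_ab = b"derived-enc-key!", b"derived-mac-key!"        # stand-ins for K_AB, K'_AB
    ct, tag = snep_send(k_ab, kp_ab, counter=7, data=b"sensor reading: 42")
    assert snep_recv(k_ab, kp_ab, 7, ct, tag) == b"sensor reading: 42"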
53 |
54 |
55 | Strong freshness: see Formula (2), if B requests a message from A, then B transmits to A a nonce and A includes this nonce in the MAC of its communication to B. If the MAC verifies correctly, B knows that A generated the response after B sent the request.
56 |
57 |
58 | Synchronize counter values: see Section 5.2 for a simple bootstrapping protocol, at any time the above protocol with strong freshness can be used to request the current counter value. To prevent denial of service attacks, allow transmitting the counter with each encrypted message in the above protocols, or attach another short MAC to the message that does not depend on the counter.
59 |
60 |
61 | micro TESLA: authenticated broadcast requires an asymmetric mechanism, otherwise any compromised receiver could forge messages from the sender. How can this be done without asymmetric crypto? Introduce asymmetry through delayed disclosure of symmetric keys. Idea: base station uses MAC_K with a key unknown to sensor nodes, K is a key of a key chain (K_i = F(K_{i+1}), where F is a one-way function) to which the base station has committed (in a key chain, keys are self-authenticating), the key chain is revealed through delayed disclosure by the base station. The key disclosure time delay is on the order of a few time intervals and greater than any reasonable round trip time. Receiver node knows the key disclosure time. Each receiver node needs to have one authentic key of the one-way key chain as a commitment to the entire chain. Sender base station and receiver nodes are loosely time synchronized. Simple bootstrapping protocol using shared secret MAC keys, see Section 5.5.
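
A small sketch of the one-way key chain: keys are generated backwards with K_i = F(K_{i+1}) and disclosed forwards; a receiver holding any earlier authentic key verifies a later one by re-applying F (SHA-256 stands in for F; chain length is arbitrary):

    import hashlib, os

    F = lambda k: hashlib.sha256(k).digest()            # the one-way function

    def make_chain(length):
        """Generate K_length .. K_0 backwards; K_0 is the public commitment."""
        chain = [os.urandom(32)]                        # K_length
        for _ in range(length):
            chain.append(F(chain[-1]))                  # K_i = F(K_{i+1})
        chain.reverse()                                 # chain[i] == K_i
        return chain

    def verify(disclosed_key, i, known_key, j):
        """Check a key disclosed for interval i against an authentic key from interval j < i."""
        k = disclosed_key
        for _ in range(i - j):
            k = F(k)
        return k == known_key

    chain = make_chain(100)
    commitment = chain[0]                               # given to nodes at bootstrap
    assert verify(chain[42], 42, commitment, 0)         # later disclosure checks out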
62 |
63 |
64 | Nodes cannot store the keys of a key chain: a node may broadcast data through the base station, or use the base station to outsource key chain management.
65 |
66 |
67 | Key setup: master key shared by the base station and node. How do we do key distribution? There has been a lot of research providing solutions that have good resilience, connectivity, and scalability. Controversial solution: Key infection; bootstrapping does not need to be secure, it is about security maintenance in a stationary network. Idea: transmit symmetric keys in the clear and use secrecy amplification (and other mechanisms). In secrecy amplification two nodes A and B use a third neighboring node C to set up communication between A and B. This communication channel is protected by keys K_{A,C} and K_{C,B}. It is used to exchange a nonce N. A and B replace their key K_{A,B} by H(K_{A,B}||N) and verify whether they can use this new key. If K_{A,B} is known to an adversary, but keys K_{A,C} and K_{C,B} are not, then the adversary cannot extract the new K_{A,B}! This solution has been proposed for the battlefield management application.
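
A tiny sketch of the secrecy-amplification step, new key = H(K_AB || N), with placeholder values for the initial key and nonce:

    import hashlib, os

    def amplify(k_ab, nonce):
        """New key = H(K_AB || N); an eavesdropper who knew K_AB but missed N is locked out."""
        return hashlib.sha256(k_ab + nonce).digest()

    k_ab = b"initially-sent-in-the-clear"       # plaintext key from key infection
    nonce = os.urandom(16)                      # exchanged via C, protected by K_AC and K_CB
    new_a = amplify(k_ab, nonce)                # computed at A
    new_b = amplify(k_ab, nonce)                # computed at B
    assert new_a == new_b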
68 |
69 |
70 | Related topics: RFID tags, social networks, TinyDB.
71 |
72 |
73 |
74 |
--------------------------------------------------------------------------------
/previous-years/l08-browser-security.txt:
--------------------------------------------------------------------------------
1 | Browser Security (guest lecture by Ramesh Chandra)
2 | ==================================================
3 |
4 | web app security
5 | server and client -- we'll mostly focus on client
6 |
7 | web apps: past vs. present
8 | past: mainly static content, simpler security model
9 | user interactions resulted in round-trips to server
10 | present: highly dynamic content with client-side code
11 | advantages: responsiveness, better functionality
12 | more complex security model
13 |
14 | threat model / assumptions
15 | attacker controls his/her own web site, attacker.com (sounds reasonable)
16 | attacker's web site is loaded in your browser (why is this reasonable?)
17 | attacker cannot intercept/inject packets into the network
18 | browser/server doesn't have buffer overflows
19 |
20 | security policy / goals
21 | 1: isolation of code from different sites
22 | javascript code runs in your browser, has access to lots of things
23 | need to have some way of isolating code from different sites
24 | attacker should not be able to get your bank balance, xfer money, ..
25 |
26 | 2: UI security -- user needs to know what site they're talking to
27 | phishing attacks are usually the biggest problem in this space
28 | without isolation of code from diff. sites, UI security is hopeless
29 | how do you know you're interacting with your bank vs. an attacker?
30 | (if security can avoid depending on this question, all the better!)
31 |
32 | we'll largely focus on the first (isolation of code) for now
33 |
34 | how does javascript fit into the web model?
35 | HTML elements
36 | script tags; inline and src=
37 | built-in objects like window, document, etc
38 | DOM
39 | HTML elements can invoke JS code: onClick, onLoad, ..
40 | single-threaded execution; event-driven programming style for network IO
41 | frames for composing/structuring
42 |
43 | browser security model
44 | principal: domain of the web content's URL
45 | http://a.com/b.html and http://a.com/c.html are the same principal
46 |
47 | protected resource: frame
48 | principal is the domain of frame's location URL
49 | all code in the frame runs as that principal
50 | doesn't matter where the code came from (e.g. script src=...)
51 | analogous to a process in Unix
52 |
53 | protection mechanisms:
54 | javascript references as capabilities
55 | may not be able to get references to other windows/frames
56 | but there are many objects with global names
57 | access control: same origin policy
58 | privileged functions implement their own protection
59 | e.g. postMessage, XMLHttpRequest, window.open()
60 |
61 | same-origin policy (SOP)
62 | intuition/goal: only code from origin X can manipulate resources from X
63 | frame A can poke at frame B's content only if they have the same principal
64 | why does the browser allow any cross-frame access at all?
65 | frames used for layout in addition to protection
66 | unfortunately, quite vague, and overly restrictive; shows in practice
67 | exceptions to get around restrictions:
68 | script, image, css src tags: why are these needed?
69 | frame navigation
70 |
71 |
72 | frame navigation
73 | problem: navigating a frame is a special operation not governed by SOP
74 | subject to other access control rules, which this paper talks about
75 | why does the browser allow this in the first place?
76 | might have navigation links in one frame, other sites in another
77 |
78 | what goes wrong if attacker.com can navigate another frame?
79 | can substitute a phishing page for the login frame of another site (eg. bank)
80 | why doesn't the SSL icon "go away"? rule: all pages came via SSL
81 | reasoning: original site included the other origin explicitly?
82 | how does the attacker get a handle on that sub-frame?
83 | global name space of frame/window names
84 | more difficult in current browser -- firefox has per-frame name space
85 | of frame names
86 |
87 | what's their proposed fix?
88 | window policy: can only navigate frames in the same window
89 | can still mount the attack on another site if you open it within a window
90 | why is this still OK? no correct answer; mostly because of the URL bar
91 |
92 | mash-ups
93 | idea: combine data from multiple sites/origins
94 | eg: iGoogle combines frames from many developers in the same page
95 | terminology: the whole site is a "mashup"
96 | iGoogle is an "integrator"
97 | all the little boxes that are included in the page are "gadgets"
98 | what are the problems that we run into?
99 | one site's code in one frame can navigate another site's frame
100 | window policy is of no help
101 | why does it matter? UI for login, again
102 |
103 | better policy: descendant/child policy
104 | why do they argue the descendant policy is just as good as child?
105 | in theory, parent can cover up any descendant with a floating object
106 | when is child a better choice?
107 | later examples where site wants to know it's talking to the right child
108 | i.e. cases when the worry isn't the UI issues
109 | origin propagation:
110 | what's the reasoning for this?
111 | would this occur in real sites? frames used for side-by-side structure
112 |
113 | cross-origin frame communication
114 | when would you need it? mashups where origins interact
115 | why do origins need to interact on the client? can we push interactions to
116 | server-side?
117 | cleaner design => easier to implement
118 | avoid extra round trips => more responsive app
119 | better integration => better user experience
120 | nice example: yelp wants to use google maps
121 | mutually distrustful (in theory, at least)
122 | alternative 1: map in another frame (open it to some location), no feedback
123 | alternative 2: map in the same frame (script src=), no protection
124 | yelp does this today
125 | alternative 3: map in one frame, yelp in another frame, communication btwn
126 |
127 | threat model: in addition to threat model described above, we assume:
128 | attacker's gadget can load honest gadget in a subframe
129 | attacker's gadget can communicate with integrator and honest gadget
130 |
131 | goal: secure, reliable communication between origins
132 |
133 | how does frame communication work?
134 | plan 1: exploiting a covert channel! (fragment channel)
135 | problem: no authentication (where did a message come from?)
136 | workaround: treat as a network, run authentication protocol
137 | all 3 impls these guys looked at had the same bug
138 |
139 | protocol: nonces, include sender's ID (rcpt doesn't know sender)
140 | idea: each side generates a nonce, gives it to the other side
141 | if someone gives you a message w/ nonce, it came from other side
142 |
143 | what's the possible attack?
144 | attacker can impersonate integrator when talking to gadget
145 |
146 | why does it matter? gadget might have policies for diff. sites
147 | OK to add your contacts list gadget into facebook, access it
148 | not OK to access your contact list gadget by other sites
149 |
150 | how does the attack work?
151 | relay initial message to the gadget
152 | gadget replies back to the integrator
153 | integrator sends gadget's nonce to attacker,
154 | to prove it's the integrator sending the msg
155 | now the attacker has both nonces, can impersonate in both dir'n
156 | might not be able to intercept msgs from gadget, though
157 | they're sent directly to integrator's URI
158 | fix is well-known: include URI (name) in second response too
159 |
160 | plan 2: browser developers designed a special mechanism for it
161 | frame.postMessage("hello")
162 | paper claims this provides authentication but not privacy; how come?
163 | frame can re-navigate without sender's knowledge
164 | how can this happen?
165 | sender was itself in a sub-frame of attacker's site
166 | descendant policy allows attacker to access all sub*-frames
167 | why didn't the fragment channel have this problem?
168 | tight binding between message and recipient (url#msg)
169 | solution: make the binding explicit
170 |
171 | protected resource: cookie
172 | how does HTTP authentication work?
173 | browser keeps track of a session "cookie" -- arbitrary blob from server
174 | sends the cookie along with every request to that server
175 | cookie often includes username and authentication proof
176 | inside browser, same-origin policy protects cookies like frames
177 | cookie stored in document.cookie
178 | can only access cookies for your own origin
179 |
180 | possible attack: generate requests to xfer money from attacker.com
181 |
182 |
183 | solution: spaghetti-rules
184 | hard to prevent GET requests, so allow those (e.g. img tags)
185 | protect from malicious ops: include some non-cookie token in the request (see the sketch below)
186 | protect bank account balance: only see responses from the same origin
187 | except that's not quite true either
188 | script src= tags run code
189 | style src= tags load CSS style-sheets, also visible
190 | so, to protect sensitive data, make sure it doesn't parse as JS or CSS?
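
A rough sketch of the non-cookie-token idea mentioned above (commonly called a CSRF token): the server derives a secret token from the session, embeds it in its own pages, and rejects state-changing requests that don't echo it back. Names and the derivation are made up; real web frameworks provide this for you.

    import hmac, hashlib, secrets

    APP_SECRET = b"server-side secret"                       # hypothetical

    def csrf_token(session_id):
        """Token derived from the session; attacker.com can't read or guess it."""
        return hmac.new(APP_SECRET, session_id.encode(), hashlib.sha256).hexdigest()

    def check_transfer_request(session_id, submitted_token):
        """Only honor the money transfer if the form carried the right token."""
        return hmac.compare_digest(csrf_token(session_id), submitted_token)

    sid = secrets.token_hex(16)
    good = check_transfer_request(sid, csrf_token(sid))        # form rendered by the bank
    bad = check_transfer_request(sid, "guessed-by-attacker")   # forged cross-site request
    assert good and not bad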
191 |
192 | another mechanism to secure mashups: safe subset of javascript
193 | eg: FBJS, ADSafe, Caja
194 | Facebook javascript (FBJS): compiles gadget down to a safe subset of JS
195 | per gadget name space
196 | accesses to global name space through secure wrappers
197 | intercepts all events and proxies AJAX requests thru FB
198 | gadget is embedded into FB and needs to trust FB
199 |
200 | takeaways
201 | web security lacks unifying set of principles
202 | policies such as SOP have many exceptions
203 | different browsers / runtimes (e.g. Flash) implement different policies
204 | confusing to web developers
205 | supporting existing web sites makes deploying fundamental fixes difficult
206 | lesson: think about security early on in the design
207 |
208 |
--------------------------------------------------------------------------------
/l08-my-web-security.md:
--------------------------------------------------------------------------------
1 | Web security
2 | ============
3 |
4 | Web security for a long time meant looking at what the server was doing, since the client-side was very simple. On the server, CGI scripts were executed and they interfaced with DBs, etc.
5 |
6 | These days, browsers are very complicated:
7 |
8 | * JavaScript: pages execute client-side code
9 | * The Document Object Model (DOM)
10 | * XMLHttpRequests: a way for JavaScript client-side code to fetch content from the web-server asynchronously
11 | - a.k.a AJAX
12 | * Web Sockets
13 | * Multimedia support (the `