├── README.md
├── papers
    ├── baggy.pdf
    ├── brop.pdf
    ├── klee.pdf
    ├── nacl.pdf
    ├── okws.pdf
    ├── urweb.pdf
    ├── android.pdf
    ├── capsicum.pdf
    ├── kerberos.pdf
    ├── forcehttps.pdf
    ├── medical-sw.pdf
    ├── passwords.pdf
    ├── taintdroid.pdf
    ├── tor-design.pdf
    ├── owasp-top-10.pdf
    ├── trajectories.pdf
    ├── brumley-timing.pdf
    ├── confused-deputy.pdf
    ├── lookback-tcpip.pdf
    ├── private-browsing.pdf
    ├── passwords-extended.pdf
    └── .htaccess
├── Makefile
├── .htaccess
├── old-quizzes.md
├── old-quizzes.html
├── quiz2-tor.md
├── quiz2-tor.html
├── README.html
├── quiz2-medical-dev.md
├── index.md
├── previous-years
    ├── l12-resin.txt
    ├── l14-resin.txt
    ├── l22-usability-2.txt
    ├── l21-captcha.txt
    ├── l20-bots.txt
    ├── l23-voting.txt
    ├── l21-dropbox.txt
    ├── l18-dealloc.txt
    ├── l19-backtracker.txt
    ├── l20-traceback.txt
    ├── l17-vanish.txt
    ├── l07-xfi.txt
    ├── l11-spins.html
    ├── l08-browser-security.txt
    ├── l22-usability.txt
    ├── l19-cryptdb.txt
    ├── l06-java.txt
    └── l10-memauth.html
├── quiz2-medical-dev.html
├── index.html
└── l08-my-web-security.md


/README.md:
--------------------------------------------------------------------------------
1 | index.md


--------------------------------------------------------------------------------
/papers/baggy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/baggy.pdf


--------------------------------------------------------------------------------
/papers/brop.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/brop.pdf


--------------------------------------------------------------------------------
/papers/klee.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/klee.pdf


--------------------------------------------------------------------------------
/papers/nacl.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/nacl.pdf


--------------------------------------------------------------------------------
/papers/okws.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/okws.pdf


--------------------------------------------------------------------------------
/papers/urweb.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/urweb.pdf


--------------------------------------------------------------------------------
/papers/android.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/android.pdf


--------------------------------------------------------------------------------
/papers/capsicum.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/capsicum.pdf


--------------------------------------------------------------------------------
/papers/kerberos.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/kerberos.pdf


--------------------------------------------------------------------------------
/papers/forcehttps.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/forcehttps.pdf


--------------------------------------------------------------------------------
/papers/medical-sw.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/medical-sw.pdf


--------------------------------------------------------------------------------
/papers/passwords.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/passwords.pdf


--------------------------------------------------------------------------------
/papers/taintdroid.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/taintdroid.pdf


--------------------------------------------------------------------------------
/papers/tor-design.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/tor-design.pdf


--------------------------------------------------------------------------------
/papers/owasp-top-10.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/owasp-top-10.pdf


--------------------------------------------------------------------------------
/papers/trajectories.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/trajectories.pdf


--------------------------------------------------------------------------------
/papers/brumley-timing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/brumley-timing.pdf


--------------------------------------------------------------------------------
/papers/confused-deputy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/confused-deputy.pdf


--------------------------------------------------------------------------------
/papers/lookback-tcpip.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/lookback-tcpip.pdf


--------------------------------------------------------------------------------
/papers/private-browsing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/private-browsing.pdf


--------------------------------------------------------------------------------
/papers/passwords-extended.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/passwords-extended.pdf


--------------------------------------------------------------------------------
/papers/.htaccess:
--------------------------------------------------------------------------------
1 | # Protect the htaccess file
2 | <Files .htaccess>
3 | Order Allow,Deny
4 | Deny from all
5 | </Files>
6 | 
7 | # Enable directory browsing
8 | Options All Indexes
9 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | SRCS=$(wildcard *.md)
 2 | 
 3 | HTMLS=$(SRCS:.md=.html)
 4 | 
 5 | %.html: %.md
 6 | 	@echo "Compiling $< -> $*"
 7 | 	markdown $< >$*.html
 8 | 
 9 | all: $(HTMLS)
10 | 	@echo "HTMLs: $(HTMLS)"
11 | 	@echo "MDs: $(SRCS)"
12 | 


--------------------------------------------------------------------------------
/.htaccess:
--------------------------------------------------------------------------------
 1 | # Protect the htaccess file
 2 | <Files .htaccess>
 3 | Order Allow,Deny
 4 | Deny from all
 5 | </Files>
 6 | 
 7 | # Protect .git/
 8 | <Files .git>
 9 | Order Allow,Deny
10 | Deny from all
11 | </Files>
12 | 
13 | <Files .gitmodules>
14 | Order Allow,Deny
15 | Deny from all
16 | </Files>
17 | 
18 | <Files .gitignore>
19 | Order Allow,Deny
20 | Deny from all
21 | </Files>
22 | 
23 | <Files README.md>
24 | Order Allow,Deny
25 | Deny from all
26 | </Files>
27 | 
28 | <Files README.html>
29 | Order Allow,Deny
30 | Deny from all
31 | </Files>
32 | 
33 | # Disable directory browsing
34 | Options All -Indexes
35 | 


--------------------------------------------------------------------------------
/old-quizzes.md:
--------------------------------------------------------------------------------
 1 | Some questions may already be [here](http://css.csail.mit.edu/6.858/2014/quiz.html)
 2 | 
 3 | Quiz 2 2011
 4 | -----------
 5 | 
 6 | Q8: An "Occupy Northbridge" protestor has set up a Twitter
 7 | account to broadcast messages under an assumed name. In
 8 | order to remain anonymous, he decides to use Tor to log into
 9 | the account.  He installs Tor on his computer (from a
10 | trusted source) and enables it, launches Firefox, types in
11 | www.twitter.com into his browser, and proceeds to log in.
12 | What adversaries may be able to now compromise the protestor
13 | in some way as a result of him using Tor? Ignore security
14 | bugs in the Tor client itself.
15 | 
16 | A8: The protestor is vulnerable to a malicious exit node
17 | intercepting his non-HTTPS-protected connection. (Since Tor
18 | involves explicitly proxying through an exit node, this is
19 | easier than intercepting HTTP over the public internet.)
20 | 
21 | 
22 | Q9: The protestor now uses the same Firefox browser to
23 | connect to another web site that hosts a discussion forum,
24 | also via Tor (but only after building a fresh Tor circuit).
25 | His goal is to ensure that Twitter and the forum cannot
26 | collude to determine that the same person accessed Twitter
27 | and the forum. To avoid third-party tracking, he deletes all
28 | cookies, HTML5 client-side storage, history, etc.  from his
29 | browser between visits to different sites. How could an
30 | adversary correlate his original visit to Twitter and his
31 | visit to the forum, assuming no software bugs, and a large
32 | volume of other traffic to both sites?
33 | 
34 | A9: An adversary can fingerprint the protestor's browser,
35 | using the user-agent string, the plug-ins installed on that
36 | browser, window dimensions, etc., which may be enough to
37 | strongly correlate the two visits.
38 | 
39 | ---
40 | 
41 | Quiz 2, 2012
42 | ------------
43 | 
44 | Q2: Alyssa wants to learn the identity of a hidden service
45 | running on Tor. She plans to set up a malicious Tor OR, set
46 | up a rendezvous point on that malicious Tor OR, and send
47 | this rendezvous point's address to the introduction point of
48 | the hidden service. Then, when the hidden service connects
49 | to the malicious rendezvous point, the malicious Tor OR will
50 | record where the connection is coming from.
51 | 
52 | Will Alyssa's plan work? Why or why not?
53 | 
54 | A2: Will not work. A new Tor circuit is constructed between
55 | 


--------------------------------------------------------------------------------
/old-quizzes.html:
--------------------------------------------------------------------------------
 1 | <p>Some questions may already be <a href="http://css.csail.mit.edu/6.858/2014/quiz.html">here</a></p>
 2 | 
 3 | <h2>Quiz 2 2011</h2>
 4 | 
 5 | <p>Q8: An "Occupy Northbridge" protestor has set up a Twitter
 6 | account to broadcast messages under an assumed name. In
 7 | order to remain anonymous, he decides to use Tor to log into
 8 | the account.  He installs Tor on his computer (from a
 9 | trusted source) and enables it, launches Firefox, types in
10 | www.twitter.com into his browser, and proceeds to log in.
11 | What adversaries may be able to now compromise the protestor
12 | in some way as a result of him using Tor? Ignore security
13 | bugs in the Tor client itself.</p>
14 | 
15 | <p>A8: The protestor is vulnerable to a malicious exit node
16 | intercepting his non-HTTPS-protected connection. (Since Tor
17 | involves explicitly proxying through an exit node, this is
18 | easier than intercepting HTTP over the public internet.)</p>
19 | 
20 | <p>Q9: The protestor now uses the same Firefox browser to
21 | connect to another web site that hosts a discussion forum,
22 | also via Tor (but only after building a fresh Tor circuit).
23 | His goal is to ensure that Twitter and the forum cannot
24 | collude to determine that the same person accessed Twitter
25 | and the forum. To avoid third-party tracking, he deletes all
26 | cookies, HTML5 client-side storage, history, etc.  from his
27 | browser between visits to different sites. How could an
28 | adversary correlate his original visit to Twitter and his
29 | visit to the forum, assuming no software bugs, and a large
30 | volume of other traffic to both sites?</p>
31 | 
32 | <p>A9: An adversary can fingerprint the protestor's browser,
33 | using the user-agent string, the plug-ins installed on that
34 | browser, window dimensions, etc., which may be enough to
35 | strongly correlate the two visits.</p>
36 | 
37 | <hr />
38 | 
39 | <h2>Quiz 2, 2012</h2>
40 | 
41 | <p>Q2: Alyssa wants to learn the identity of a hidden service
42 | running on Tor. She plans to set up a malicious Tor OR, set
43 | up a rendezvous point on that malicious Tor OR, and send
44 | this rendezvous point's address to the introduction point of
45 | the hidden service. Then, when the hidden service connects
46 | to the malicious rendezvous point, the malicious Tor OR will
47 | record where the connection is coming from.</p>
48 | 
49 | <p>Will Alyssa's plan work? Why or why not?</p>
50 | 
51 | <p>A2: Will not work. A new Tor circuit is constructed between</p>
52 | 


--------------------------------------------------------------------------------
/quiz2-tor.md:
--------------------------------------------------------------------------------
 1 | Tor
 2 | ===
 3 | ---
 4 | ## Resources
 5 | 
 6 |   * [Paper](http://css.csail.mit.edu/6.858/2014/readings/tor-design.pdf)
 7 |   * Blog posts: [1](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-1), [2](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-2), [3](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-3)
 8 |   * [Lecture note from 2012](http://css.csail.mit.edu/6.858/2012/lec/l16-tor.txt)
 9 |   * [Old quizzes](http://css.csail.mit.edu/6.858/2014/quiz.html)
10 | 
11 | ---
12 | 
13 | ## Overview
14 | 
15 |  - Goals
16 |  - Mechanisms
17 |    * Streams/Circuits
18 |    * Rendezvous Points & Hidden services
19 |  - Directory Servers
20 |  - Attacks & Defenses
21 |  - Practice Problems
22 | 
23 | ---
24 | 
25 | ## Goals
26 | 
27 |  - Anonymous communication
28 |  - Responder anonymity
29 |    * If I run a service like "mylittleponey.com" I don't want anyone
30 |      associating me with that service
31 |  - Deployability / usability
32 |    * Why a security goal? 
33 |      + Because it increases the # of people using Tor, i.e. the _anonimity set_
34 |        - ...which in turn increases security
35 |          * (adversary has more people to distinguish you amongst)
36 |  - TCP layer (Why? See explanations in lecture notes above)
37 |  - **NOT** P2P (because more vulnerable?)
38 | 
39 | ---
40 | 
41 | ## Circuit creation
42 | 
43 | TODO: Define circuit
44 | 
45 | Alice multiplexes many TCP streams onto a few _circuits_. Why? Low-latency system, expensive to make new circuit.
46 | 
47 | TODO: Define Onion Router (OR)
48 | 
49 | _Directory server_: State of network, OR public keys, OR IPs
50 | 
51 | ORs:
52 | 
53 |  - All connected to one another with TLS
54 |  - See blog post 1: Authorities vote on consensus directory document
55 | 
56 | Example:
57 | 
58 |     [ Draw example of Alice building a new circuit ]
59 |     [ and connecting to Twitter.                   ]
60 | 
61 | ---
62 | 
63 | ## Rendezvous Points & Hidden services
64 | 
65 | Example: 
66 | 
67 |     [ Add an example of Alice connecting to Bob's  ]
68 |     [ hidden service on Tor                        ]
69 | 
70 | Bob runs hidden service (HS): 
71 | 
72 |   - Decides on long term PK/SK pair
73 |   - Publish introduction points, advertises on lookup service
74 |   - Builds a circuit to _Intro Points_, waits for messages
75 | 
76 | Alice wants to connect to Bob's HS:
77 | 
78 |  - Build circuit to new _Rendezvous Point (RP)_ (any OR)
79 |    * Gives _cookie_ to RP
80 |  - Builds circuit to one of Bob's intro points and sends message
81 |    * with `{RP, Cookie, g^x}_PK(Bob)`
82 |  - Bob builds circuit to RP, sends `{ cookie, g^y, H(K)}`
83 |  - RP connects Alice and Bob
84 | 


--------------------------------------------------------------------------------
/quiz2-tor.html:
--------------------------------------------------------------------------------
  1 | <h1>Tor</h1>
  2 | 
  3 | <hr />
  4 | 
  5 | <h2>Resources</h2>
  6 | 
  7 | <ul>
  8 | <li><a href="http://css.csail.mit.edu/6.858/2014/readings/tor-design.pdf">Paper</a></li>
  9 | <li>Blog posts: <a href="https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-1">1</a>, <a href="https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-2">2</a>, <a href="https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-3">3</a></li>
 10 | <li><a href="http://css.csail.mit.edu/6.858/2012/lec/l16-tor.txt">Lecture note from 2012</a></li>
 11 | <li><a href="http://css.csail.mit.edu/6.858/2014/quiz.html">Old quizzes</a></li>
 12 | </ul>
 13 | 
 14 | <hr />
 15 | 
 16 | <h2>Overview</h2>
 17 | 
 18 | <ul>
 19 | <li>Goals</li>
 20 | <li>Mechanisms
 21 | <ul>
 22 | <li>Streams/Circuits</li>
 23 | <li>Rendezvous Points &amp; Hidden services</li>
 24 | </ul></li>
 25 | <li>Directory Servers</li>
 26 | <li>Attacks &amp; Defenses</li>
 27 | <li>Practice Problems</li>
 28 | </ul>
 29 | 
 30 | <hr />
 31 | 
 32 | <h2>Goals</h2>
 33 | 
 34 | <ul>
 35 | <li>Anonymous communication</li>
 36 | <li>Responder anonymity
 37 | <ul>
 38 | <li>If I run a service like "mylittleponey.com" I don't want anyone
 39 | associating me with that service</li>
 40 | </ul></li>
 41 | <li>Deployability / usability
 42 | <ul>
 43 | <li>Why a security goal? 
 44 | <ul>
 45 | <li>Because it increases the # of people using Tor, i.e. the <em>anonimity set</em></li>
 46 | <li>...which in turn increases security
 47 | <ul>
 48 | <li>(adversary has more people to distinguish you amongst)</li>
 49 | </ul></li>
 50 | </ul></li>
 51 | </ul></li>
 52 | <li>TCP layer (Why? See explanations in lecture notes above)</li>
 53 | <li><strong>NOT</strong> P2P (because more vulnerable?)</li>
 54 | </ul>
 55 | 
 56 | <hr />
 57 | 
 58 | <h2>Circuit creation</h2>
 59 | 
 60 | <p>TODO: Define circuit</p>
 61 | 
 62 | <p>Alice multiplexes many TCP streams onto a few <em>circuits</em>. Why? Low-latency system, expensive to make new circuit.</p>
 63 | 
 64 | <p>TODO: Define Onion Router (OR)</p>
 65 | 
 66 | <p><em>Directory server</em>: State of network, OR public keys, OR IPs</p>
 67 | 
 68 | <p>ORs:</p>
 69 | 
 70 | <ul>
 71 | <li>All connected to one another with TLS</li>
 72 | <li>See blog post 1: Authorities vote on consensus directory document</li>
 73 | </ul>
 74 | 
 75 | <p>Example:</p>
 76 | 
 77 | <pre><code>[ Draw example of Alice building a new circuit ]
 78 | [ and connecting to Twitter.                   ]
 79 | </code></pre>
 80 | 
 81 | <hr />
 82 | 
 83 | <h2>Rendezvous Points &amp; Hidden services</h2>
 84 | 
 85 | <p>Example: </p>
 86 | 
 87 | <pre><code>[ Add an example of Alice connecting to Bob's  ]
 88 | [ hidden service on Tor                        ]
 89 | </code></pre>
 90 | 
 91 | <p>Bob runs hidden service (HS): </p>
 92 | 
 93 | <ul>
 94 | <li>Decides on long term PK/SK pair</li>
 95 | <li>Publish introduction points, advertises on lookup service</li>
 96 | <li>Builds a circuit to <em>Intro Points</em>, waits for messages</li>
 97 | </ul>
 98 | 
 99 | <p>Alice wants to connect to Bob's HS:</p>
100 | 
101 | <ul>
102 | <li>Build circuit to new <em>Rendezvous Point (RP)</em> (any OR)
103 | <ul>
104 | <li>Gives <em>cookie</em> to RP</li>
105 | </ul></li>
106 | <li>Builds circuit to one of Bob's intro points and sends message
107 | <ul>
108 | <li>with <code>{RP, Cookie, g^x}_PK(Bob)</code></li>
109 | </ul></li>
110 | <li>Bob builds circuit to RP, sends <code>{ cookie, g^y, H(K)}</code></li>
111 | <li>RP connects Alice and Bob</li>
112 | </ul>
113 | 


--------------------------------------------------------------------------------
/README.html:
--------------------------------------------------------------------------------
 1 | <h1>Computer systems security notes (6.858, Fall 2014)</h1>
 2 | 
 3 | <p>Lecture notes from 6.858, taught by <a href="http://people.csail.mit.edu/nickolai/">Prof. Nickolai Zeldovich</a> and <a href="http://research.microsoft.com/en-us/people/mickens/">Prof. James Mickens</a> in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 <a href="http://css.csail.mit.edu/6.858/2014/schedule.html">course website</a>.</p>
 4 | 
 5 | <ul>
 6 | <li>Lecture <strong>1</strong>: <a href="l01-intro.html">Introduction</a>: what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows</li>
 7 | <li>Lecture <strong>2</strong>: <a href="l02-baggy.html">Control hijacking attacks</a>: buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones &amp; Kelly, baggy bounds checking</li>
 8 | <li>Lecture <strong>3</strong>: <a href="l03-brop.html">More baggy bounds and return oriented programming</a>: costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets</li>
 9 | <li>Lecture <strong>4</strong>: <a href="l04-okws.html">OKWS</a>: privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC)</li>
10 | <li>Lecture <strong>5</strong>: <strong>Penetration testing</strong> <em>guest lecture</em> by Paul Youn, iSEC Partners</li>
11 | <li>Lecture <strong>6</strong>: <a href="l06-capsicum.html">Capsicum</a>: confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum</li>
12 | <li>Lecture <strong>7</strong>: <a href="l07-nacl.html">Native Client (NaCl)</a>: sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation</li>
13 | <li>Lecture <strong>8</strong>: <a href="l08-web-security.html">Web Security, Part I</a>: modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins</li>
14 | <li>Lecture <strong>9</strong>: <a href="l09-web-defenses.html">Web Security, Part II</a>: cross-site scripting (XSS) attacks, XSS defenses, SQL injection atacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels</li>
15 | <li>Lecture <strong>10</strong>: <strong>Symbolic execution</strong> <em>guest lecture</em> by Prof. Armando Solar-Lezama, MIT CSAIL</li>
16 | <li>Lecture <strong>11</strong>: <strong>Ur/Web</strong> <em>guest lecture</em> by Prof. Adam Chlipala, MIT, CSAIL</li>
17 | <li>Lecture <strong>12</strong>: <a href="l12-tcpip.html">TCP/IP security</a>: threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing</li>
18 | <li>Lecture <strong>13</strong>: <a href="l13-kerberos.html">Kerberos</a>: Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy</li>
19 | <li>Lecture <strong>14</strong>: <a href="l14-forcehttps.html">ForceHTTPS</a>: certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS</li>
20 | <li>Lecture <strong>15</strong>: <strong>Medical software</strong> <em>guest lecture</em> by Prof. Kevin Fu, U. Michigan</li>
21 | <li>Lecture <strong>16</strong>: <a href="l16-timing-attacks.html">Timing attacks</a>: side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks</li>
22 | <li>Lecture <strong>17</strong>: <a href="l17-authentication.html">User authentication</a>: what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader</li>
23 | <li>Lecture <strong>18</strong>: <a href="l18-priv-browsing.html">Private browsing</a>: private browsing mode, local and web attackers, VM-level privacy, OS-level privacy,  OS-level privacy, what browsers implement, browser extensions </li>
24 | <li>Lecture <strong>19</strong>: <strong>Tor</strong> <em>guest lecture</em> by Nick Mathewson, Tor Project
25 | <ul>
26 | <li>6.858 notes from 2012 on <a href="l19-tor.html">Anonymous communication</a>: onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets)</li>
27 | </ul></li>
28 | <li>Lecture <strong>20</strong>: <a href="l20-android.html">Mobile phone security</a>: Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents</li>
29 | <li>Lecture <strong>21</strong>: <a href="l21-taintdroid.html">Information flow tracking</a>: TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip</li>
30 | <li>Lecture <strong>22</strong>: <strong>MIT's IS&amp;T</strong> <em>guest lecture</em> by Mark Silis and David LaPorte, MIT IS&amp;T</li>
31 | <li>Lecture <strong>23</strong>: <a href="l23-click-trajectories.html">Security economics</a>: economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics</li>
32 | </ul>
33 | 


--------------------------------------------------------------------------------
/quiz2-medical-dev.md:
--------------------------------------------------------------------------------
  1 | 6.858 Quiz 2 Review
  2 | ===================
  3 | 
  4 | Medical Device Security
  5 | -----------------------
  6 | 
  7 | FDA standards: Semmelweis e.g. `=>` Should wash hands
  8 | 
  9 | Defirbillator:
 10 | 
 11 |   - 2003: Implanted defibrillator use WiFi. What could
 12 |     possibly go wrong?
 13 |   - Inside: battery, radio, hermetically sealed	
 14 | 
 15 | Why wireless?
 16 | 
 17 |   - Old way: Inject a needle into arm to twist dial, risk of infection :(
 18 | 
 19 | **Q:** What are security risks of wireless?
 20 | 
 21 |   - Unsafe practices - implementation errors.
 22 |   - Manufacturer and User Facility Device Experience (MAUDE) database
 23 |      * Cause of death: buffer overflow in infusion pump.
 24 |      * Error detected, but brought to safe mode, turn off pump.
 25 |      * Patient died after increase in brain pressure because
 26 |        no pump, because of buffer overflow.
 27 | 
 28 | #### Human factors and software
 29 | 
 30 | Why unique?
 31 | 
 32 | 500+ deaths
 33 | 
 34 | E.g. User interface for delivering dosage to patients did not properly indicate
 35 | whether it expected hours or minutes as input (hh:mm:ss). Led to order of
 36 | magnitude error: 20 min vs. the intended 20 hrs.
 37 | 
 38 | #### Managerial issues
 39 | 
 40 | Medical devices also need to take software updates.
 41 | 
 42 | E.g. McAffee classified DLL as malicious, quarantines,
 43 | messed up hospital services.
 44 | 
 45 | E.g. hospitals using Windows XP:
 46 |   - There are no more security updates from Microsoft for XP, but still new medical products shipping Windows XP.
 47 | 
 48 | 
 49 | #### FDA Cybersecurity Guidance
 50 | 
 51 | What is expected to be seen from manufacturers? How they
 52 | have thought through the security problems / risks /
 53 | mitigation strategies / residual risks?
 54 | 
 55 | 
 56 | #### Adversary stuff
 57 | 
 58 | Defibrillator & Implants
 59 | 
 60 | This section of the notes refers to the discussion of attacks on implanted defibrillators from Kevin Fu's lecture. In one example he gave, the implanted devices are wirelessly programmed with another device called a "wand", which uses a proprietary (non-public, non-standardized) protocol. Also, the wand transmits (and the device listens) on specially licensed EM spectrum (e.g. not WiFI or bluetooth). The next two lines describe the surgical process by which the defibrillator is implanted in the patient.
 61 | 
 62 |   - Device programmed w/ wand, speaking proprietary protocol
 63 |     over specially licensed spectrum. (good idea w.r.t.
 64 |     security?)
 65 |   - Patient awake but numbed and sedated
 66 |   - Six people weave electrodes through blood vessel....
 67 | 
 68 |   - Patient given a base station, looks like AP, speaks proprietary RF to implant, 
 69 |     data sent via Internet to healthcare company
 70 | 
 71 |   - Communication between device and programmer: no crypto / auth, data sent in plaintext
 72 |   - Device stores:	Patient name, DOB, make & model, serial no., more...
 73 | 
 74 |   - ???????? Use a software radio (USRP/GNU Radio Software)
 75 | 
 76 | **Q:** Can you wirelessly induce a fatal heart rhythm  
 77 | **A:** Yes. Device emitted 500V shock in 1 msec. E.g. get kicked in chest by horse.
 78 | 
 79 | Devices fixed through software updates?
 80 | 
 81 | #### Healthcare Providers
 82 | 
 83 | Screenshot of "Hospitals Stuck with Windows XP": 600 Service Pack 0 Windows XP devices in the hospital!
 84 | 
 85 | Average time to infection for healthcare devices:
 86 |   - 12 days w/o protection
 87 |   - 1 year w/ antivirus
 88 | 
 89 | #### Vendors are a common source of infection
 90 | 
 91 | USB drive is a common vector for infection.
 92 | 
 93 | #### Medical device signatures over download
 94 | 
 95 | "Click here to download software update"
 96 | 
 97 |   - Website appears to contain malware
 98 |   - Chrome: Safe web browsing service detected "ventilator" malware
 99 | 
100 | "Drug Compounder" example:
101 | 
102 |   - Runs Windows XP embedded
103 |   - **FDA expects manufacturers to keep SW up to date**
104 |   - **Manufacturers claim cannot update because of FDA**
105 |       * _double you tea f?_
106 | 
107 | #### How significant intentional malicious SW malfunctions?
108 | 
109 | E.g. 1: Chicago 1982: Somebody inserts cyanide into Tylenol
110 | E.g. 2: Somebody posted flashing images on epillepsy support group website.
111 | 
112 | 
113 | #### Why do you trust sensors?
114 | 
115 | E.g. smartphones. Batteryless sensors demo. Running on an MSP430. uC believes
116 | anything coming from ADC to uC. Possible to do something related to resonant
117 | freq. of wire there?
118 | 
119 | Inject interference into the baseband
120 | 
121 |   - Hard to filter in the analog
122 |   - `=>` Higher quality audio w/ interference than microphone
123 | 
124 | Send a signal that matches resonant frequency of the wire.
125 | 
126 | Treat circuit as unintentional demodulator
127 | 
128 |   - Can use high frequency signal to trick uC into thinking
129 |   - there is a low frequency signal due to knowing interrupt
130 |     frequency of uC and related properties.
131 | 
132 | Cardiac devices vulnerable to baseband EMI
133 | 
134 |   - Insert intentional EM interference in baseband
135 | 
136 | Send pulsed sinewave to trick defibrilator into thinking heart beating correctly
137 | 
138 |   - ????? Works in vitro
139 |   - Hard to replicate in a body or saline solution
140 | 
141 | Any defenses?
142 | 
143 |   - Send an extra pacing pulse right after a beat
144 |     * a real heart shouldn't send a response
145 | 
146 | #### Detecting malware at power outlets
147 | 
148 | Embedded system `<-->` WattsUpDoc `<-->` Power outlet
149 | 
150 | #### Bigger problems than security?
151 | 
152 | **Q:** True or false: Hackers breaking into medical devices is
153 | the biggest risk at the moment.
154 | 
155 | **A:** False. Wide scale unavailability of patient care and integrity of
156 | medical sensors are more important.
157 | 
158 | Security cannot be bolted on
159 | 
160 |   - E.g. MRI on windows 95
161 |   - E.g. Pacemaker programmer running on OS/2
162 | 
163 | Check gmail on medical devices, etc.
164 | 
165 | Run pandora on medical machine.
166 | 
167 | Keep clinical workflow predictable.
168 | 
169 | 


--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
 1 | Computer systems security notes (6.858, Fall 2014)
 2 | ==================================================
 3 | 
 4 | Lecture notes from 6.858, taught by [Prof. Nickolai Zeldovich](http://people.csail.mit.edu/nickolai/) and [Prof. James Mickens](http://research.microsoft.com/en-us/people/mickens/) in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 [course website](http://css.csail.mit.edu/6.858/2014/schedule.html).
 5 | 
 6 |  * Lecture **1**: [Introduction](l01-intro.html): what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows
 7 |  * Lecture **2**: [Control hijacking attacks](l02-baggy.html): buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones & Kelly, baggy bounds checking
 8 |  * Lecture **3**: [More baggy bounds and return oriented programming](l03-brop.html): costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets
 9 |  * Lecture **4**: [OKWS](l04-okws.html): privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC)
10 |  * Lecture **5**: **Penetration testing** _guest lecture_ by Paul Youn, iSEC Partners
11 |  * Lecture **6**: [Capsicum](l06-capsicum.html): confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum
12 |  * Lecture **7**: [Native Client (NaCl)](l07-nacl.html): sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation
13 |  * Lecture **8**: [Web Security, Part I](l08-web-security.html): modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins
14 |  * Lecture **9**: [Web Security, Part II](l09-web-defenses.html): cross-site scripting (XSS) attacks, XSS defenses, SQL injection atacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels
15 |  * Lecture **10**: **Symbolic execution** _guest lecture_ by Prof. Armando Solar-Lezama, MIT CSAIL
16 |  * Lecture **11**: **Ur/Web** _guest lecture_ by Prof. Adam Chlipala, MIT, CSAIL
17 |  * Lecture **12**: [TCP/IP security](l12-tcpip.html): threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing
18 |  * Lecture **13**: [Kerberos](l13-kerberos.html): Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy
19 |  * Lecture **14**: [ForceHTTPS](l14-forcehttps.html): certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS
20 |  * Lecture **15**: **Medical software** _guest lecture_ by Prof. Kevin Fu, U. Michigan
21 |  * Lecture **16**: [Timing attacks](l16-timing-attacks.html): side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks
22 |  * Lecture **17**: [User authentication](l17-authentication.html): what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader
23 |  * Lecture **18**: [Private browsing](l18-priv-browsing.html): private browsing mode, local and web attackers, VM-level privacy, OS-level privacy,  OS-level privacy, what browsers implement, browser extensions 
24 |  * Lecture **19**: **Tor** _guest lecture_ by Nick Mathewson, Tor Project
25 |    + 6.858 notes from 2012 on [Anonymous communication](l19-tor.html): onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets)
26 |  * Lecture **20**: [Mobile phone security](l20-android.html): Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents
27 |  * Lecture **21**: [Information flow tracking](l21-taintdroid.html): TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip
28 |  * Lecture **22**: **MIT's IS&T** _guest lecture_ by Mark Silis and David LaPorte, MIT IS&T
29 |  * Lecture **23**: [Security economics](l23-click-trajectories.html): economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics
30 | 
31 | Papers
32 | ------
33 | 
34 | List of papers we read ([papers/](papers/)):
35 | 
36 |  - [Baggy bounds checking](papers/baggy.pdf)
37 |  - [Hacking blind](papers/brop.pdf)
38 |  - [OKWS](papers/okws.pdf)
39 |  - [The confused deputy](papers/confused-deputy.pdf) (or why capabilities might have been invented)
40 |  - [Capsicum](papers/capsicum.pdf) (capabilities)
41 |  - [Native Client](papers/nacl.pdf) (sandboxing x86 code)
42 |  - [OWASP Top 10](papers/owasp-top-10.pdf), the most critical web application security risks
43 |  - [KLEE](papers/klee.pdf) (symbolic execution)
44 |  - [Ur/Web](papers/urweb.pdf) (functional programming for the web)
45 |  - [A look back at "Security problems in the TCP/IP protocol suite"](papers/lookback-tcpip.pdf)
46 |  - [Kerberos](papers/kerberos.pdf): An authentication service for open network systems
47 |  - [ForceHTTPs](papers/forcehttps.pdf)
48 |  - [Trustworthy Medical Device Software](papers/medical-sw.pdf)
49 |  - [Remote timing attacks are practical](papers/brumley-timing.pdf)
50 |  - [The quest to replace passwords](papers/passwords.pdf)
51 |  - [Private browsing modes](papers/private-browsing.pdf)
52 |  - [Tor](papers/tor-design.pdf): the second-generation onion router
53 |  - [Understanding android security](papers/android.pdf)
54 |  - [TaintDroid](papers/taintdroid.pdf): an information-flow tracking system for realtime privacy monitoring on smartphones
55 |  - [Click trajectories](papers/trajectories.pdf): End-to-end analysis of the spam value chain
56 | 


--------------------------------------------------------------------------------
/previous-years/l12-resin.txt:
--------------------------------------------------------------------------------
  1 | Resin
  2 | =====
  3 | 
  4 | administrivia:
  5 |     quiz 1 on Wednesday
  6 |     Xi: office hours for quiz review questions?
  7 |     lab 3 out today, first part due in ~1.5 weeks
  8 | 
  9 | what kinds of problems is this paper trying to address?
 10 |     missing security checks in application code
 11 | 	sanitizing user inputs for SQL injection or cross-site scripting
 12 | 	calling access control functions for sensitive data
 13 | 	    protected wiki page; user's password
 14 | 	checking where code came from before running it
 15 | 
 16 |     one such problem: cross-site scripting
 17 | 	setting: one web server, multiple users
 18 | 	users interact with each other (e.g. get a list of online users)
 19 | 	attacker's plan: inject JS code in a script tag as part of user name
 20 | 	victim's browser sees this code in the HTML page, runs it
 21 | 	what kind of code could attacker inject?
 22 | 	    maybe steal the user's HTTP cookie
 23 | 	    how?  create an image tag containing document.cookie
 24 | 	why doesn't the browser's same-origin policy protect the cookie?
 25 | 	    as far as the browser is concerned, code came from server's origin
 26 | 	lab 1's web server was vulnerable, as it turns out!
 27 | 	    http://.../<SCRIPT>alert(5)</SCRIPT>
 28 | 	    returns: File not found: /<SCRIPT>alert(5)</SCRIPT>
 29 | 
 30 |     a similar problem: SQL injection
 31 | 	saw examples in previous lectures
 32 | 	problems arise if programmer forgets to quote user inputs
 33 | 
 34 |     different kind of a problem: access control checks
 35 | 	might have protected pages in a wiki, forget to call ACL function
 36 | 	concrete example: hotcrp's password disclosure
 37 | 	    typical web site, sends password reminders
 38 | 	    email preview mode displays emails instead of sending
 39 | 	    turns out to display pw reminders in the requesting user's browser
 40 | 	    kind-of like the confused deputy prob: no module is really at fault?
 41 | 
 42 |     why are the checks missing?
 43 | 	lots of places in the code where they need to be performed
 44 | 	think of application as a black box; lots of inputs and outputs
 45 | 	suppose that for a given output, only some inputs were OK
 46 | 	    e.g. sanitize user inputs in a SQL query, but not app's own data
 47 | 	hard to tell where the output's data came from
 48 | 	    so, programmers try to do checks on all possible paths
 49 | 	programmer forgets them on some paths from input to output
 50 | 
 51 |     what's the plan to prevent these?
 52 | 	think of the checks as being associated with data flows input->output
 53 | 	associate checks with data objects like user input or password strings
 54 | 	perform checks whenever data gets used in some interesting way
 55 | 
 56 | what does resin provide?
 57 |     [ diagram from figure 1 ]
 58 |     data tracking
 59 | 	how does this work?  assumes a language runtime
 60 | 	    python, php have a byte code representation, sort-of like java
 61 | 	    resin tags strings, integers with a policy object
 62 | 	    changes the implementation of operations that manipulate data
 63 | 	why only tag strings and integers?  what about other things?
 64 | 	what kinds of operations propagate?
 65 | 	why not propagate across "covert" or "implicit" channels?
 66 | 	why byte-level tracking?
 67 | 	what happens when data items are combined?
 68 | 	    concat two strings
 69 | 	    add two integers
 70 | 	    take a substring
 71 | 	what happens for sha1sum() or touppercase() [which uses array lookups]?
 72 |     policy objects
 73 | 	contains code to implement policy for its data
 74 | 	what methods does the programmer have to implement in a policy object?
 75 | 	    export_check
 76 | 	    merge [optional]
 77 |     filter objects
 78 | 	provided by default by resin for most external channels
 79 | 	context information: combination of resin- and programmer-supplied
 80 |     how much synchronization does there need to be between filters & policies?
 81 | 
 82 | what are all of the uses for filter objects?
 83 |     default filters for external boundaries
 84 |     persistent serialization
 85 | 	files: extended attributes
 86 | 	database: extra columns for policies, SQL rewriting
 87 |     code imports
 88 | 	interpreter's input is yet another kind of channel
 89 |     write access control
 90 | 	persistent filters on FS objects like files, directories
 91 | 	almost a different kind of check: tied to an external object, not data
 92 |     propagation rules for functions
 93 | 	sha1sum(), touppercase(), ..
 94 | 
 95 | how would you use resin to prevent missing checks?
 96 |     hotcrp
 97 |     cross-site scripting
 98 | 
 99 | does this system actually work?
100 |     two versions of resin, one for python and one for php
101 | 	prevented known bugs in real apps
102 | 	prevented unknown bugs in real apps too
103 | 	few different kinds of bugs (ACL, XSS, SQL inj, directory traversal, ..)
104 |     is it possible to forget checks with resin?
105 | 	what does resin provide/guarantee?
106 | 	are there potential pitfalls with resin's assertions?
107 |     how much code is required to write these assertions?  why?
108 |     how specific are the assertions to the bug you want to prevent?  why?
109 |     how did they prevent the myphpscripts login library bug?
110 | 
111 | what's the cost?
112 |     need to deploy a new php/python interpreter
113 |     need to write some assertions (policy objects?)
114 |     runtime overheads: memory to store policies, CPU time to track them
115 |     major cost: serializing policies to SQL, file system
116 | 	could that be less?
117 | 
118 | how else can you avoid these missing check problems?
119 |     IFC does data tracking in some logical sense
120 | 	trade-off: redesign/rewrite your app around some checks
121 | 	hard to redesign around multiple checks or to add a check later
122 |     java stack inspection
123 | 	can't automatically perform checks for things that are off the stack
124 | 	can check if file is being read through a sanitizing/ACL-check function
125 | 	crimps programmer's style, but in theory possible
126 |     express some of these checks in the type system
127 | 	maybe have a special kind of UntrustedString vs SafeString
128 | 	and conversely SQLString and HTMLString which get used for output
129 | 	special conversion rules for them
130 | 	could even do static checks for these data flows
131 | 	for password disclosure, ACL checks: maybe a delayed-check string?
132 | 	    when about to send out the string, tell it where you're sending it
133 | 	    almost like resin design
134 | 	problem with using the type system:
135 | 	    policies intertwined with code throughout the app
136 | 	    to add a new check, need to change types everywhere
137 | 	    resin is almost like a shadow type system
138 | 
139 | could you apply resin to other applications, or other environments?
140 |     different languages?
141 |     different machines (cluster of web servers)?
142 |     no language runtime?
143 |     untrusted/malicious code?
144 | 
145 | 


--------------------------------------------------------------------------------
/quiz2-medical-dev.html:
--------------------------------------------------------------------------------
  1 | <h1>6.858 Quiz 2 Review</h1>
  2 | 
  3 | <h2>Medical Device Security</h2>
  4 | 
  5 | <p>FDA standards: Semmelweis e.g. <code>=&gt;</code> Should wash hands</p>
  6 | 
  7 | <p>Defirbillator:</p>
  8 | 
  9 | <ul>
 10 | <li>2003: Implanted defibrillator use WiFi. What could
 11 | possibly go wrong?</li>
 12 | <li>Inside: battery, radio, hermetically sealed </li>
 13 | </ul>
 14 | 
 15 | <p>Why wireless?</p>
 16 | 
 17 | <ul>
 18 | <li>Old way: Inject a needle into arm to twist dial, risk of infection :(</li>
 19 | </ul>
 20 | 
 21 | <p><strong>Q:</strong> What are security risks of wireless?</p>
 22 | 
 23 | <ul>
 24 | <li>Unsafe practices - implementation errors.</li>
 25 | <li>Manufacturer and User Facility Device Experience (MAUDE) database
 26 | <ul>
 27 | <li>Cause of death: buffer overflow in infusion pump.</li>
 28 | <li>Error detected, but brought to safe mode, turn off pump.</li>
 29 | <li>Patient died after increase in brain pressure because
 30 | no pump, because of buffer overflow.</li>
 31 | </ul></li>
 32 | </ul>
 33 | 
 34 | <h4>Human factors and software</h4>
 35 | 
 36 | <p>Why unique?</p>
 37 | 
 38 | <p>500+ deaths</p>
 39 | 
 40 | <p>E.g. User interface for delivering dosage to patients did not properly indicate
 41 | whether it expected hours or minutes as input (hh:mm:ss). Led to order of
 42 | magnitude error: 20 min vs. the intended 20 hrs.</p>
 43 | 
 44 | <h4>Managerial issues</h4>
 45 | 
 46 | <p>Medical devices also need to take software updates.</p>
 47 | 
 48 | <p>E.g. McAffee classified DLL as malicious, quarantines,
 49 | messed up hospital services.</p>
 50 | 
 51 | <p>E.g. hospitals using Windows XP:
 52 |   - There are no more security updates from Microsoft for XP, but still new medical products shipping Windows XP.</p>
 53 | 
 54 | <h4>FDA Cybersecurity Guidance</h4>
 55 | 
 56 | <p>What is expected to be seen from manufacturers? How they
 57 | have thought through the security problems / risks /
 58 | mitigation strategies / residual risks?</p>
 59 | 
 60 | <h4>Adversary stuff</h4>
 61 | 
 62 | <p>Defibrillator &amp; Implants</p>
 63 | 
 64 | <p>This section of the notes refers to the discussion of attacks on implanted defibrillators from Kevin Fu's lecture. In one example he gave, the implanted devices are wirelessly programmed with another device called a "wand", which uses a proprietary (non-public, non-standardized) protocol. Also, the wand transmits (and the device listens) on specially licensed EM spectrum (e.g. not WiFI or bluetooth). The next two lines describe the surgical process by which the defibrillator is implanted in the patient.</p>
 65 | 
 66 | <ul>
 67 | <li>Device programmed w/ wand, speaking proprietary protocol
 68 | over specially licensed spectrum. (good idea w.r.t.
 69 | security?)</li>
 70 | <li>Patient awake but numbed and sedated</li>
 71 | <li><p>Six people weave electrodes through blood vessel....</p></li>
 72 | <li><p>Patient given a base station, looks like AP, speaks proprietary RF to implant, 
 73 | data sent via Internet to healthcare company</p></li>
 74 | <li><p>Communication between device and programmer: no crypto / auth, data sent in plaintext</p></li>
 75 | <li><p>Device stores:  Patient name, DOB, make &amp; model, serial no., more...</p></li>
 76 | <li><p>???????? Use a software radio (USRP/GNU Radio Software)</p></li>
 77 | </ul>
 78 | 
 79 | <p><strong>Q:</strong> Can you wirelessly induce a fatal heart rhythm <br />
 80 | <strong>A:</strong> Yes. Device emitted 500V shock in 1 msec. E.g. get kicked in chest by horse.</p>
 81 | 
 82 | <p>Devices fixed through software updates?</p>
 83 | 
 84 | <h4>Healthcare Providers</h4>
 85 | 
 86 | <p>Screenshot of "Hospitals Stuck with Windows XP": 600 Service Pack 0 Windows XP devices in the hospital!</p>
 87 | 
 88 | <p>Average time to infection for healthcare devices:
 89 |   - 12 days w/o protection
 90 |   - 1 year w/ antivirus</p>
 91 | 
 92 | <h4>Vendors are a common source of infection</h4>
 93 | 
 94 | <p>USB drive is a common vector for infection.</p>
 95 | 
 96 | <h4>Medical device signatures over download</h4>
 97 | 
 98 | <p>"Click here to download software update"</p>
 99 | 
100 | <ul>
101 | <li>Website appears to contain malware</li>
102 | <li>Chrome: Safe web browsing service detected "ventilator" malware</li>
103 | </ul>
104 | 
105 | <p>"Drug Compounder" example:</p>
106 | 
107 | <ul>
108 | <li>Runs Windows XP embedded</li>
109 | <li><strong>FDA expects manufacturers to keep SW up to date</strong></li>
110 | <li><strong>Manufacturers claim cannot update because of FDA</strong>
111 | <ul>
112 | <li><em>double you tea f?</em></li>
113 | </ul></li>
114 | </ul>
115 | 
116 | <h4>How significant intentional malicious SW malfunctions?</h4>
117 | 
118 | <p>E.g. 1: Chicago 1982: Somebody inserts cyanide into Tylenol
119 | E.g. 2: Somebody posted flashing images on epillepsy support group website.</p>
120 | 
121 | <h4>Why do you trust sensors?</h4>
122 | 
123 | <p>E.g. smartphones. Batteryless sensors demo. Running on an MSP430. uC believes
124 | anything coming from ADC to uC. Possible to do something related to resonant
125 | freq. of wire there?</p>
126 | 
127 | <p>Inject interference into the baseband</p>
128 | 
129 | <ul>
130 | <li>Hard to filter in the analog</li>
131 | <li><code>=&gt;</code> Higher quality audio w/ interference than microphone</li>
132 | </ul>
133 | 
134 | <p>Send a signal that matches resonant frequency of the wire.</p>
135 | 
136 | <p>Treat circuit as unintentional demodulator</p>
137 | 
138 | <ul>
139 | <li>Can use high frequency signal to trick uC into thinking</li>
140 | <li>there is a low frequency signal due to knowing interrupt
141 | frequency of uC and related properties.</li>
142 | </ul>
143 | 
144 | <p>Cardiac devices vulnerable to baseband EMI</p>
145 | 
146 | <ul>
147 | <li>Insert intentional EM interference in baseband</li>
148 | </ul>
149 | 
150 | <p>Send pulsed sinewave to trick defibrilator into thinking heart beating correctly</p>
151 | 
152 | <ul>
153 | <li>????? Works in vitro</li>
154 | <li>Hard to replicate in a body or saline solution</li>
155 | </ul>
156 | 
157 | <p>Any defenses?</p>
158 | 
159 | <ul>
160 | <li>Send an extra pacing pulse right after a beat
161 | <ul>
162 | <li>a real heart shouldn't send a response</li>
163 | </ul></li>
164 | </ul>
165 | 
166 | <h4>Detecting malware at power outlets</h4>
167 | 
168 | <p>Embedded system <code>&lt;--&gt;</code> WattsUpDoc <code>&lt;--&gt;</code> Power outlet</p>
169 | 
170 | <h4>Bigger problems than security?</h4>
171 | 
172 | <p><strong>Q:</strong> True or false: Hackers breaking into medical devices is
173 | the biggest risk at the moment.</p>
174 | 
175 | <p><strong>A:</strong> False. Wide scale unavailability of patient care and integrity of
176 | medical sensors are more important.</p>
177 | 
178 | <p>Security cannot be bolted on</p>
179 | 
180 | <ul>
181 | <li>E.g. MRI on windows 95</li>
182 | <li>E.g. Pacemaker programmer running on OS/2</li>
183 | </ul>
184 | 
185 | <p>Check gmail on medical devices, etc.</p>
186 | 
187 | <p>Run pandora on medical machine.</p>
188 | 
189 | <p>Keep clinical workflow predictable.</p>
190 | 


--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
 1 | <h1>Computer systems security notes (6.858, Fall 2014)</h1>
 2 | 
 3 | <p>Lecture notes from 6.858, taught by <a href="http://people.csail.mit.edu/nickolai/">Prof. Nickolai Zeldovich</a> and <a href="http://research.microsoft.com/en-us/people/mickens/">Prof. James Mickens</a> in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 <a href="http://css.csail.mit.edu/6.858/2014/schedule.html">course website</a>.</p>
 4 | 
 5 | <ul>
 6 | <li>Lecture <strong>1</strong>: <a href="l01-intro.html">Introduction</a>: what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows</li>
 7 | <li>Lecture <strong>2</strong>: <a href="l02-baggy.html">Control hijacking attacks</a>: buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones &amp; Kelly, baggy bounds checking</li>
 8 | <li>Lecture <strong>3</strong>: <a href="l03-brop.html">More baggy bounds and return oriented programming</a>: costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets</li>
 9 | <li>Lecture <strong>4</strong>: <a href="l04-okws.html">OKWS</a>: privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC)</li>
10 | <li>Lecture <strong>5</strong>: <strong>Penetration testing</strong> <em>guest lecture</em> by Paul Youn, iSEC Partners</li>
11 | <li>Lecture <strong>6</strong>: <a href="l06-capsicum.html">Capsicum</a>: confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum</li>
12 | <li>Lecture <strong>7</strong>: <a href="l07-nacl.html">Native Client (NaCl)</a>: sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation</li>
13 | <li>Lecture <strong>8</strong>: <a href="l08-web-security.html">Web Security, Part I</a>: modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins</li>
14 | <li>Lecture <strong>9</strong>: <a href="l09-web-defenses.html">Web Security, Part II</a>: cross-site scripting (XSS) attacks, XSS defenses, SQL injection atacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels</li>
15 | <li>Lecture <strong>10</strong>: <strong>Symbolic execution</strong> <em>guest lecture</em> by Prof. Armando Solar-Lezama, MIT CSAIL</li>
16 | <li>Lecture <strong>11</strong>: <strong>Ur/Web</strong> <em>guest lecture</em> by Prof. Adam Chlipala, MIT, CSAIL</li>
17 | <li>Lecture <strong>12</strong>: <a href="l12-tcpip.html">TCP/IP security</a>: threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing</li>
18 | <li>Lecture <strong>13</strong>: <a href="l13-kerberos.html">Kerberos</a>: Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy</li>
19 | <li>Lecture <strong>14</strong>: <a href="l14-forcehttps.html">ForceHTTPS</a>: certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS</li>
20 | <li>Lecture <strong>15</strong>: <strong>Medical software</strong> <em>guest lecture</em> by Prof. Kevin Fu, U. Michigan</li>
21 | <li>Lecture <strong>16</strong>: <a href="l16-timing-attacks.html">Timing attacks</a>: side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks</li>
22 | <li>Lecture <strong>17</strong>: <a href="l17-authentication.html">User authentication</a>: what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader</li>
23 | <li>Lecture <strong>18</strong>: <a href="l18-priv-browsing.html">Private browsing</a>: private browsing mode, local and web attackers, VM-level privacy, OS-level privacy,  OS-level privacy, what browsers implement, browser extensions </li>
24 | <li>Lecture <strong>19</strong>: <strong>Tor</strong> <em>guest lecture</em> by Nick Mathewson, Tor Project
25 | <ul>
26 | <li>6.858 notes from 2012 on <a href="l19-tor.html">Anonymous communication</a>: onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets)</li>
27 | </ul></li>
28 | <li>Lecture <strong>20</strong>: <a href="l20-android.html">Mobile phone security</a>: Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents</li>
29 | <li>Lecture <strong>21</strong>: <a href="l21-taintdroid.html">Information flow tracking</a>: TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip</li>
30 | <li>Lecture <strong>22</strong>: <strong>MIT's IS&amp;T</strong> <em>guest lecture</em> by Mark Silis and David LaPorte, MIT IS&amp;T</li>
31 | <li>Lecture <strong>23</strong>: <a href="l23-click-trajectories.html">Security economics</a>: economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics</li>
32 | </ul>
33 | 
34 | <h2>Papers</h2>
35 | 
36 | <p>List of papers we read (<a href="papers/">papers/</a>):</p>
37 | 
38 | <ul>
39 | <li><a href="papers/baggy.pdf">Baggy bounds checking</a></li>
40 | <li><a href="papers/brop.pdf">Hacking blind</a></li>
41 | <li><a href="papers/okws.pdf">OKWS</a></li>
42 | <li><a href="papers/confused-deputy.pdf">The confused deputy</a> (or why capabilities might have been invented)</li>
43 | <li><a href="papers/capsicum.pdf">Capsicum</a> (capabilities)</li>
44 | <li><a href="papers/nacl.pdf">Native Client</a> (sandboxing x86 code)</li>
45 | <li><a href="papers/owasp-top-10.pdf">OWASP Top 10</a>, the most critical web application security risks</li>
46 | <li><a href="papers/klee.pdf">KLEE</a> (symbolic execution)</li>
47 | <li><a href="papers/urweb.pdf">Ur/Web</a> (functional programming for the web)</li>
48 | <li><a href="papers/lookback-tcpip.pdf">A look back at "Security problems in the TCP/IP protocol suite"</a></li>
49 | <li><a href="papers/kerberos.pdf">Kerberos</a>: An authentication service for open network systems</li>
50 | <li><a href="papers/forcehttps.pdf">ForceHTTPs</a></li>
51 | <li><a href="papers/medical-sw.pdf">Trustworthy Medical Device Software</a></li>
52 | <li><a href="papers/brumley-timing.pdf">Remote timing attacks are practical</a></li>
53 | <li><a href="papers/passwords.pdf">The quest to replace passwords</a></li>
54 | <li><a href="papers/private-browsing.pdf">Private browsing modes</a></li>
55 | <li><a href="papers/tor-design.pdf">Tor</a>: the second-generation onion router</li>
56 | <li><a href="papers/android.pdf">Understanding android security</a></li>
57 | <li><a href="papers/taintdroid.pdf">TaintDroid</a>: an information-flow tracking system for realtime privacy monitoring on smartphones</li>
58 | <li><a href="papers/trajectories.pdf">Click trajectories</a>: End-to-end analysis of the spam value chain</li>
59 | </ul>
60 | 


--------------------------------------------------------------------------------
/previous-years/l14-resin.txt:
--------------------------------------------------------------------------------
  1 | Resin
  2 | =====
  3 | 
  4 | what kinds of problems is this paper trying to address?
  5 |     threat model
  6 | 	trusted: hardware/os/language runtime/db/app code
  7 | 	untrusted: external inputs (users/whois servers)
  8 | 	non-goals: buffer overflows, malicious apps
  9 |     programming errors: missing security checks in application code
 10 | 	sanitizing user inputs for code injection
 11 | 	calling access control functions for sensitive data
 12 | 	    protected wiki page; user's password
 13 | 
 14 | Example: one web server, multiple users
 15 |     users interact with each other
 16 |     reading posts in a web forum
 17 | 	avatar url / upload
 18 | 	post content
 19 | 	profile / signature
 20 |     attacker's plan: inject JS code / forge requests
 21 |     victim's browser sees this code in the HTML page, runs it
 22 |     what kind of code could attacker inject?
 23 | 	steal the cookie
 24 | 	transfer credits
 25 | 	acl
 26 | 	privileged operations (for admin)
 27 |     why doesn't the browser's same-origin policy protect the cookie?
 28 | 	as far as the browser is concerned, code came from server's origin
 29 |     lower level: the zookws web server was vulnerable
 30 | 	http://.../<SCRIPT>alert(5)</SCRIPT>
 31 | 	returns: File not found: /<SCRIPT>alert(5)</SCRIPT>
 32 | 
 33 | a similar problem: whois injection
 34 |     admin views logs: user, ip, domain
 35 |     malicious whois server
 36 |     problems arise if programmer forgets to quote external inputs
 37 | 
 38 | different kind of a problem: access control checks
 39 |     might have protected pages in a wiki, forget to call ACL function
 40 |     example: hotcrp's password disclosure
 41 |         typical web site, sends password reminders
 42 |         email preview mode displays emails instead of sending
 43 |         turns out to display pw reminders in the requesting user's browser
 44 |         kind-of like the confused deputy prob: no module is really at fault?
 45 | 
 46 | why are the checks missing?
 47 |     lots of places in the code where they need to be performed
 48 |     think of application as a black box; lots of inputs and outputs
 49 |     suppose that for a given output, only some inputs were OK
 50 |         e.g. sanitize user inputs in a SQL query, but not app's own data
 51 |     hard to tell where the output's data came from
 52 |         so, programmers try to do checks on all possible paths
 53 |     programmer forgets them on some paths from input to output
 54 |     plug-in developers may be unaware of security plan
 55 | 
 56 | what's the plan to prevent these?
 57 |     think of the checks as being associated with data flows input->output
 58 |     associate checks with data objects like user input or password strings
 59 |     perform checks whenever data gets used in some interesting way
 60 | 
 61 | what does resin provide?
 62 |     hotcrp data: password
 63 |     [ diagram from figure 1 ]
 64 |     policy objects
 65 | 	contains code to implement policy for its data
 66 | 	hotcrp: only email password to the user or the pc chair
 67 | 	what methods does the programmer have to implement in a policy object?
 68 | 	    export_check(context)
 69 | 	    merge [optional]
 70 |     filter objects
 71 | 	data flow boundaries
 72 | 	channels with contexts: http, email, ...
 73 | 	provided by default by resin for most external channels
 74 | 	    invoke export_check if possible
 75 |     data tracking
 76 | 	how does this work?  assumes a language runtime
 77 | 	    python, php have a byte code representation, sort-of like java
 78 | 	    resin tags strings, integers with a policy object
 79 | 	    changes the implementation of operations that manipulate data
 80 | 	why only tag strings and integers?  what about other things?
 81 | 	what kinds of operations propagate?
 82 | 	why not propagate across "covert" or "implicit" channels?
 83 | 	why byte-level tracking?
 84 | 	what happens when data items are combined?
 85 | 	    common: concat strings (automatic via byte-level tracking)
 86 | 	    rare: add integers
 87 | 
 88 | what are all of the uses for filter objects?
 89 |     default filters for external boundaries: sockets, pipes, http, email
 90 |     persistent serialization
 91 | 	files: extended attributes
 92 | 	database: extra columns for policies, SQL rewriting
 93 | 	example: write password to file/db
 94 |     code imports
 95 | 	interpreter's input is yet another kind of channel
 96 |     write access control
 97 | 	persistent filters on FS objects like files, directories
 98 | 	almost a different kind of check: tied to an external object, not data
 99 |     propagation rules for functions
100 | 	sha1(), strtoupper(), ..
101 | 
102 | how would you use resin to prevent missing checks?
103 |     hotcrp
104 |     cross-site scripting: profile
105 | 	UntrustedData & XFilter calls strip and removes the policy?
106 | 	define UntrustedData and JSSantitized, empty export_check
107 | 	input tagged UntrustedData
108 | 	strip function attach JSSantitized
109 | 	output filter checks strings must contain JSSantitized if UntrustedData exists
110 | 	alternative: UntrustedData policy only; filter parses and sanitizes strings
111 | 
112 | does this system actually work?
113 |     two versions of resin, one for python and one for php
114 | 	prevented known bugs in real apps
115 | 	prevented unknown bugs in real apps too
116 | 	few different kinds of bugs (ACL, XSS, SQL inj, directory traversal, ..)
117 |     is it possible to forget checks with resin?
118 | 	what does resin provide/guarantee?
119 | 	are there potential pitfalls with resin's assertions?
120 |     how much code is required to write these assertions?  why?
121 |     how specific are the assertions to the bug you want to prevent?  why?
122 |     how did they prevent the myphpscripts login library bug?
123 | 
124 | what's the cost?
125 |     need to deploy a new php/python interpreter
126 |     need to write some assertions (policy objects?)
127 |     runtime overheads: memory to store policies, CPU time to track them
128 |     major cost: serializing policies to SQL, file system
129 | 	could that be less? e.g. avoid storing email twice in hotcrp?
130 | 
131 | how else can you avoid these missing check problems?
132 |     IFC does data tracking in some logical sense
133 | 	trade-off: redesign/rewrite your app around some checks
134 | 	hard to redesign around multiple checks or to add a check later
135 |     java stack inspection
136 | 	can't automatically perform checks for things that are off the stack
137 | 	can check if file is being read through a sanitizing/ACL-check function
138 | 	crimps programmer's style, but in theory possible
139 |     express some of these checks in the type system
140 | 	maybe have a special kind of UntrustedString vs SafeString
141 | 	and conversely SQLString and HTMLString which get used for output
142 | 	special conversion rules for them
143 | 	could even do static checks for these data flows
144 | 	for password disclosure, ACL checks: maybe a delayed-check string?
145 | 	    when about to send out the string, tell it where you're sending it
146 | 	    almost like resin design
147 | 	problem with using the type system:
148 | 	    policies intertwined with code throughout the app
149 | 	    to add a new check, need to change types everywhere
150 | 	    resin is almost like a shadow type system
151 | 
152 | could you apply resin to other applications, or other environments?
153 |     different languages?
154 |     different machines (cluster of web servers)?
155 |     no language runtime?
156 |     untrusted/malicious code?
157 | 
158 | 


--------------------------------------------------------------------------------
/previous-years/l22-usability-2.txt:
--------------------------------------------------------------------------------
  1 | Security Usability
  2 | ==================
  3 | 
  4 | is this problem real?  concrete examples of things that go wrong?
  5 | 
  6 | why is usable security a big problem?
  7 |     secondary tasks: users concerned with something other than security
  8 |     negative goal / weakest link: must consider entire system
  9 |     abstract, hard to reason about; little feedback: security often not tangible
 10 |     users don't fully understand threats, mechanisms they're using
 11 | 
 12 | why do we need users in the loop?
 13 |     good reasons: users should be ultimately in control of their security
 14 |     bad reasons: programmers didn't know what to do, so they asked the user
 15 |     backwards compatibility
 16 | 
 17 | what does the paper think constitutes usability for PGP?
 18 |     encrypt/decrypt, sign/verify signatures
 19 |     generate and distribute public key for encryption
 20 |     generate and publish public key for signing
 21 |     obtain other users' keys for verifying signatures
 22 |     obtain other users' keys for encrypting
 23 |     avoid errors (trusting wrong keys, accidentally not encrypting, ..)
 24 | 
 25 | how do they evaluate it?
 26 | 
 27 | cognitive walkthrough
 28 |     inspection by a developer trying to simulate a user's mindset
 29 |     overly-simplistic metaphors
 30 | 	physical keys are similar to symmetric crypto, not public-key crypto
 31 | 	quill pens lack the idea of a key being involved; key vs signature
 32 | 	leads to faulty intuition
 33 |     not exposing key type information more explicitly
 34 | 	good principle: if user needs to worry about something, expose it well
 35 | 	users had to decide how to encrypt and sign a particular message
 36 | 	old vs new key type icons not well documented
 37 | 	figure 3: recipient dialog box talks about users, not keys
 38 |     implicit trust policy that might not be obvious to users
 39 | 	web-of-trust model, keys can be trusted through multiple marginal sigs
 40 | 	user might not realize what's going on
 41 |     not making key server operations explicit?  unclear what's the precise risk
 42 | 	failing to upload revocations to the key server
 43 | 	publicizing or revoking keys unintentionally
 44 |     irreversible operations not well described
 45 | 	deleting private key: should tell user they won't be able to decrypt, ..
 46 | 	publicizing/revoking keys: warn the user it's a permanent change
 47 |     too much info
 48 | 	UI focused on exposing what's technically hard: key trust management
 49 | 	maybe a good model would be to ask the user to specify a threat model
 50 | 	    beginner: worried about opportunistic attackers stealing plaintext
 51 | 	    medium: worried about attacker injecting malicious keys?
 52 | 	    advanced: worried about attacker compromising some friends?
 53 | 	    more advanced: worried about cryptographic attack on small key sizes
 54 | 	worry: users not good at estimating risk
 55 | 	    e.g. a worm might easily compromise friends' machines and sign keys
 56 | 
 57 | lab experiment
 58 |     users confused about how the keys fit into the security model
 59 | 	is something a key or a message?
 60 | 	    maybe extract as much info as possible from supplied data?
 61 | 	    could tell the user it's a key vs message based on headers etc
 62 | 	where do keys come from?  who generates them?
 63 | 	need to use recipient's key rather than my own (sender's)
 64 | 	key icons confusing because they don't differentiate public vs private
 65 |     noone managed to handle mixed key types in a single message
 66 | 	practical solution was to send separate messages to each recipient
 67 | 	perhaps sacrifice generality for usability?
 68 |     key trust questions were not prominent
 69 | 	some users concerned about why they should trust keys
 70 | 	one user assumed keys were OK because signed by campaign manager
 71 | 	    (but is campaign manager key's OK?)
 72 | 	noone used PGP's key trust model
 73 |     overall results
 74 | 	4/12 managed to send an encrypted, signed email
 75 | 	3/12 disclosed the secret message in plaintext
 76 |     what does this mean?
 77 | 	how effective is PGP in practice?
 78 | 	    maybe not so dismal for users that learn to use it over time
 79 | 	    on the other hand, easy to make dangerous mistakes
 80 | 	    all users disinclined to use PGP further
 81 | 	what other experiments would be valuable?
 82 | 	    no attackers in the experiment
 83 | 	    would users notice a bad signature?
 84 | 
 85 | phishing attacks
 86 |     look-alike domains
 87 | 	visually similar (bankofthevvest.com)
 88 | 	exploit incorrect user intuition (ebay-security.com)
 89 | 	unfortunately even legitimate companies often outsource some services!
 90 | 	    e.g. URLs like "ebay.somesurveysite.com"
 91 |     visual deception
 92 | 	copy logos, site layout
 93 | 	inject look-alike security indicators
 94 | 	create new windows that look like other dialog boxes
 95 | 
 96 | why is phishing such a big problem?  what UI security problems contribute to it?
 97 |     novice users don't understand the threats they are facing
 98 |     users don't have a clear mental model of the browser's security policy
 99 | 	users don't understand technical details of what constitutes an origin
100 | 	users don't understand what to look for in an SSL certificate / EV certs
101 | 	users don't understand implications of security decisions
102 | 	    allow cookie?  allow non-SSL content?
103 | 	    java security model: grant code from developer X access to FS/net?
104 |     browsers have complex security indicators
105 | 	need to look at origin in URL bar, SSL certificate
106 | 	security indicators can be absent instead of indicating a warning/error
107 | 	e.g. if site is non-SSL, nothing out-of-the-ordinary appears to the user
108 | 
109 | techniques to combat phishing?
110 |     most common: maintain a database of known phishing sites
111 |     why isn't this fully effective?
112 |     active vs passive warnings
113 |     habituation: users accustomed to warnings/errors
114 |     users focused on getting their work done
115 | 	if the warning gives an option to continue, users may think it's OK
116 | 
117 |     more intrusive measures are often more effective here
118 | 	replace passwords with some other form of auth (smartcard, PAKE, etc)
119 | 	    only works for credentials; attackers might still steal DOB, SSN, ..
120 | 	turn phishing into online attack
121 | 	    site must display an agreed-upon image before user enters password
122 | 	    can be hard for users to comprehend how and what this defends from
123 | 
124 | other human factors in system security?
125 |     social engineering attacks
126 |     least privilege can conflict with allowing users to do their work
127 |     differentiating between trust in users vs trust in users' machines
128 | 
129 | principles for designing usable secure systems?
130 |     avoid false positives in security warnings (can make them errors then?)
131 | 	active security warnings to force user to make a choice (cannot ignore)
132 | 	present users with useful choices when possible
133 | 	    users want to perform their task, don't want to choose "stop" option
134 | 	    e.g. try to look up the correct key in a PGP key server?
135 | 		 search google for an authentic web site vs phishing attack?
136 |     secure defaults; secure by design; "invisible security"
137 | 	when does this work?
138 | 	when is this insufficient?
139 |     intuitive security mechanisms that make sense to the user
140 | 	some of the windows "privacy" knobs or wizards that give a few options
141 |     train users
142 | 	users unlikely to spend time to learn on their own
143 | 	interesting idea: try to train users as part of normal workflow
144 | 	    try to mount phishing attacks on user by sending spam to them
145 | 	    if they fall for an attack, tell them what they should've looked for
146 | 	    can get tiresome after a while, if not done properly..
147 | 	security training games
148 | 
149 | 


--------------------------------------------------------------------------------
/previous-years/l21-captcha.txt:
--------------------------------------------------------------------------------
  1 | CAPTCHAs
  2 | ========
  3 | 
  4 | Administrivia.
  5 |     This week, Wed: in-lecture quiz.
  6 |     Next week, Mon + Wed: in-lecture final project presentations.
  7 | 	10 minutes per group.
  8 | 	We will have a projector set up if you want to use one.
  9 | 	Feel free to do a demo (e.g., 5 minute talk + 5 minute demo).
 10 | 	Volunteers for Monday?  If not, we will just pick at random.
 11 |     Turn in code + writeup by Friday next week (i.e., Dec 10th).
 12 | 
 13 | Goal of this paper: better understand the economics of security.
 14 |     Context: earlier paper, "Spamalytics", studied economics of botnets, spam.
 15 |     Adversaries profitably send spam, mount denial-of-service attacks, etc.
 16 |     The bulk of botnet activity is work like this (spam, DoS).
 17 |     Botnet operators sell access to botnets, so there's a real market for this.
 18 | 
 19 | What web sites would use CAPTCHAs?
 20 |     Open services that allow any user to interact with their site.
 21 |     Applications that have user accounts but allow anyone to sign up.
 22 | 
 23 | Why would a web site want to use a CAPTCHA?
 24 |     Prevent adversary from causing DoS (e.g., too many Google searches).
 25 |     Prevent adversary from spamming users.
 26 | 	Many examples: email spam, social network spam, blog comments.
 27 |     Prevent adversary from signing up for many accounts?
 28 |     Harness humans for some task.
 29 | 	reCAPTCHA: OCR books.
 30 | 	Solve CAPTCHAs from other sites?  Interesting but probably not worth it.
 31 |     What if a user legitimately signs up for an account and sends spam?
 32 |     What if adversary bypasses CAPTCHA and signs up for account?
 33 | 	Can probably detect an adversary sending spam relatively fast.
 34 | 	Still want CAPTCHA to prevent those first few messages before detection.
 35 | 
 36 | Why do sites care if users are humans or software?
 37 |     Maintain some form of per-person fairness, + hope good users outnumber bad.
 38 |     Advertising revenue.
 39 |     What about ad-blocking software?
 40 | 
 41 | If a site doesn't want to implement CAPTCHAs, what are the alternatives?
 42 |     Track based on IPs.
 43 | 	IPs are cheap for botnet operators.
 44 | 	False positives due to large NATs.
 45 |     Implement stronger authentication.
 46 |     Rely on some other authentication mechanism.
 47 | 	Email address, Google account.
 48 | 	At extreme end, bank account, even if no money is charged.
 49 |     How does Wikipedia work with no CAPTCHAs?
 50 | 	Strong logging, auditing, recovery.
 51 | 	Selective mechanisms to require long-lived accounts.
 52 | 	Measure account life in time, or in number of un-reverted edits?
 53 | 
 54 | Bypassing CAPTCHAs.
 55 |     Plan 1: write software to recognize characters / challenges in images.
 56 |     Plan 2: use humans to solve CAPTCHAs.
 57 | 
 58 | Why does the paper argue the technical approach (plan 1) is not effective?
 59 |     Up-front cost: about $10k to implement solver for CAPTCHA.
 60 |     CPU cost: a few seconds of CPU time per CAPTCHA solved.
 61 | 	Amazon EC2 prices, order-of-magnitude: $0.10 for an hour of CPU.
 62 | 	CPU cost for solving a CAPTCHA is ~$10^-4 ($0.0001), could be less.
 63 |     Using humans: $1 for 1,000 CAPTCHA solutions, or $0.001 per CAPTCHA.
 64 |     Break-even point: solve order-of-magnitude 10M CAPTCHAs.
 65 |     Worse yet, accuracy rate of automated solver is poor (e.g., 30%).
 66 |     Thus, break-even point for plan 1 might be higher by 3x.
 67 |     How do we tell if this break-even point is too high?
 68 | 	Can CAPTCHA developers switch algorithms faster than this?
 69 | 	Experimentally, paper says reCAPTCHA can change fast enough.
 70 | 	Thus, investment not worth it.
 71 | 
 72 | Human-based CAPTCHA solving: Figure 3.
 73 |     Well-defined API between application and CAPTCHA-solving site.
 74 |     Back-end site for workers, with a web-based UI.
 75 |     Some internal protocol between the front- and back-end sites.
 76 |     How do the authors find out these things?
 77 | 	Looks like a lot of manual work finding these sites.
 78 | 	Interviewed an operator of one such site.
 79 |     How reliable are these sites?
 80 | 	80-90% availability (Table 1).
 81 | 	10-20% error rate (Fig. 4).
 82 |     What's the cost range?
 83 | 	$0.50 -- $20.00 per 1,000 CAPTCHAs solved.
 84 | 	Wide variance in adaptability, accuracy, latency, capacity.
 85 | 
 86 | Does low accuracy rate matter?
 87 |     Service provider could detect many incorrect CAPTCHAs.
 88 |     What would a service provider do in this case?
 89 | 	Can blacklist an IP address after several incorrect answers.
 90 | 	If overall rate across IPs goes down, deploy new CAPTCHA scheme?
 91 |     Even humans have a 75-90% accuracy rate, depending on the CAPTCHA.
 92 | 	Assuming the humans are similar, service shouldn't blacklist.
 93 | 
 94 | Does latency matter?
 95 |     CAPTCHA solver cannot be significantly slower than human.
 96 |     Service would be able to tell the real human & adversary apart.
 97 |     Regular humans can solve CAPTCHAs in ~10 seconds.
 98 |     Software can solve CAPTCHAs in several seconds: fast enough.
 99 |     CAPTCHA-solving services seem to add little latency (Fig. 7).
100 | 
101 | How scalable is this?
102 |     One service appears to have 400+ workers.
103 |     Measured much like network analysis: watch for queueing.
104 | 
105 | How much are the workers getting paid?
106 |     Quite little: $2-4 per day!
107 |     Workers get ~quarter of front-end cost.
108 |     Many workers seem to be in China, India, Russia.
109 |     Cute tricks for identifying workers:
110 | 	Ask to decode 3-digit numbers in specific language.
111 | 	Ask to write down the current time, to find timezone.
112 | 
113 | How much profit does an adversary get from abusing an open service?
114 |     Email spam: relatively little, but non-zero.
115 | 	Earlier work suggests a rough estimate of $0.00001 (10^-5) per msg.
116 | 	How do we measure the profit from sending spam?
117 |     Comment spam: not known, might be higher?
118 | 	Is it possible to quantify or estimate?
119 | 	Possibly look at the ad costs for the page hosting the comments.
120 |     Vandalism, DoS attacks: hard to quantify, externalities.
121 | 
122 | Are CAPTCHAs still useful, worthwhile?
123 |     An easy way to impose some non-zero cost on potential adversaries.
124 |     Why do adversaries sign up for Gmail accounts to send spam?
125 | 	Gmail's servers unlikely to be marked as spam senders.
126 | 	Botnet IP addresses are, on the other hand, likely marked as spam.
127 |     At $0.001, 1 CAPTCHA is worth 100 emails (at $0.0001 profit per msg).
128 | 	Borderline-profitable.
129 | 	Bad place to be in terms of security parameters.
130 | 
131 |     Users seem to have become more-or-less OK with solving CAPTCHAs.
132 |     Can we provide better forms of CAPTCHAs?
133 |     Example in paper: Microsoft's Asirra, solvers adapted within days.
134 |     Can sites make the cost of solving a CAPTCHA high?
135 | 
136 | How to protect more valuable services?
137 |     Gmail: SMS-based verification after a few signups from an IP address.
138 | 	Interesting: gmail accounts went from $8 per 1,000 to unavailable!
139 |     Trade-off between defense mechanism usability and security.
140 |     Apparently, users do go away from a site if they must solve CAPTCHAs.
141 |     Do computational puzzles help?  Micropayments?
142 |     Can TPMs help, perhaps on the client machines?
143 | 
144 | Is it ethical to do the kind of research in this paper?
145 |     Authors argue they don't significantly change what's going on.
146 |     They don't solve any additional CAPTCHAs by hand.
147 |     Instead, they re-submit CAPTCHAs back into the system to be solved.
148 |     They don't use the solutions they purchased for any adversarial activity.
149 |     They do inject money into the market, but perhaps not significant.
150 | 
151 | Other courses, if you're interested in security.
152 |     6.857: Computer and Network Security, in the spring.
153 |     6.875: Cryptography and Cryptanalysis, in the spring.
154 | 
155 | 


--------------------------------------------------------------------------------
/previous-years/l20-bots.txt:
--------------------------------------------------------------------------------
  1 | Botnets
  2 | =======
  3 | 
  4 | botnet: network of many machines under someone's control
  5 | 
  6 | what are botnets good for?
  7 |     using the resources of bot nodes:
  8 | 	IP addrs (spam, click fraud), bandwidth (DoS), maybe CPU (??)
  9 |     steal sensitive user data (bank account info, credit cards, etc)
 10 |     impersonate user (inject requests to transfer money on user's behalf)
 11 |     extortion (encrypt user's data, demand payment for decryption)
 12 |     attackers might be able to extract a lot of benefit from high-value machines
 13 | 	one botnet had control of machines of officials of diff governments
 14 | 	could enable audio, video and stream it out of important meetings?
 15 | 	other candiates: stealing secret designs from competitor company?
 16 |     what sorts of attacks are counter-productive for attacker?
 17 | 	making the machine unusable for end-user (unless trying extortion)
 18 | 
 19 | how does the botnet grow?  (largely orthogonal from botnet operation)
 20 |     this particular botnet (Torpig): drive-by downloads
 21 | 	user's browser loads a malicious page (e.g. attacker purchased adspace)
 22 | 	malicious page looks for vulnerabilities in browser or plug-ins
 23 | 	if it finds a way to execute native code, downloads bot code
 24 | 	can we prevent or detect this?  maybe look for unusual new processes?
 25 | 	    botnet in paper: injects DLLs into existing processes
 26 | 		can use a debugging interface to modify existing process
 27 | 		some processes support plugins/modules (IE, windows explorer)
 28 | 		once DLL running in some other process, looks less suspicious?
 29 | 
 30 |     other schemes: worms (self-replicating attack malware)
 31 | 	why worms?
 32 | 	    harder to detect (no single attack source)
 33 | 	    compromise more machines (attacker now behind firewalls)
 34 | 	    faster (much less than an hour for every internet-connected machine)
 35 | 	usually exploit a few wide-spread vulnerabilities
 36 | 	simple worms: exploit some vulnerability in network-facing service
 37 | 	    easy strategy: try to spread to other machines at random
 38 | 		e.g. guessing random IPs works (but inefficient)
 39 | 	    use user's machine as source of other victims
 40 | 		for worms that spread via email, try user's email address book
 41 | 		try other victims in the same network as the current machine
 42 | 		try machines in user's ssh known_hosts file
 43 | 	    use other databases to find candidate victims
 44 | 		google for "powered by phpBB"
 45 | 	    try to propagate to any servers that the user connects to
 46 | 		hides communication patterns!
 47 | 	more complex worms possible (from web server to browser and back)
 48 | 	    requires finding wide-spread bugs in multiple apps at once
 49 | 	    less common as a result?
 50 | 	can we prevent or detect this?
 51 | 	    prevent: could try to isolate machines after you've detected it
 52 | 	    worm fingerprinting in the network (traffic patterns)
 53 | 	    monitor unused machines, email addresses, etc for suspicious traffic
 54 | 		in theory shouldn't be getting anything legitimate
 55 | 	    what would show up if you monitored traffic to unused subnet?
 56 | 		network mapping by researchers?
 57 | 		random probes by worms poking at IP addresses
 58 | 		"backscatter" from source-spoofing
 59 | 		could use these to infer what's happening "out there"
 60 | 	    detect by planting honeypots
 61 | 		if machine starts generating traffic, probably infected
 62 | 
 63 | once some machine is infected, how does the botnet operate?
 64 |     bot master, command and control (C&C) server(s), bots talk to C&C servers
 65 |     bots receive commands from C&C servers
 66 |     some bots accept commands from the network (e.g. run an open proxy server)
 67 |     upload stolen data either to the same C&C servers or some other server
 68 | 
 69 | how do bot masters try to avoid being taken down?
 70 |     change the C&C server's IP address ("fast flux")
 71 | 	can move from one ISP to another after getting abuse complaints
 72 | 	how to inform your bots that your IP address changed?  DNS
 73 | 	domain name is a single point of failure for bot master
 74 |     dynamic domain names ("domain flux")
 75 | 	how does this work?
 76 | 	how do you take down access to a botnet using this?
 77 | 	is there still a single point of failure here?
 78 | 	    currently many different domain registrars, little cooperation
 79 | 	conficker generated many more dynamic domain names than torpig
 80 | 	makes it impractical to register all of these names ahead of time
 81 |     peer-to-peer control networks (Storm botnet)
 82 | 	harder for someone else to take down: no single server
 83 | 	harder for botmaster to hide botnet internals: no protected central srvr
 84 | 
 85 | how did torpig work?
 86 |     mebroot installs itself into the MBR, so gets to inject itself early on
 87 |     loads modules from mebroot C&C server
 88 |     mebroot C&C server responds with torpig DLL to inject into various apps
 89 |     torpig DLL collects any data that matches pre-defined patterns
 90 | 	usernames and passwords; credit card numbers; ...
 91 |     torpig DLL contacts torpig's C&C server for info about what sites to target
 92 | 	torpig's C&C server using domain flux: weekly and daily domains
 93 |     "injection server" responsible for stealing credentials for a specific site
 94 | 	redirects visits to bank login page to fake login page
 95 | 	in-browser DLL subverts any browser protections (SSL, lock icon)
 96 |     lots of "outsourcing" going on: mebroot, torpig, torpig build customers?
 97 | 
 98 |     all traffic encrypted
 99 | 	but these bots implement their own crypto: bad plan, can get broken
100 | 	conficker used well-known crypto, and was thus much harder to break
101 | 
102 | how did these guys take over the botnet?
103 |     attackers did not register every torpig dynamic domain name ahead of time
104 |     bots did not properly authenticate responses from C&C server
105 |     (torpig "owners" eventually took back control through mebroot's C&C)
106 | 
107 | how big is the torpig botnet?
108 |     1.2 million IPs
109 |     each bot has a "nid" that reflects its hardware config (disk serial number)
110 |     ~180k unique nid's
111 |     ~182k unique (nid+os+...)'s
112 |     40 VMs (nid's match a standard configuration of vmware or qemu)
113 |     lots of IP reuse
114 |     aggregate bandwidth is likely over 17 Gbps
115 | 
116 | how effective is torpig?
117 |     authors collected all data during the 10 days they had control of torpig
118 |     collected lots of account information: millions of passwords
119 | 	many users reuse passwords across sites
120 |     8310 accounts at financial institutions
121 |     1660 credit/debit card numbers
122 | 	30 came from a single compromised at-home call center node
123 | 	pattern-matching works well: don't have to know app ahead of time
124 |     kept producing a steady stream of new financial data throughout the 10 days
125 | 	what's going on?
126 | 	probably users don't enter their CC#, bank password every day
127 | 
128 | how effective is spam?
129 |     separate paper looked at the economics of sending spam
130 | 	about 0.005% users visit URLs in spam messages (1 out of 20,000)
131 | 	less than 10% of those users "bought" whatever the site was selling
132 | 	so send ~200,000 spam messages for one real customer
133 | 	unclear if it's cost-effective (esp. if bots are nearly-free)
134 | 
135 | how to defend against bots?
136 |     are TPMs of any help?
137 | 	maybe a way to keep your credentials safe (and avoid simple passwords)
138 |     resource abuse: annoying because it gets your machine blacklisted
139 | 	VMM-level scheme to track user activity?
140 |     make their operation not cost-effective
141 | 	need to get a good idea of what's most profitable for botmasters
142 | 
143 | did these guys make it more difficult to mount similar attacks in the future?
144 |     probably torpig will get fixed
145 |     other papers written about takeovers on different bot nets
146 |     other bots employ much stronger security measures to prevent takeover
147 | 
148 | 


--------------------------------------------------------------------------------
/previous-years/l23-voting.txt:
--------------------------------------------------------------------------------
  1 | Electronic voting
  2 | =================
  3 | 
  4 | final projects reminder
  5 |     10-minute presentations about your projects on Wednesday
  6 | 	we will have a projector that you can use
  7 |     code and write-up describing your project due on Friday
  8 |     will only start grading on monday morning, if you need extension..
  9 | 
 10 | quiz solutions posted on course web site
 11 | HKN course eval link posted on course web site
 12 | 
 13 | ---
 14 | 
 15 | what are the security goals in elections?
 16 |     availability: voters can vote
 17 |     integrity: votes cannot be changed; results reflect all votes
 18 |     registration: voters should vote at most once
 19 |     privacy: voters should not be able to prove how they voted (eg to sell vote)
 20 | 
 21 | what's the threat model?
 22 |     lots of potential attackers
 23 | 	officials, vendors, candidates themselves, activists, governments, ..
 24 | 	    may be interested in obtaining a particular outcome
 25 | 	voters may want to sell votes
 26 |     real world: anything is fair game
 27 | 	intimidation
 28 | 	impersonation (incl. dead people)
 29 | 	denial of service
 30 | 	ballot box stuffing, miscounting, ..
 31 |     electronic voting machine attacks
 32 | 	buffer overflows
 33 | 	logic bugs
 34 | 	insider attacks
 35 | 	physical attacks
 36 | 	crashing / corrupting
 37 | 	..
 38 |     ideal designs focus on making the attack cost high
 39 | 	auditing with penalties if detected
 40 | 
 41 | what are the alternatives?
 42 |     vote in public: raise hands, ..
 43 |     written paper ballots
 44 |     optical-scan paper ballots
 45 |     punched paper ballots
 46 |     DRE (what this paper is about): direct-recording electronic machine
 47 |     absentee voting by mail
 48 | 	vote-selling potential
 49 |     internet voting
 50 | 	greater voter turnout
 51 | 	vote-selling potential
 52 | 	more practical problem: worms/viruses voting?
 53 | 
 54 | why DRE?
 55 |     partly a response to voting problems in florida in the 2000 election
 56 |     hoped: easier-to-use UI, faster results, more accurate counting, ..
 57 |     interesting set of constraints from a research point of view
 58 | 	high integrity, ideally verifiable
 59 | 	most of the process should be transparent and auditable
 60 | 	cannot expose individual voter's choices
 61 | 	cannot allow individual voters to prove their vote
 62 | 
 63 | how does the machine work?
 64 |     133MHz CPU
 65 |     32MB RAM
 66 | 
 67 |     on-board flash memory
 68 |     EPROM socket
 69 |     "ext flash" socket
 70 |     boot selector switches, determine which of above 3 device is used to boot
 71 | 
 72 |     internal speaker
 73 | 
 74 |     external devices:
 75 | 	touch-sensitive LCD panel, keypad, headphones
 76 | 	printer -- why?
 77 | 	smart card reader/writer -- why?
 78 | 	irda transmitter/receiver -- why?
 79 | 
 80 | 	power switch, keyboard port, PC-card slots (behind locked metal door)
 81 | 
 82 | what does the boot sequence look like?
 83 |     bootloader runs from selected source
 84 |     internal flash contains a gzip'ed OS image that gets loaded into RAM
 85 | 	includes image for root file system
 86 |     internal flash contains file system that stores votes, among other things
 87 | 
 88 | what's on the memory card?
 89 |     machine state configured via election.brs file
 90 |     votes stored on memory card (and in the built-in flash) in election.brs
 91 |     data encrypted using a fixed DES key (hard-coded in the software)
 92 | 
 93 | machine states: pre-download, pre-election testing, election, post-election
 94 |     what's the point of L&A testing?
 95 | 	want to distinguish test votes from real votes
 96 | 	want to make it difficult to erase existing votes
 97 | 	also tips off the software that it's being tested!
 98 | 
 99 | why smartcards?
100 |     contain a secure token from sign-in desk to the voting machine
101 |     ideal property: cannot fake a token, cannot duplicate token
102 |     how to implement?  faking is easy, duplication is harder
103 | 	can give each token a unique ID, store used tokens on machine
104 | 	potentially vulnerable to multiple votes on different machines
105 | 	can have smartcard destroy the token after use, no read/write API
106 |     in practice, turned out the machine was not using any smartcard crypto
107 |     attacker can easily manufacture fake smartcards and vote many times
108 |     (attacker can also manufacture an "admin" smartcard and manage the machine)
109 | 
110 | what's the point of printing out receipt tapes post-election?
111 |     in theory can do a recount based on these tapes; compare with check-in data
112 |     assumes the attack is mounted after the election happens
113 | 	corrupt or lost memory cards, compromised tabulation, ..
114 | 
115 | what attacks did the authors explore?
116 |     exploiting physical access to inject malicious code
117 |     vote stealing
118 |     denial of service
119 |     viruses/worms
120 | 
121 | specific bugs
122 |     unauthenticated smartcards
123 |     unauthenticated firmware updates (fboot.nb0)
124 |     unauthenticated OS updates (nk.bin)
125 |     unauthenticated debug mode flag (explorer.glb)
126 |     unauthenticated wipe command (EraseFFX.bsq)
127 |     unauthenticated code injection (.ins files with buffer overflows)
128 |     poor physical security (cheap lock)
129 |     easy to change boot source
130 |     easy to change components like EPROM
131 |     insufficient audit logs (no integrity; election.adt just has "Ballot cast")
132 |     sound when machine reboots, but can be prevented with headphones
133 | 
134 | what to do when an audit shows an error?
135 |     with this machine: denial of service attack, effectively
136 |     ideally would be able to reconstruct what happened or recount manually?
137 | 
138 | how to scrub a machine after a potential compromise?
139 |     can't trust anything: all memory/code easily changed by attacker
140 |     need to install a known-good EPROM, use that to overwrite bootloader, OS
141 |     can take a long time, esp. if problem spread to many machines
142 | 
143 | how to prevent these attacks?
144 |     TPM / secure boot?
145 |     would signed files be enough?
146 | 	attacker can get a hold of signed "debug mode" file and he's done?
147 | 	signed software updates might not be the latest version
148 | 	    attacker installs old version, exploits bug
149 | 	    might want to prevent rollbacks (but may want to allow, too?)
150 |     read-only memory for software?
151 | 	physical switches to allow updates
152 | 	could make it more difficult to write a fast-spreading virus/worm
153 |     physical access control
154 | 	probably a good idea, to some extent
155 | 	auditing physical access leads to easy DoS attacks
156 | 	need a strong audit mechanism to prevent DoS (i.e., can recount)
157 |     append-only memory for auditing?
158 | 	disable the "flash" (rewrite) circuitry from flash memory?
159 | 	or just have a dedicated "audit" controller
160 | 	system already has a separate battery-management PIC
161 |     OS-level protection?
162 |     language security?
163 |     operating / setup procedures?
164 | 	who has access to the machine, chain of custody, ...
165 |     parallel testing?
166 | 
167 | what is software-independence?
168 |     malicious software alone cannot change election results (undetectably)
169 |     e.g. software helps print out ballot, voter makes sure ballot is OK
170 |     or prints out a paper tape with all votes, which is counted by hand
171 | 
172 | usability for voters?
173 |     paper doesn't describe the UI, unfortunately..
174 |     "machine ate my vote"?
175 |     could invalidate smartcard and crash?
176 | 
177 | usability for officials?
178 |     potentially same problems as PGP
179 |     do officials have the right mental model to worry about potential attacks?
180 | 
181 | end-to-end integrity
182 |     voting integrity has 3 parts:
183 | 	cast-as-intended
184 | 	collected-as-cast
185 | 	counted-as-collected
186 |     above techniques only help ensure cast-as-intended
187 |     need more end-to-end security to ensure other 2 properties
188 |     Twin scheme by Rivest and Smith
189 | 
190 | 


--------------------------------------------------------------------------------
/previous-years/l21-dropbox.txt:
--------------------------------------------------------------------------------
  1 | Looking inside the (drop)box 
  2 | ==================
  3 | 
  4 | why are we reading this paper?
  5 |   code obfuscation is a common goal in the real world
  6 |      skype,  dropbox
  7 |      gmail
  8 |      malware
  9 |   closed versus open design
 10 |      contrast bitlocker and dropbox client
 11 | 
 12 | this paper has several aspects
 13 |   code obfuscation weaknesses
 14 |       focus of this lecture
 15 |   user authentication weaknesses
 16 |       not our focus, technically less interesting
 17 |       automatic login without user credentials fixed (i think)
 18 |  aside: etiquette with finding security flaws
 19 |      report before you publish 
 20 | 
 21 | what is Dropbox's goal for obfuscation?
 22 |    don't know, but ...
 23 |    no open-source client
 24 |       dropbox e.g., can change the wire protocol at will
 25 |    make it difficult to for competitors to develop a client 
 26 |       portable fs client is tricky
 27 | 
 28 | what is the threat model?
 29 |    adversary has access to obfuscated code and can run it
 30 |    adversary reverse re-engineers client to avoid the above goals
 31 |    sidenote: malware may have additional threats to protect against
 32 |      e.g., make it difficult to fingerprint so that anti-virus application cannot remove malware
 33 | 
 34 | challenging threat, because:
 35 |    code must run correctly on adversary's processor
 36 |    code may have to make systems calls
 37 |    code may have to be linked dynamically with host libraries
 38 |    adversary can observe processor and systems calls
 39 | 
 40 | general approach: code obfuscation
 41 |     Given a program P, produce O(P)
 42 |     O(P) has same functions as P but a black box
 43 |           there is nothing substantial one can learn from O(P)
 44 |     O(P) isn't much slower than P
 45 | 
 46 | minimum requirement: adversary cannot reconstruct P
 47 |     ignore programs that are trivially learnable from excuting w. different inputs
 48 |     easy to avoid complete failure
 49 |        execute only if an input matches some SHA hash
 50 |        hash is embedded in program, but difficult to compute inverse
 51 |     difficult to succeed completely 
 52 |        program prints itself
 53 |     in general: impossible (see references)
 54 |        there is a family of interesting programs for which O(P) will fail [see references]
 55 |     but, perhaps you could do well on a particular program
 56 |     difficult to state a precise requirements for an obfuscator
 57 |        should be skeptical that it can work in practice against skilled adversary
 58 | 
 59 | code obfuscation in practice
 60 |      write C programs from which is difficult to tell what they do
 61 |          down-side: hard on developer
 62 | 	 but makes for great contests (e.g., International Obfuscated C Code Contest[)
 63 |     use an obfuscator
 64 |        Takes a program as input and produces a intermediate code
 65 | 	   You don't want to ship the original source code
 66 |        Ship program in intermediate form with interpreter to computer
 67 |   	   You don't want to ship the actual assembly
 68 | 	   Can cook up your own intermediate language that nobody knows
 69 |        Computer runs interpreter, which interprets intermediate code
 70 | 	   Interpreter reads input and outputs values
 71 | 	   The interpreter can try to hide what is actual computing
 72 | 	       Fake instructions, fake control
 73 | 	       Use inputs as index into a fine state machine and spit out values
 74 | 	       Etc.
 75 | 
 76 | dropbox's approach
 77 |    all code is written in python
 78 |        compiles programs to bytecode
 79 |        interpreter executes bytecode
 80 |    dropbox application
 81 |        contains encrypted python byte code
 82 |           encryption method is changed often
 83 |        byte code opcodes are different than Python
 84 |        contains a special interpreter
 85 |        application is built/packaged in non-standard way
 86 |            special "linker"
 87 | 
 88 | dynamic linking
 89 |    what are the .so files in the downloaded dropbox directory?
 90 |       dynamically-linkable libraries
 91 |    modern applications are not a single file
 92 |    when application runs and unresolved references are resolved at runtime
 93 |       e.g., application makes a system call
 94 |       dynamic linker links the application with the library with system call stubs
 95 |    adv: library is only once in memory
 96 |       with static linking: library would be N times in memory
 97 |       once with each application
 98 |   LD_PRELOAD: insert your own library in front of others
 99 |   dropbox ships its app with several libraries that are dynamically linked
100 |       but interpreter and SSL are statically linked
101 |     
102 | goal of paper: *automatically* break obfuscation (de-drop)
103 |   another goal: break user authentication
104 |   demo:
105 |     look at dropbox binary
106 |     ls: no pyc files
107 |     gdb binary
108 |     nm binary
109 |     objdump -S binary
110 | 
111 |     run dropboxd with LD preload
112 |      extracts pyc_decrypted
113 |     cd pyc_decrypted/client_api
114 |     python
115 |       import hashing
116 |       dir (hashing)
117 |    run uncompyle2 hashing.pyc
118 | 
119 | Paper: how to de-crypt pyc files?
120 |   study modified python interpreter
121 |       diffed Python27.ddl from dropbox with standard
122 |   r_object is patched
123 |      decrypt decrypts bytecode
124 |   how to extract encrypted bytecode?
125 |      inject code into dropbox binary using LD_PRELOAD
126 |      injected code overwrites strlen
127 |          when strlen is called by dropbox, injected code runs
128 |      inject Python code using PyRun_SimpleString 
129 |        not patched
130 |        can run arbitrary python code in dropbox context
131 |          GIL must be acquired by injected code
132 |      call PyMarshal_ReadLastObjectFromFile()
133 |          reads encrypted pyc into memory
134 | 	    but, co_code is not exposed to Python!
135 |             linear memory search to find co_code
136 | 	 serialize it back to a file
137 | 	    but, marshal.dumps is NOP
138 | 	    inject PyPy's _marshal.py
139 |               written in python!
140 | 
141 | How to remap opcodes?
142 |    manual reconstruct opcode mapping
143 |       time intensive, but opcode hasn't changed since 1.6.0
144 |    frequency analysis for common modules
145 |      decrypted dropbox bytecode 
146 |      standard bytecode
147 | 
148 | How to get user credentials?
149 |    hostid are used for authentication
150 |       established during registration
151 |          not affected by changing password!
152 |       stored in encrypted sql database
153 |          components of decryption key are stored on device
154 |          linux: custom obfuscator
155 |          except host_int comes from server
156 |    Can also be extracted from dropbox client logs
157 |       enable logging based MD5 checksum of "DBDEV"
158 |       md5("a2y6shaya") = "c3da6009e4"
159 |       patched now.
160 |   Snooping on objects, looking for host_id and host_int
161 |   Login to web site for logintray is based only on host_id and host_int
162 |      Dropbox uses now "better" logintray ...
163 |      Dropbox should probably use SRP (or something else good)
164 | 
165 | How to learn what dropbox internal APIs are?
166 |    Patch all SSL objects, every second
167 |        "monkey patch"  == dynamic modifications of a class at runtime without
168 |             modifying the original source code
169 |         maybe derived from guerrilla (as in an sneaky attack) patch?
170 |    No two-factor authentication for access to drop-box account
171 |    One use: open-source client
172 | 
173 | Is the dropbox obfuscation the best you can do?
174 |    No.
175 |    How could you do better?  
176 |       Hide instructions much better
177 |       Obscure control flow
178 |    But, is it worth it?
179 | 
180 | Closed versus open design
181 |    Downside of closed designs
182 |       easy to miss assumptions because right eyes don't  look at it
183 |    Downside of open design
184 |       you competitor has access to it too
185 |    Ideal case: minimal secret, make most of design open
186 |       maybe not always possible to make the secret small?
187 | 
188 | References
189 |     http://www.math.ias.edu/~boaz/Papers/obfuscate.ps
190 |     http://www.math.ias.edu/~boaz/Papers/obf_informal.html
191 |     https://github.com/kholia/dedrop
192 |     uncompyle2 https://github.com/wibiti/uncompyle2
193 | 


--------------------------------------------------------------------------------
/previous-years/l18-dealloc.txt:
--------------------------------------------------------------------------------
  1 | Secure deallocation
  2 | ===================
  3 | 
  4 | Aside: some recent reverse-engineering of Stuxnet by Symantec.
  5 |     http://www.symantec.com/connect/blogs/stuxnet-breakthrough
  6 |     Stuxnet targets specific frequency converters.
  7 |     Manufactured by companies headquartered in either Finland or Tehran.
  8 |     Used to drive motors at high speeds.
  9 |     Stuxnet watches for a specific frequency band.
 10 |     When detected, changes frequencies to low or high for short periods.
 11 | 
 12 | Problem: disclosure of sensitive data.
 13 |     1. Many kinds of sensitive data in applications.
 14 |     2. Copies of sensitive data exist for a long time in running system.
 15 |     3. Many ways for data to be disclosed (often unintentionally).
 16 | 
 17 | What kinds of sensitive data are these authors concerned about?
 18 |     Passwords, crypto keys, etc.
 19 |     Small amounts of data that can be devastating if disclosed.
 20 |     Bulk data, such as files in a file system.
 21 | 	Sensitive, but not as acute.
 22 | 	Hard to reduce data lifetime (the only knob this paper is using).
 23 | 	Small leaks might not be a disaster (unlike with a private key).
 24 | 
 25 | Where could copies of sensitive data exist in a running system?
 26 |     Example applications: typing password into Firefox; Zoobar web server.
 27 |     Process memory: heap, stack.
 28 | 	IO buffers, X event queues, string processing libraries.
 29 | 	Language runtime makes copies (immutable strings, Lisp objects, ..)
 30 |     Thread registers.
 31 |     Files, backups of files, ...
 32 |     Swapped memory, hibernate for laptops.
 33 |     Kernel memory.
 34 | 	IO buffers: keyboard, mouse inputs.
 35 | 	Kernel stack, freed pages, saved thread registers.
 36 | 	Network packet buffers.
 37 | 	Pipe buffers contain data sent between processes.
 38 | 	Random number generator inputs.
 39 | 
 40 | How does data get disclosed?
 41 |     Any vulnerability that allows code execution.
 42 |     Logging / debugging statements.
 43 |     Core dumps.
 44 |     DRAM cold-boot attacks.
 45 |     Stolen disks, or just disposing of old disks.
 46 |     Revealing uninitialized memory.
 47 | 	Applications with memory management bugs.
 48 | 	Linux kernel didn't zero net buffers, sent "garbage" data in packets.
 49 | 	Same with directories, "garbage" data was written to disk upon mkdir.
 50 | 	MS Word (used to?) contain "garbage" in saved files, such as old text.
 51 | 
 52 | How serious is it?
 53 |     What data copies might persist for a long time?
 54 | 	Process memory: Looks like yes.
 55 | 	    How do they figure this out?
 56 | 	    Use valgrind -- could do something similar in DynamoRIO.
 57 | 	    Track all memory allocs, reads, writes, frees.
 58 | 	Process registers: Maybe floating-point?  Still, probably not that bad.
 59 | 	Files, backups: lives on disk, long-term.
 60 | 	Swap: lives on disk, possibly long-term, expensive to erase.
 61 | 	Kernel memory.
 62 | 	    Experiments in paper show live data after many weeks (Sec 3.2).
 63 | 	    How do they figure this out?
 64 | 		Place many random 20-byte "stamps" in memory.
 65 | 		Periodically read all phys. memory in kernel, look for stamps.
 66 | 	    How can data continue to persist for so long?
 67 | 	    Memory should be getting reused?
 68 | 		To some extent, depends on the workload.
 69 | 		Even with an expensive workload, may not eliminate all stamps.
 70 | 		Holes in long-lived kernel data structures, slab allocators.
 71 | 	    Persistence across reboots, even.
 72 |     Are there really that many data disclosure bugs?
 73 | 	Some examples of past bugs.
 74 | 	Worse yet: data disclosure bugs not treated with much urgency?
 75 | 
 76 | Paper's goal:
 77 |     Try to minimize the amount of time that sensitive data exists.
 78 |     Not focusing on fixing data disclosure mechanisms (hard to generalize).
 79 | 
 80 | How do we reduce/avoid data copies?
 81 |     Process memory: need application's help.  Mostly what this paper is about.
 82 |     Process registers: not really needed.
 83 |     Swap: mlock(), mlockall() on Unix.  Encrypted swap.
 84 |     File system: Bitlocker.  Vanish, if the application is involved.
 85 |     Kernel memory: need to modify the kernel.  Partly discussed in paper.
 86 | 
 87 | Paper's model for thinking about data lifetime in memory.
 88 |     Interesting operations: allocation, write, read, free.
 89 |     Conceptually applies to any memory.
 90 | 	malloc(), stack allocation on function call, global variables, ..
 91 |     Ideal lifetime for data: from first write to last read (before write/free).
 92 | 	Can't do any better: data must stay around.
 93 |     Natural lifetime: from first write to next write
 94 | 		      (potentially after free and re-alloc).
 95 | 	Natural lifetime is what most systems do today.
 96 | 	Data lives until overwritten by something else re-using that memory.
 97 | 
 98 | Why is natural lifetime too long?
 99 |     Bursty memory allocation: memory freed, never allocated again.
100 |     "Holes": not every byte of an allocation might be written to.
101 | 	Holes in the stack.
102 | 	Unused members in structs.
103 | 	Padding in structs.
104 | 	Variable-length data (e.g., packets or path names).
105 | 
106 | How can we do better than natural lifetime?
107 |     "Secure deallocation": erase data from memory when region is freed.
108 | 	Safe: programs should not rely on data living past free.
109 | 	How close to ideal is this?
110 | 	Depends on program, experiments show usually good (except for GUIs).
111 |     Can we do better?
112 | 	Might be able to figure out last read through program analysis.
113 | 	Seems tricky to do in a general-purpose way.
114 | 	Programmers can manually annotate, or manually clear data.
115 | 
116 | Secure deallocation in a process.
117 |     Heap: zero out the memory in free().
118 | 	What about memory leaks?  Rely on OS to clean up on process exit.
119 | 	Private allocators?  Modify, or rely on reuse or returning memory to OS.
120 |     Stack: two plans.
121 | 	1. Augment the compiler to zero out stack frames on function return.
122 | 	2. Periodically zero out memory below stack pointer, from the OS.
123 | 	Advantages / disadvantages:
124 | 	    1 is precise, but maybe expensive (CPU time, memory bandwidth).
125 | 	    2 is cheaper, but may not clear right away, or delete everything.
126 | 	    1 requires re-compiling code; 2 works with unmodified binaries.
127 |     Static data in process memory: rely on OS to clean up on exit.
128 | 
129 | Secure deallocation in the kernel.
130 |     Can we apply the same plan as in the applications?  Why or why not?
131 | 	Vague argument about kernel being performance-sensitive.
132 | 	Not clear exactly why this is (applications are also perf-sensitive?).
133 |     What kinds of data do we want to clear in the kernel?
134 | 	Data that applications are processing: IO buffers, anon process memory.
135 | 	Not internal kernel data (e.g., pointers).
136 | 	Not application data that lives on disk (files, directories).
137 |     Page allocation: track pages that contain sensitive data ("polluted").
138 | 	Three lists of free pages:
139 | 	    - Zeroed pages.
140 | 	    - Polluted non-zero pages.
141 | 	    - Unpolluated non-zero pages.
142 | 	How is the polluted bit updated?
143 | 	    Manually set in kernel code when page is used for process memory.
144 | 	    Cleared when polluted free page is zeroed or overwritten.
145 |     Smaller kernel objects: caller of kfree() must say if object is polluted.
146 | 	Objects presumably include network buffers, pipes, user IO, ..
147 | 	Memory allocator then erases data just like free() in user-space.
148 |     Circular queues: semi-static allocation / specialized allocator.
149 | 	E.g., terminal buffers, PRNG inputs.
150 | 	Erase data when elements removed from queue.
151 | 
152 | More efficient clearing of kernel memory.
153 |     No numbers to explain why optimizations are needed, or which ones matter..
154 |     Page zeroing: return different pages depending on callers to alloc.
155 | 	Insight: zeroed pages are "expensive", polluted pages are "cheap".
156 | 	1. Can return polluted page if caller will overwrite entire page.
157 | 	   E.g., new page to be used to read an entire page from disk.
158 | 	2. Avoid returning zeroed pages if caller doesn't care about contents.
159 | 	   If not enough memory, return zeroed page, or zero a polluted page.
160 | 	   Cannot simply return polluted page: sensitive data may persist.
161 |     Batch page zeroing: why?
162 | 	Allows the optimization of caller overwriting page to take place.
163 | 	May improve interactive performance, by deferring the cost of zeroing.
164 |     Specialized zeroing strategies.
165 | 	Variable-length buffers: packets (implemented), path names (not).
166 | 	Clear out just the used part (e.g., 64 byte pkt in 1500-byte buffer).
167 | 
168 | Side-effects of secure deallocation.
169 |     Might make some bugs more predictable, or make bugs go away.
170 |     Periodic stack clearing may make uninitialized stack bugs less predictable.
171 | 
172 | Performance impact?
173 |     Seems to be low, but a bit hard to tell what's going on in the kernel.
174 | 
175 | What happens in a higher-level language (PHP, Javascript, ..)?
176 |     May need to modify language runtime to erase stack.
177 |     If runtime uses own allocator (typical), need to modify that as well.
178 |     Otherwise, free() may be sufficient.
179 | 
180 | How does garbage collection interact with secure deallocation?
181 |     Reference-counting GC can free, erase objects fast in most cases.
182 |     Periodic garbage collection may unnecessarily prolong data lifetime.
183 | 
184 | 


--------------------------------------------------------------------------------
/previous-years/l19-backtracker.txt:
--------------------------------------------------------------------------------
  1 | Backtracking intrusions
  2 | =======================
  3 | 
  4 | Overall problem: intrusions are a fact of life.
  5 |     Will this ever change?
  6 |     Buggy code, weak passwords, wrong policies / permissions..
  7 | 
  8 | What should an administrator do when the system is compromised?
  9 |     Detect the intrusion ("detection point").
 10 | 	Result of this stage is a file, network conn, file name, or process.
 11 |     Find how the attacker got access ("entry point").
 12 | 	This is what Backtracker helps with.
 13 |     Fix the problem that allowed the compromise
 14 |        (e.g., weak password, buggy program).
 15 |     Identify and revert any damage caused by intrusion
 16 |        (e.g., modified files, trojaned binaries, their side-effects, etc).
 17 | 
 18 | How would an administrator detect the intrusion?
 19 |     Modified, missing, or unexpected file; unexpected or missing process.
 20 |     Could be manual (found extra process or corrupted file).
 21 |     Tripwire could point out unexpected changes to system files.
 22 |     Network traffic analysis could point out unexpected / suspicious packets.
 23 |     False positives is often a problem with intrusion detection.
 24 | 
 25 | What good is finding the attacker's entry point?
 26 |     Curious administrator.
 27 |     In some cases, might be able to fix the problem that allowed compromise.
 28 | 	User with a weak / compromised password.
 29 | 	Bad permissions or missing firewall rules.
 30 | 	Maybe remove or disable buggy program or service.
 31 | 	Backtracker itself will not produce fix for buggy code.
 32 | 	Can we tell what vulnerability the attacker exploited?
 33 | 	    Not necessarily: all we know is object name (process, socket, etc).
 34 | 	    Might not have binary for process, or data for packets.
 35 |     Probably a good first step if we want to figure out the extent of damage.
 36 | 	Initial intrusion detection might only find a subset of changes.
 37 | 	Might be able to track forward in the graph to find affected files.
 38 | 
 39 | Do we need Backtracker to find out how the attacker gained access?
 40 |     Can look at disk state: files, system logs, network traffic logs, ..
 41 |     Files might not contain enough history to figure out what happened.
 42 |     System logs (e.g., Apache's log) might only contain network actions.
 43 |     System logs can be deleted, unless otherwise protected.
 44 | 	Of course, this is also a problem for Backtracker.
 45 |     Network traffic logs may contain encrypted packets (SSL, SSH).
 46 | 	If we have forward-secrecy, cannot decrypt packets after the fact.
 47 | 
 48 | Backtracker objects
 49 |     Processes, files (including pipes and sockets), file names.
 50 |     How does Backtracker name objects?
 51 | 	File name: pathname string.
 52 | 	    Canonical: no ".." or "." components.
 53 | 	    Unclear what happens to symlinks.
 54 | 	File: device, inode, version#.
 55 | 	    Why track files and file names separately?
 56 | 	    Where does the version# come from?
 57 | 	    Why track pipes as an object, and not as dependency event?
 58 | 	Process: pid, version#.
 59 | 	    Where does the version# come from?
 60 | 	    How long does Backtracker have to track the version# for?
 61 | 
 62 | Backtracker events
 63 |     Process -> process: fork, exec, signals, debug.
 64 |     Process -> file: write, chmod, chown, utime, mmap'ed files, ..
 65 |     Process -> filename: create, unlink, rename, ..
 66 |     File -> process: read, exec, stat, open.
 67 |     Filename -> process: open, readdir, anything that takes a pathname.
 68 |     File -> filename, filename -> file: none.
 69 |     How does Backtracker name events?
 70 | 	Not named explicitly.
 71 | 	Event is a tuple (source-obj, sink-obj, time-start, time-end).
 72 |     What happens to memory-mapped files?
 73 | 	Cannot intercept every memory read or write operation.
 74 | 	Event for mmap starts at mmap time, ends at exit or exec.
 75 |     Implemented: process fork/exec, file read/write/mmap, network recv.
 76 | 	In particular, none of the filename stuff.
 77 | 
 78 | How does Backtracker avoid changing the system to record its log?
 79 |     Runs in a virtual machine monitor, intercept system calls.
 80 |     Extracts state from guest virtual machine:
 81 | 	Event (look at system call registers).
 82 | 	Currently running process (look at kernel memory for current PID).
 83 | 	Object being accessed (look at syscall args, FD state, inode state).
 84 | 	Logger has access to guest kernel's symbols for this purpose.
 85 |     How to track version# for inodes or pids?
 86 | 	Might be able to use NFS generation numbers for inodes.
 87 | 	Need to keep a shadow data structure for PIDs.
 88 | 	Bump generation number when a PID is reused (exit, fork, clone).
 89 | 
 90 | What do we have to trust?
 91 |     Virtual machine monitor trusted to keep the log safe.
 92 |     Kernel trusted to keep different objects isolated except for syscalls.
 93 |     What happens if kernel is compromised?
 94 | 	Adversary gets to run arbitrary code in kernel.
 95 | 	Might not know about some dependencies between objects.
 96 |     Can we detect kernel compromises?
 97 | 	If accessed via certain routes (/dev/kmem, kernel module), then yes.
 98 | 	More generally, kernel could have buffer overflow: hard to detect.
 99 | 
100 | Given the log, how does Backtracker find the entry point?
101 |     Present the resulting dependency graph to the administrator.
102 |     Ask administrator to find the entry point.
103 | 
104 | Optimizations to make the graph manageable.
105 |     Distinction: affecting vs. controlling an object.
106 | 	Many ways to affect execution (timing channels, etc).
107 | 	Adversary interested in controlling (causing specific code to execute).
108 | 	High-control vs. low-control events.
109 | 	Prototype does not track file names, file metadata, etc.
110 |     Trim any events, objects that do not lead to detection point.
111 |     Use event times to trim events that happened too late for detection point.
112 |     Hide read-only files.
113 | 	Seems like an instance of a more general principle.
114 | 	Let's assume adversary came from the network.
115 | 	Then, can filter out any objects with no (transitive) socket deps.
116 |     Hide nodes that do not provide any additional sources.
117 | 	Ultimate goal of graph: help administrator track down entry point.
118 | 	Some nodes add no new sources to the graph.
119 | 	More general than read-only files (above):
120 | 	    Can have socket sources, as long as they're not new socket sources.
121 | 	    E.g., shell spawning a helper process.
122 | 	    Could probably extend to temporary files created by shell.
123 |     Use several detection point.
124 | 	Sounds promising, but not really evaluated.
125 |     Potentially unsound heuristics:
126 | 	Filter out low-control events.
127 | 	Filter out well-known objects that cause false positives.
128 | 	E.g., /var/log/utmp, /etc/mtab, ..
129 | 
130 | How can an adversary elude Backtracker?
131 |     Avoid detection.
132 |     Use low-control events.
133 |     Use events not monitored by Backtracker (e.g., ptrace).
134 |     Log in over the network a second time.
135 | 	If using a newly-created account or back door, will probably be found.
136 | 	If using a password stolen via first compromise, might not be found.
137 |     Compromise OS kernel.
138 |     Compromise the event logger (in VM monitor).
139 |     Intertwine attack actions with other normal events.
140 | 	Exploit heuristics: write attack code to /var/log/utmp and exec it.
141 | 	Read many files that were recently modified by others.
142 | 	    Other recent modifications become candidate entry points for admin.
143 |     Prolong intrusion.
144 | 	Backtracker stores fixed amount of log data (paper suggests months).
145 | 	Even before that, there may be changes that cause many dependencies.
146 | 	    Legitimate software upgrades.
147 | 	    Legitimate users being added to /etc/passwd.
148 | 	    Much more difficult to track down intrusions across such changes.
149 | 
150 | Can we fix file name handling?
151 |     What to do with symbolic links?
152 |     Is it sufficient to track file names?
153 | 	Renaming top-level directory loses deps for individual file names.
154 | 	More accurate model: file names in each directory; dir named by inode.
155 |     Presumably not addressed in the paper because they don't implement it.
156 | 
157 | How useful is Backtracker?
158 |     Easy to use?
159 | 	Administrator needs to know a fair amount about system, Backtracker.
160 | 	After filtering, graphs look reasonably small.
161 |     Reliable / secure?
162 | 	Probably works fine for current attacks.
163 | 	Determined attacker can likely bypass.
164 |     Practical?
165 | 	Overheads probably low enough.
166 | 	Depends on VM monitor knowing specific OS version, symbols, ..
167 | 	Not clear what to do with kernel compromises
168 | 	Probably still OK for current attacks / malware.
169 |     Would a Backtracker-like system help with Stuxnet?
170 | 	Need to track back across a ~year of logs.
171 | 	Need to track back across many machines, USB devices, ..
172 | 	Within a single server, may be able to find source (USB drive or net).
173 | 	Stuxnet did compromise the kernel, so hard to rely on log.
174 | 
175 | Do we really need a VM?
176 |     Authors used VM to do deterministic replay of attacks.
177 |     Didn't know exactly what to log yet, so tried different logging techniques.
178 |     In the end, mostly need an append-only log.
179 |     Once kernel compromised, no reliable events anyway.
180 |     Can send log entries over the network.
181 |     Can provide an append-only log storage service in VM (simpler).
182 | 
183 | 


--------------------------------------------------------------------------------
/previous-years/l20-traceback.txt:
--------------------------------------------------------------------------------
  1 | Denial of service attacks
  2 | =========================
  3 | 
  4 | What kinds of DoS attacks can an adversary mount?
  5 |     Exhaust resources of some service.
  6 | 	Network bandwidth.
  7 | 	CPU time (e.g., image processing, text searching, etc).
  8 | 	Disk bandwidth (e.g., complex SQL queries touching a lot of data).
  9 | 	Disk space, memory.
 10 |     Deny service by exploiting some vulnerability in protocol, application.
 11 | 	In TCP, if adversary can guess TCP sequence numbers, can send RST.
 12 | 	    Terminates TCP connection.
 13 | 	In 802.11, deauthenticate packets were (still?) not authenticated.
 14 | 	    Adversary can forge deauthenticate packets, disconnect client.
 15 | 	In BGP, routers perform little authentication on route announcements.
 16 | 	    A year or so ago, Pakistan announced BGP route for Youtube.
 17 | 	    In April, China announced BGP routes for many addresses.
 18 |     Poorly-designed or poorly-implemented protocols or apps can be fixed.
 19 |     Resource exhaustion attacks are often harder to fix.
 20 | 
 21 | Why do attackers mount DoS attacks?
 22 |     "Spite", but increasingly less so.
 23 |     Extortion.  Force victim to incur cost of defense or downtime.
 24 |     Extortion (used to be?) relatively common for online gambling sites.
 25 | 	High-value, time-sensitive, downtime is very costly.
 26 | 
 27 | Network bandwidth DoS attacks.
 28 |     Adversary unlikely to directly have overwhelming network bandwidth.
 29 |     Thus, key goal for an adversary is amplification.
 30 |     One way to amplify bandwidth: reflection.
 31 | 	Early trick: "smurf", send source-spoofed ICMP ping to broadcast addr.
 32 | 	More likely today: source-spoofed UDP DNS queries.
 33 | 	Why don't adversaries use TCP services for reflection?
 34 |     Higher-level amplification: compromise machines via malware, form botnet.
 35 | 	Most prevalent today, can send well-formed TCP connections.
 36 | 	Why are TCP connections more interesting for adversaries?
 37 | 	Reflected ICMP, UDP packets much easier to filter out.
 38 | 
 39 | CPU time attacks.
 40 |     Complex applications perform large amounts of computation for requests.
 41 |     SSL handshake, PDF generation, Google search, airline ticket searches.
 42 |     High-end DoS attackers do this routinely to incur maximum cost per request.
 43 | 
 44 | Disk bandwidth attacks.
 45 |     Disk is often the slowest part of the system (100 seeks per second?)
 46 |     Systems optimized to avoid disk whenever possible: use caches.
 47 |     Caches work due to statistical distributions.
 48 |     Adversary can construct an unlikely distribution, ask for unpopular data.
 49 |     Caches no longer effective, many queries hit disk, system grinds to a halt.
 50 |     Hard to control, predict, or even detect.
 51 | 
 52 | Space exhaustion attacks (disk space, memory).
 53 |     Once a user is authenticated, relatively easy to enforce quotas.
 54 |     Many protocols require servers to store state on behalf of unknown clients.
 55 | 
 56 | How to defend against DoS attacks in general?
 57 |     Accountability: track down the attacker.
 58 | 	Becoming harder to do, at a conceptual level, with botnets, Tor, ..
 59 |     Require authentication to access services.
 60 | 	Lowest level (IP) does not provide authentication by default.
 61 |     Require clients to prove they've spent some resources.
 62 | 	Might be plausible if adversary's goal is to exhaust server resources.
 63 | 	Captchas.
 64 | 	Cryptographic puzzles.
 65 | 	    Given challenge (C,n) find R so that low n bits of SHA1(C||R) are 0.
 66 | 	    Easy to synthesize challenge and verify answer.
 67 | 	    Easy to scale up the challenge, if under attack.
 68 | 	    Deliver/verify challenge over some protocol not susceptible to DoS.
 69 | 	    One slight problem: CPU speeds vary a lot.
 70 | 	    More memory-intensive puzzles also exist, might be more fair.
 71 | 	Micropayments.
 72 | 	    Some "e-stamp" proposals tried, but micropayments are hard.
 73 | 	Bandwidth (Speak-up by Mike Walfish).
 74 | 	Big problem: adversary can get more resources through botnets.
 75 | 
 76 | Specific problem: IP address spoofing.
 77 |     What's the precise problem?
 78 | 	Adversary can put any IP address as source when sending packet.
 79 | 	Not all networks perform sanity-checks on source IP addresses.
 80 | 	Hard for victim to track down who is responsible for the traffic.
 81 |     What resources can adversary exhaust in this manner?
 82 | 	Can send arbitrary packets, exhausting bandwidth.
 83 | 	Can issue any queries to UDP services (e.g., DNS), exhausting CPU time.
 84 | 	Cannot establish fully-open TCP connections (must guess sequence#).
 85 | 	Can create half-open TCP conns, exhausting server memory (SYN flood).
 86 |     SYN flood problem: three-way TCP handshake (SYN, SYN-ACK, ACK).
 87 | 	Server must keep state about the received SYN and sent SYN-ACK.
 88 | 	Needed to figure out what connection the third ACK packet is for.
 89 |     One solution: use cryptography to off-load state onto the client.
 90 | 	SYN cookies: encode server-side state into sequence number.
 91 | 	    seq = MAC(client & server IPs, ports, timestamp) || timestamp
 92 | 	Server computes seq as above when sending SYN-ACK response.
 93 | 	Server can verify state is intact by verifying hash (MAC) on ACK's seq.
 94 | 	Not quite ideal: need to think about replay attacks within timestamp.
 95 | 	Another problem: if third packet lost, noone retransmits.
 96 | 	Maybe not a big deal in case of a DoS attack.
 97 | 	Only a problem for protocols where server speaks first.
 98 | 
 99 | What's the best we can hope for in an IP traceback scheme?
100 |     No way to authenticate messages from any given router.
101 |     Goal: suffix of the real attack path.
102 |     Adversary is free to make up his or her own routers.
103 |     Infact, this is realistic, since adversary may be an actual ISP.
104 |     Rely on fact that adversary's packets must repeatedly traverse suffix.
105 | 
106 | Typical constraints for deploying IP traceback, in order of increasing hardness:
107 |     Routers are hard to change.
108 |     Routers cannot do a lot of processing per packet.
109 |     End-hosts are hard to change.
110 |     Packets formats are nearly impossible to change.
111 | 
112 | Manual tracing through the network.
113 |     1. Find a pattern for the attack packets (e.g., destination address).
114 |     2. Call up your ISP, ask them to tcpdump and say where packets come from.
115 |     3. Repeat calling up the next ISP and asking them to do the same.
116 |     Slow, tedious, non-scalable, hard to get cooperation from far-away ISPs.
117 | 
118 | Controlled flooding.
119 |     Clever idea: flood individual links you suspect might be used by attack.
120 |     See how the flood affects the incoming DoS packets.
121 |     Potentially works for a single source of attack, but causes DoS by itself.
122 | 
123 | Ideal packet marking: record every link traversed by a packet.
124 |     Problem: requires a lot of space in each packet.
125 | 
126 | Trade-off: record individual links with some probability ("edge sampling").
127 |     Each packet gets marked with two link endpoints and a distance counter.
128 |     How do we reconstruct the path from the individual links?
129 |     How do we decide when to mark a packet?  Small probability.
130 |     What if the packet is already marked?  Why overwrite?
131 |     Why do we need a distance counter?
132 |     Why do we need the two endpoints to each mark the packet with their own IP?
133 | 	Could have one router write down its own IP and the next hop's IP.
134 | 	However, routers have many interfaces, with a separate IP for each.
135 | 	Makes it difficult for end-node machine to piece together route.
136 | 	Don't know when two IPs belong to the same router.
137 | 
138 | Making edge sampling work in IP packets.
139 |     Challenge: encoding edge information into IP packet.
140 | 	Ideally, want to store 2 IPs (2 x 32 bits) and distance (8 bits).
141 | 	Authors only found space for 16 bits in the rarely-used fragment ID.
142 |     Trick 1: Edge IDs.
143 | 	XOR the IPs of neighboring nodes into a single 32-bit edge ID.
144 | 	How much does this save us?
145 | 	How can we reconstruct the path?
146 | 	Start with first hop, keep XORing with increasingly larger distances.
147 |     Trick 2: Integrity checking scheme to know when we've XORed the right IDs.
148 | 	Potential problem: attack may come from many sources.
149 | 	As a result, XORing with edge-ID of some distance may not be right.
150 | 	Approach: make IPs easy to verify, by bit-interleaving hash of IP.
151 | 	Can validate candidate IP addresses by checking their hash.
152 | 	Doesn't save us space (yet), only increases edge IDs to 64 bits.
153 |     Trick 3: Break up edge IDs into fragments (e.g., 8 bit chunks of 64 bits).
154 | 	Encoding in the IP header:
155 | 	    3-bit offset (which 8-bit chunk out of 64-bit edge ID).
156 | 	    5-bit distance (up to traceback-enabled 32 hops away).
157 | 	    8-bit data (i.e., a particular fragment of the 64-bit edge ID).
158 | 	How to reconstruct?
159 | 	    Know the right offset for each chunk, and the right distance.
160 | 	    Try all combinations of offsets for given distance to match hash.
161 | 	    Once we know IP address for one hop, move on to the next distance.
162 |     Trick 4: What happens if the fragment-ID field is in use?
163 | 	Drop fragmented packet with some prob., replace with entire edge info.
164 | 	Probability needed for fragmented packets is less: no matching needed.
165 | 
166 | How practical is the proposed IP traceback scheme?
167 |     What happens if not all routers implement this scheme?
168 |     How do we know when the traceback information stops being a legal suffix?
169 |     How expensive is it to reconstruct edges from fragments?
170 | 
171 | 


--------------------------------------------------------------------------------
/previous-years/l17-vanish.txt:
--------------------------------------------------------------------------------
  1 | Vanish
  2 | ======
  3 | 
  4 | Problem: sensitive data can be difficult to get rid of.
  5 |     Emails, shared documents, even files on a desktop computer.
  6 |     Adversary may get old data after they break in, gain access.
  7 |     Difficult to prevent certain kinds of "break ins": legal subpoenas, etc.
  8 |     Would like to have data become inaccessible after some period of time.
  9 | 
 10 | How serious of a problem is this?
 11 |     Seems like there are some interesting use cases that the paper discusses.
 12 |     Especially useful for ensuring email messages cannot be recovered later on.
 13 | 
 14 | Strawman 1: why not attach metadata with expiration date (e.g., email header)?
 15 |     Copies of data may be stored on servers: backups, logs (e.g., email).
 16 |     Even with no copies, data may be stored on broken machine: hard to erase.
 17 |     Adversary may be able to obtain sensitive data from those copies.
 18 |     Goal: do not require any explicit data deletion.
 19 | 
 20 | Strawman 2: why not encrypt email messages with recipient's public key?
 21 |     Adversary may steal the user's private key.
 22 |     Adversary may use a court order or subpoena to obtain private key.
 23 |     Goal: ensure data is inaccessible even if recipient's key compromised.
 24 | 
 25 | Strawman 3: why not use an online service specifically for this purpose?
 26 |     Simple service, in principle:
 27 | 	Encrypt messages with a specified expiration time.
 28 | 	Decrypt only ciphertexts whose expiration time is in the future.
 29 |     Service is trusted (if compromised, can recover old "expired" data).
 30 | 	Security services were targeted by law enforcement in the past.
 31 | 	E.g., Hushmail incident.
 32 |     Hard to deploy service specifically for an unknown new application.
 33 | 	Difficult to justify resources for services that's not used yet.
 34 |     Goal: no new services.
 35 | 
 36 | Strawman 4: why not use specialized hardware?
 37 |     Need a reliable source of time; TPM hardware does not provide one.
 38 |     In principle, smartcard could serve as distributed encrypt/decrypt service.
 39 |     If we can't use a standard TPM chip, difficult to deploy new hardware.
 40 |     Goal: no new hardware.
 41 | 
 42 | Vanish design, step 1: reduce problem to limiting lifetime of random keys.
 43 |     To create a vanishing data object (VDO), create fresh data encryption key K.
 44 |     Encrypt the real data with this key: C = E_K(D).
 45 |     Strawman VDO is now (C, K).
 46 |     Next, we will make sure key K vanishes at the right time..
 47 |     Why is this step useful?
 48 |     1. Need to worry about vanishing of a small, fixed-size object (key K).
 49 |     2. The key K itself doesn't leak any information about data.
 50 | 
 51 | Vanish design, step 2: store the secret key in a DHT.
 52 |     Quick aside on how DHTs work..
 53 |     Logical view:
 54 | 	Many machines (e.g., ~1M for Vuze DHT) talk to each other.
 55 | 	Store key-value pairs, where keys are 160-bit things called "indexes".
 56 | 	Storage is distributed across the nodes in the DHT.
 57 | 	(Thus, the name: distributed hash table.)
 58 |     API:
 59 | 	lookup(index) -> set of nodes
 60 | 	store(node, index, value) -> node stores the (index, value) entry
 61 | 	get(node, index) -> value, if stored at that node
 62 |     The tricky function is lookup (others are just talking to one node).
 63 | 	Vuze DHT works by constructing a single 160-bit address/name space.
 64 | 	160 bits works well, because it's large and fits a SHA-1 hash.
 65 | 	Nodes get 160-bit identifiers (SHA-1 hash of, e.g., node's public key).
 66 | 	Nodes are responsible for indexes near their own 160-bit ID.
 67 | 	    That is, lookup(index) returns nodes with IDs near index.
 68 | 	Nodes talk to other nodes with nearby ID values, to replicate data.
 69 | 	    (Also need to talk to a few nodes far away, for lookup to work).
 70 |     Intermediate step (not quite Vanish):
 71 | 	Choose random "access key" L.
 72 | 	Store data key K at index L in the DHT.
 73 | 	Strawman VDO is now (C, L).
 74 |     How to recover the VDO before it expires?
 75 | 	Straightforward: fetch key K from index L in the DHT.
 76 |     What causes data to vanish?
 77 | 	In the Vuze DHT, values expire after 8 hours (fixed timeout).
 78 | 	More generally, DHTs experience churn (nodes join and leave the DHT).
 79 | 	Once a node leaves DHT, it will re-join with a different ID.
 80 | 	Difficult to track down nodes that used to store some index in the past.
 81 |     Why does Vanish choose an "access key" L instead of using, say, H(C)?
 82 | 	Ensures that Vanish does not reduce security.
 83 | 	The only things revealed to the DHT are random values (e.g., L and K).
 84 | 	Not dependent on actual sensitive data (plaintext D or ciphertext C).
 85 | 
 86 | Vanish design, step 3: split up the key into multiple pieces, store the pieces.
 87 |     Why does Vanish do this?
 88 |     1. Individual nodes may go away prematurely, want reliability until timeout.
 89 |     2. Individual nodes can be malicious, can be subpoenaed, can be buggy..
 90 |     Problem shown in Figure 4 (with N=1).
 91 | 	Less than 100% availability before 8 hours.  Why?
 92 | 	More than 0% availability after 8 hours.  Why?
 93 | 
 94 | Secret sharing (by Adi Shamir).
 95 |     Given secret K, want to split it up into shares K_1, .., K_N.
 96 |     Given some threshold M of shares (<= N), should be able to reconstruct K.
 97 |     Construction: random polynomial of degree M-1, whose constant coeff is K.
 98 | 	Assume we can operate mod some large constant (e.g. 2^128 for AES keys).
 99 | 	Polynomial is f(x) = z_{M-1} x^{M-1} + .. + z_1 x^1 + K  (mod 2^128).
100 |     To generate N secret shares, compute f(1), f(2), .., f(N).
101 |     To reconstruct secret given M shares, solve polynomial and compute f(0).
102 |     With fewer than M shares, there is a unique solution for any f(0) value.
103 | 	This means adversary doesn't know what f(0)=K is, with <M shares.
104 | 
105 | Secret sharing in Vanish.
106 |     Choose two parameters (N and threshold) and split up K into K_1, .., K_N.
107 |     Generate N indexes (I_1, .., I_N) pseudo-randomly using L as the seed.
108 |     Store K_i at index I_i.
109 |     The VDO is now (C, L, N, threshold).
110 |     Recovery: re-generate I_i's from L, look up K_i shares, reconstruct K.
111 |     How to pick N and threshold?
112 | 	Larger N: better resistance to adversary stealing shares.
113 | 	Larger threshold: somewhat more difficult to compromise enough shares.
114 | 	Smaller threshold: more reliable (i.e., can get enough shares).
115 |     How precisely can Vanish force expiration after T = 8 hours?
116 | 	With Vuze, N=50 and 60-70% seems to work OK.
117 | 	Churn is a big factor in availability (i.e., timeout before T).
118 | 	Misconfigured nodes are big factor in security (i.e., timeout after T).
119 | 
120 | Is Vanish useful?
121 |     Only works with online access, but doesn't require much bandwidth.
122 |     Can we prefetch shares?  Yes, but that would defeat the point of Vanish..
123 |     Applications:
124 | 	1. FireVanish (plugin for Firefox/Gmail similar to GnuPG).
125 | 	2. Vanishing file application (not described in much detail).
126 |     Is this usable?
127 | 	Seems plausible -- builds on GnuPG plugin.
128 | 	Application (e.g., Gmail) cannot access underlying plaintext data.
129 | 	E.g., cannot search over VDO messages.
130 |     Performance?
131 | 	What's the dominant cost?
132 | 	For encapsulating VDO, dominant cost is storing key shares, O(N).
133 | 	Can optimize by pre-creating keys periodically.
134 | 	For decapsulating VDO, dominant cost is fetching key shares.
135 | 	    Still expensive: 2 seconds for N=20, 19 seconds for N=200.
136 | 	    Cannot optimize without prefetching and defeating the point.
137 | 
138 | Vanish attack vectors: mostly all about getting enough key shares.
139 |     Of course, need access to the VDO to make shares useful.
140 |     If adversary gets access to VDO before expiration, game over.
141 |     (Might help to encrypt VDO: stalling on revealing key will expire VDO.)
142 | 
143 | Who might mount an attack?
144 |     Legitimate recipients of VDO assumed to be trusted.
145 |     Email service provider might sniff on all VDOs.
146 | 	Can preemptively fetch key shares for all VDOs (using L).
147 | 	How to prevent?  Encrypt VDO with some encryption scheme.
148 | 	Even if outer key is revealed later, it's too late to use L.
149 |     ISP can monitor DHT requests to intercept stores of key shares.
150 | 	How to prevent?  Use Tor, maybe different exit node for each share.
151 |     Intermediate DHT nodes can watch lookup requests, save value just in case.
152 | 	How to prevent?  Lookup nearby indexes instead of real index.
153 |     Some other party that has no other knowledge about traffic or VDO.
154 | 	Need to blindly store all possible shares ahead of time.
155 | 
156 | How difficult is it to store all possible keys in a DHT?
157 |     Tricky question.
158 |     Paper claim: expensive (~$860K/yr) because Vuze DHT is large (1M nodes).
159 |     To capture a sizable fraction of all DHT values, need ~100K nodes.
160 | 	Vuze replicates all values to 20 nearby nodes, so no need for 1M.
161 |     Running DHT nodes seemed expensive: need CPU time, memory, bandwidth.
162 | 
163 | Shortly afterwards, two groups of researchers found problems with this analysis.
164 |     Turns out it's much cheaper to store all values from a DHT.
165 |     Observation: Vuze is quite aggressive at replicating data (to avoid loss).
166 |     Once node joins, nearby nodes send copies of data near its ID (~3 minutes).
167 | 	Need to join each node for 3 minutes out of every 8 hours!
168 | 	~100x reduction in number of concurrent DHT nodes that adversary runs.
169 |     Moreover, adversary can implement DHT nodes in C instead of Java.
170 | 	Much lower CPU, memory requirements.
171 |     Patch: change Vuze to not replicate Vanish data so aggressively.
172 | 	Requires specialized changes to Vuze, something that's against goal?
173 | 	Interesting follow-on work: Comet at OSDI2010.
174 |     Patch: split up secret among multiple DHTs (paper proposes Vuze + OpenDHT).
175 | 	Use secret-sharing to break up key into 2 pieces, one for each DHT.
176 | 	Then use secret-sharing again within each DHT.
177 | 
178 | 


--------------------------------------------------------------------------------
/previous-years/l07-xfi.txt:
--------------------------------------------------------------------------------
  1 | XFI
  2 | ===
  3 | 
  4 | questions on lab 2?
  5 | 
  6 | what's this paper trying to achieve?
  7 |     run legacy x86 binary code safely
  8 |     "safely" defined by P1..P7 in paper
  9 | 
 10 | what are the potential uses?
 11 |     plugins (device drivers, media codecs, browser plugins, ..)
 12 |     hardening existing applications (httpd from lab 1?)
 13 |     running untrusted code downloaded from some web site
 14 | 	what sorts of things would or would not XFI solve?
 15 |     "mobile code", as in last lecture
 16 | 	x86 instead of javascript in the browser?
 17 | 	app code in the kernel (video codecs, packet filters)
 18 | 
 19 | what do we want to prevent the untrusted module from doing?
 20 |     corrupting memory that doesn't belong to it
 21 |     read secret data that doesn't belong to it
 22 |     invoke system calls
 23 |     invoke other code that it shouldn't be able to call
 24 |     .. or have exploits like buffer overflow that lead to the above
 25 | 
 26 | what should the module be able to do?
 27 |     read/write its own memory
 28 |     execute its own code
 29 |     call certain approved external functions
 30 | 
 31 | why XFI?
 32 |     use hardware protection?
 33 | 	might be too expensive, or not available (inside kernel)
 34 |     use a high-level language?
 35 | 	not practical for legacy code; not practical in kernel w/o runtime
 36 |     use a restricted language?
 37 | 	works in some cases: packet filtering language for tcpdump (BPF)
 38 | 
 39 | so what's the plan?
 40 |     instrument calls, memory accesses, privileged instructions (eg syscall)
 41 |     make sure that all uses conform to our policy
 42 | 
 43 | what's CFI and why do we need it?
 44 |     direct: want to make sure the XFI module doesn't call arbitrary code
 45 | 	might not want the module calling system() or making syscalls
 46 |     indirect: need it to make sure we have memory access checks in all places
 47 |     indirect: need it to make sure malicious code doesn't jump past checks
 48 |     problem: x86 disassembly is tricky
 49 | 	25 CD 80 00 00    (AND %eax, $0x80cd)
 50 | 	jump to second byte (CD 80) to invoke linux syscall
 51 |     cannot verify every possible offset (likely false positives)
 52 |     CFI ensures reliable disassembly and thus reliable software guards
 53 | 
 54 | how does CFI work?
 55 |     plan:
 56 | 	disassemble the module linearly
 57 | 	ensure entry points are on legal instruction boundaries that we saw
 58 | 	ensure internal jumps go to legal instruction boundaries that we saw
 59 | 
 60 |     construct a call graph ahead of time (program analysis)
 61 | 	for each call site, figure out what might be called
 62 | 	sometimes hard to tell: calling a function pointer
 63 | 	conservative answer: any function whose address is ever computed
 64 | 
 65 |     ensure that each call goes to one of the possible call sites
 66 | 	simple for static cases
 67 | 	need to have a runtime check for computed jumps
 68 | 
 69 |     is this good enough?
 70 | 	should be able to prevent module from directly calling system
 71 | 	    (i.e. will enforce external callers as we wanted)
 72 | 	should make disassembly reliable
 73 | 	what if the call graph allows arbitrary calls within the module?
 74 | 	    might jump around in strange ways internally -- what could happen?
 75 | 	    preserves reliable disassembly and external functions called
 76 | 	    still need to ensure it doesn't bypass software guards we'll insert
 77 | 	    (soln: CFI only allows arcs to function start)
 78 | 
 79 |     how do they implement it?
 80 | 	assign each arc in the CFG a random ID
 81 | 	place this random ID at the start of a function (in a prefetchnta)
 82 | 	check the presence at the call site
 83 | 
 84 |     figure 2: impl sketch
 85 | 	what prevents the attacker from jumping directly to "call EBX"?
 86 | 	can an attacker synthesize a valid-looking target?
 87 | 	    avoid ID in the check instruction itself
 88 | 	    non-executable data
 89 | 	    what if attacker can load another module later?
 90 | 
 91 |     what about returns?
 92 | 	return addresses saved on a special stack
 93 | 
 94 | memory protection
 95 |     inline checks: figure 3
 96 | 	relies on CFI's nice property
 97 | 	not just jumps to instruction boundaries, but jumps to function start
 98 | 	prevents jumps to memory reference after the mrguard
 99 |     fastpath vs slowpath memory
100 | 	fastpath: contiguous range of memory for private use by module
101 | 	    can potentially have a different fastpath for each memory ref
102 | 	    if we can guess which range of memory it's likely to be
103 | 	slowpath: other regions that program may have access to
104 | 	    e.g. stack, code (read-only), arguments passed in from the outside
105 | 
106 | why do they need two stacks?
107 |     need to protect special values (return addr, frame ptrs) on the stack
108 |     but memory protection can only protect contiguous regions, not what's in it
109 |     so place all stack allocations accessed via pointers on one stack
110 | 	protected as a single memory range
111 |     static analysis protects individual values on the other "scoped" stack
112 | 	XFI keeps track of the use of each element (e.g. return value unchanged)
113 |     what happens when you grow the stack?
114 | 	ASP: in theory, mrguard should be enough
115 | 	    can we put it in fastpath memory?
116 | 	    probably not since stack is allocated at runtime
117 | 	SSP: mrguard would not allow (not accessible by pointer)
118 | 	    need a separate "stack bottom" check when scoped stack is grown
119 | 
120 | would XFI prevent exploits of buffer overflows in httpd?
121 |     simple buffer overflow, clobbering return address
122 |     corrupting a function pointer on the stack
123 | 	code injection
124 | 	return-to-libc
125 | 	what would the attacker need to do/know to successfully exploit?
126 |     corrupting a data structure containing the file to read/execute
127 | 
128 | how does XFI avoid privileged instructions (e.g. page table changing)?
129 |     verifier statically makes sure there are no such instructions
130 | 
131 | how does their verifier work?
132 |     figure 4
133 |     verification states
134 |     static checks for immediate memory references or immediate jumps
135 |     requires a proof for indirect/computed memory accesses or jumps
136 |     can do one memory check for an entire basic block (ie no branches)
137 |     verification states keep track of where the return address is on the SSP
138 | 	how does it keep track of where the next return address is?
139 |     verifier ensures that stack pointers are preserved across function call
140 | 	origASP=ASP, origSSP=SSP[+4] at return instruction
141 |     can an attacker do a jump to instruction 1 from elsewhere?
142 | 	that would be a static jump, needs no CFI runtime check, in-bounds
143 | 	bypasses mrguard
144 |     how does the verifier chain together basic blocks?
145 | 	compute all possible transitions between basic blocks (static + CFI)
146 | 	make sure verif. states at parent block imply verif. states at children
147 | 
148 | what happens on a fault?
149 |     presumably some existing error mechanism
150 |     hope the caller knows what to do with errors
151 |     maybe throw an exception?
152 | 
153 | what's all the stuff the rewriter has to do?
154 |     works on unmodified binaries without source code access
155 | 	requires access to some debug information, though
156 | 	needs to figure out how the stack is being used, etc
157 |     CFI: compute the control flow graph
158 | 	 insert labels and label checks at all jump targets and jump sites
159 |     memory: move stack allocations that are accessed by pointer to diff stack
160 | 	    generate any needed verification states
161 | 	    insert mrguard calls as necessary to make the verification go thru
162 | 
163 | how does the XFI module interface with the rest of the world?
164 |     stubs that set up stacks on incoming calls, copy args or set slowpath perms
165 | 	revoke perms on return
166 |     stubs that call out (external code doesn't have CFI labels)
167 | 
168 | what's in the final TCB?
169 |     verifier is trusted
170 |     rewriter not trusted
171 |     rest of the app code is trusted
172 | 	better not be trickable by clever invocations by malicious code!
173 | 	e.g. if there's a sort() function in libc that takes a function ptr
174 | 	almost like the "luring attack" from java
175 |     stubs going in/out of XFI module are trusted
176 | 	set up stacks, add/remove slowpath permissions to arg memory
177 |     error handlers are trusted
178 | 
179 | evaluating security/protection of XFI
180 |     prevents some buffer overflows, heap overflow
181 |     even prevents some data overwrites because of separate stacks
182 |     does not prevent everything (nimda exploited backdoors, "luring" code)
183 |     had to make some changes for windows drivers (avoid misrepresentation)
184 | 	doesn't seem that bad: just need to provide custom stubs?
185 | 
186 | what are the tricks to make things perform well?
187 |     as much static analysis as possible
188 | 	clump multiple mrguard's together using verification states if possible
189 |     large chunk of fastpath memory
190 | 	need not be fully allocated ahead of time; just virtual memory
191 |     copy arguments to fastpath memory rather than access the ptrs directly
192 |     special cases for stack growth
193 | 	SSP: stack bottom kept by windows in a convenient place
194 | 	ASP: use page address bits?
195 |     slowpath data structure impl
196 | 	very simple: linear array of start+end addrs
197 | 	why does it perform OK?  only a few different ranges
198 | 
199 | what's the performance of XFI?
200 |     seems OK
201 |     main factors: read-protection (expensive!), arg passing (expensive!)
202 | 
203 | is XFI too strict for some code?
204 |     JITs that generate x86
205 |     might not perform well for code accessing many shared data structures
206 | 
207 | what doesn't XFI prevent?
208 |     drivers: DMA attacks
209 |     denial of service, liveness failures
210 |     exploit unsafe assumptions that callers make (or other called functions)
211 | 	for windows drivers, had to ensure driver "identity" didn't change
212 | 
213 | where could you use XFI?
214 |     OKWS?
215 |     would you use it to confine Java code?
216 | 	might look like a capability design in terms of functions it can invoke
217 | 	hard to pass around complex Java objects across prot. domains
218 | 
219 | 


--------------------------------------------------------------------------------
/previous-years/l11-spins.html:
--------------------------------------------------------------------------------
 1 | 
 2 | <!-- saved from url=(0054)http://css.csail.mit.edu/6.858/2009/lec/l11-spins.html -->
 3 | <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><style type="text/css"></style></head><body dir="ltr" style="max-width:8.5in" class=" hasGoogleVoiceExt">
 4 | 
 5 | <h1>Wireless Sensor Networks (notes by Marten van Dijk)</h1>
 6 | 
 7 | <b>Read:</b> A. Perrig, R. Szewczyk, J.D. Tygar, V. Wen, and D.E. Culler, "SPINS: Security Protocols for Sensor Networks", Wireless Networks 8, 521-534, 2002.
 8 | 
 9 | <p>
10 | <b>Model (assumptions, security requirements, possible threats):</b>
11 | 
12 | </p><p>
13 | What is a sensor network? Thousands to millions of small sensors form self-organizing wireless networks. Sensors have limited processing power, storage, bandwidth, and energy (this gives low production costs). For example, use TinyOS, a small, event-driven OS, see Table 1. Serious security and privacy questions arise if third parties can read or tamper with sensor data.
14 | 
15 | </p><p>
16 | Examples: emergency response information, energy management, medical monitoring, logistics and inventory management, battlefield management.
17 | 
18 | </p><p>
19 | What are the differences between wireless sensor networks (WSN) and mobile ad hoc networks (MANET)? The number of sensor nodes in a WSN can be several orders of magnitude larger than the nodes in a MANET. Sensor nodes are densely deployed. Sensor nodes are prone to failures. The topology of a WSN changes very frequently. Sensor nodes mainly use a broadcast communication paradigm, whereas most MANETs are based on point-to-point communication. Sensor nodes are limited in processing power, storage, bandwidth, and energy.
20 | 
21 | </p><p>
22 | What are the components of a sensor node? Sensing unit with a sensor and analog-to-digital converter (ADC). Processor with storage. Transceiver. Power unit.
23 | 
24 | </p><p>
25 | What are the capabilities of a base station? More battery power, sufficient memory, means for communicating with outside networks.
26 | 
27 | </p><p>
28 | What are the trust assumptions? Individual sensors are untrusted. There is a known upper bound on the fraction of all sensors that are compromised. Communication infrastructure is untrusted (except that messages are delivered to the destination with non-negligible probability).  Sensor nodes trust their base station. Each node trusts itself.
29 | 
30 | </p><p>
31 | What is the protocol stack? Physical layer: simple but robust modulation, transmission, and receiving techniques; responsible for frequency selection, carrier frequency generation, signal detection, modulation. Data link layer: medium access control (MAC) protocol must be power-aware and able to minimize collision with neighbors' broadcasts, MAC protocol in a wireless multi-hop self-organizing network creates the network infrastructure (topology changes due to node mobility and failure, periodic transmission of beacons allows nodes to create a routing topology) and efficiently shares communication resources between sensor nodes (both fixed allocation and random access versions have been proposed), data link layer also implements error control and data encryption + security. Network layer: routing the data supplied by the transport layer, provide internetworking with external networks, design principles are power efficiency, data aggregation useful only when it does not hinder the collaborative effort of the sensor nodes, attribute-based addressing and location awareness. Transport layer: helps to maintain the flow of data if the application requires it, especially needed when the system is planned to be accessed through the Internet or other external networks. Application layer: largely unexplored.
32 | 
33 | </p><p>
34 | What are performance metrics? Fault tolerance or reliability: is the ability to sustain sensor network functionalities without interruption due to sensor node failures (non-adversarial such as lack of power, physical damage, environmental interference), it is modeled as a Poisson distribution e^{-lambda*t} to capture the probability of not having a failure within the time interval (0,t). Scalability: ability to support larger networks, flexible against increase in the size of the network even after deployment, ability to utilize more dense networks (density gives the number of nodes within the transmission radius of each node; it equals N*pi*R^2/A, where N is the number of scattered sensor nodes in region A, and R is the radio transmission range). Efficiency: storage complexity (amount of memory required to store certificates, credentials, keys), processing complexity (amount of processor cycles required by security primitives and protocols), communication complexity (overhead in number and size of messages exchanged in order to provide security). Network connectivity: probability that two neighboring sensors are able to share a key (enough key connectivity is required in order to provide intended functionality). Network resilience: resistance against node capture; for each c and s, what is the probability that c compromised sensors can break s links (by reconstructing the corresponding shared secret keys)?
35 | 
36 | </p><p>
37 | What are the security requirements? Availability: ensure that service offered by the whole WSN, by any part of it, or by a single node must be available whenever required. Degradation of security services: ability to change security level as resource availability changes. Survivability: ability to provide a minimum level of service in the presence of power loss, failures, or attacks (need to thwart denial of service attacks). 
38 | 
39 | </p><p>
40 | Authentication: authenticate other nodes, cluster heads, and base stations before granting a limited resource, or revealing information. Integrity: ensure that the message or entity under consideration is not altered (data integrity is achieved by data authentication). Freshness: ensure that each message is fresh, most recent (detect replay attacks). 
41 | 
42 | </p><p>
43 | Confidentiality: providing privacy of the wireless communication channels (prevent information leakage by eavesdropping or covert channels), need semantic security, which ensures that an eavesdropper has no information about the plaintext, even if it sees multiple encryptions of the same plaintext (e.g., concatenate plaintext with a random bit string, this however requires sending more data and costs more energy). Non-repudiation: preventing malicious nodes from hiding their activities (e.g., they cannot refute the validity of a statement they signed). 
44 | 
45 | </p><p>
46 | <b>Solutions (SNEP, micro TESLA, Key Distribution):</b>
47 | 
48 | </p><p>
49 | What are the limitations in designing security? Security needs to limit the consumption of processing power. Limited power supply limits the lifetime of keys. Working memory cannot hold the variables for asymmetric cryptographic algorithms such as RSA. High overhead to create and verify signatures. Need to limit communication.
50 | 
51 | </p><p>
52 | SNEP: A and B share a master key, which they use to derive an encryption keys K_AB and K_BA, and MAC keys K'_AB and K'_BA. A and B synchronize counter values C_A=C_B. Communication from A to B: {Data}_[K_AB,C_A] = Data XOR E_{K_AB}(C_A) together with MAC_{K'_AB}({Data}_[K_AB,C_A]||C_A), see Formula (1). The MAC computation is pictured in Figure 3 using CBC mode. This gives semantic security, data authentication, weak freshness (if the message verifies correctly, a receiver knows that the message must have been sent after the previous message it (the receiver) received correctly), low communication overhead (the counter value is not sent).
53 | 
54 | </p><p>
55 | Strong freshness: see Formula (2), if B request a message from A, then B transmits to A a nonce and A includes this nonce in the MAC of its communication to B. If the MAC verifies correctly, B knows that A generated the response after B sent the request.
56 | 
57 | </p><p>
58 | Synchronize counter values: see Section 5.2 for a simple bootstrapping protocol, at any time the above protocol with strong freshness can be used to request the current counter value. To prevent denial of service attacks, allow transmitting the counter with each encrypted message in the above protocols, or attach another short MAC to the message that does not depend on the counter.
59 | 
60 | </p><p>
61 | micro TESLA: authenticated broadcast requires an asymmetric mechanism, otherwise any compromised receiver could forge messages from the sender. How can this be done without asymmetric crypto? Introduce asymmetry through delayed disclosure of symmetric keys. Idea: base station uses MAC_K with a key unknown to sensor nodes, K is a key of a key chain (K_i = F(K_{i+1}), where F is a one-way function) through which it is committed to the base station (in a key chain, keys are self-authenticating), the key chain is revealed through delayed disclosure by the base station. The key disclosure time delay is on the order of a few time intervals and greater than any reasonable round trip time. Receiver node knows the key disclosure time. Each receiver node needs to have one authentic key of the one-way key chain as a commitment to the entire chain. Sender base station and receiver nodes are loosely time synchronized. Simple bootstrapping protocol using shared secret MAC keys, see Section 5.5.
62 | 
63 | </p><p>
64 | Nodes cannot store the keys of a key chain: node may broadcast data through the base station, or uses the base station to outsource key chain management.
65 | 
66 | </p><p>
67 | Key setup: master key shared by the base station and node. How do we do key distribution? There has been a lot of research providing solutions that have good resilience, connectivity, and scalability. Controversial solution: Key infection; bootstrapping does not need to be secure, it is about security maintenance in a stationary network. Idea: transmit symmetric keys in the clear and use secrecy amplification (and other mechanisms). In secrecy amplification two nodes A and B use a third neighboring node C to set up communication between A and B. This communication channel is protected by keys K_{A,C} and K_{C,B}. It is used to exchange a nonce N. A and B replace their key K_{A,B} by H(K_{A,B}||N) and verify whether they can use this new key. If K_{A,B} is know to an adversary, but keys K_{A,C} and K_{C,B} are not, then the adversary cannot extract the new K_{A,B}! This solution has been proposed for the battlefield management application.
68 | 
69 | </p><p>
70 | Related topics: RFID tags, social networks, TinyDB.
71 | 
72 | 
73 | 
74 | </p></body></html>


--------------------------------------------------------------------------------
/previous-years/l08-browser-security.txt:
--------------------------------------------------------------------------------
  1 | Browser Security (guest lecture by Ramesh Chandra)
  2 | ==================================================
  3 | 
  4 | web app security
  5 |     server and client -- we'll mostly focus on client
  6 | 
  7 | web apps: past vs. present
  8 |     past: mainly static content, simpler security model
  9 |           user interactions resulted in round-trips to server
 10 |     present: highly dynamic content with client-side code
 11 |           advantages: responsiveness, better functionality
 12 |           more complex security model
 13 |           
 14 | threat model / assumptions
 15 |     attacker controls his/her own web site, attacker.com (sounds reasonable)
 16 |     attacker's web site is loaded in your browser (why is this reasonable?)
 17 |     attacker cannot intercept/inject packets into the network
 18 |     browser/server doesn't have buffer overflows
 19 |     
 20 | security policy / goals
 21 |     1: isolation of code from different sites
 22 |         javascript code runs in your browser, has access to lots of things
 23 |         need to have some way of isolating code from different sites
 24 |         attacker should not be able to get your bank balance, xfer money, ..
 25 | 
 26 |     2: UI security -- user needs to know what site they're talking to
 27 |         phishing attacks are usually the biggest problem in this space
 28 |         without isolation of code from diff. sites, UI security is hopeless
 29 |         how do you know you're interacting with your bank vs. an attacker?
 30 |         (if security can avoid depending on this question, all the better!)
 31 | 
 32 |     we'll largely focus on the first (isolation of code) for now
 33 |           
 34 | how does javascript fit into the web model?
 35 |     HTML elements
 36 |     script tags; inline and src=
 37 |     built-in objects like window, document, etc
 38 |     DOM
 39 |     HTML elements can invoke JS code: onClick, onLoad, ..
 40 |     single-threaded execution; event-driven programming style for network IO
 41 |     frames for composing/structuring
 42 | 
 43 | browser security model
 44 |     principal: domain of the web content's URL
 45 |         http://a.com/b.html and http://a.com/c.html are the same principal
 46 |         
 47 |     protected resource: frame
 48 |         principal is the domain of frame's location URL
 49 |         all code in the frame runs as that principal
 50 |         doesn't matter where the code came from (e.g. script src=...)
 51 |         analogous to a process in Unix
 52 |         
 53 |     protection mechanisms:
 54 |         javascript references as capabilities
 55 |             may not be able to get references to other windows/frames
 56 |             but there are many objects with global names
 57 |         access control: same origin policy
 58 |     privileged functions implement their own protection
 59 |         e.g. postMessage, XMLHttpRequest, window.open()
 60 |     
 61 |     same-origin policy (SOP)
 62 |     intuition/goal: only code from origin X can manipulate resources from X
 63 |         frame A can poke at frame B's content only if they have the same principal
 64 |             why does the browser allow any cross-frame access at all?
 65 |                 frames used for layout in addition to protection
 66 |     unfortunately, quite vague, and overly restrictive; shows in practice
 67 |         exceptions to get around restrictions:
 68 |             script, image, css src tags: why are these needed?
 69 |             frame navigation
 70 | 
 71 | 
 72 | frame navigation
 73 |     problem: navigating a frame is a special operation not governed by SOP
 74 |     subject to other access control rules, which this paper talks about
 75 |     why does the browser allow this in the first place?
 76 |         might have navigation links in one frame, other sites in another
 77 | 
 78 |     what goes wrong if attacker.com can navigate another frame?
 79 |     can substitute a phishing page for the login frame of another site (eg. bank)
 80 |     why doesn't the SSL icon "go away"?  rule: all pages came via SSL
 81 |     reasoning: original site included the other origin explicitly?
 82 |     how does the attacker get a handle on that sub-frame?
 83 |         global name space of frame/window names
 84 |             more difficult in current browser -- firefox has per-frame name space
 85 |             of frame names
 86 | 
 87 | what's their proposed fix?
 88 |     window policy: can only navigate frames in the same window
 89 |     can still mount the attack on another site if you open it within a window
 90 |     why is this still OK?  no correct answer; mostly because of the URL bar
 91 | 
 92 | mash-ups
 93 |     idea: combine data from multiple sites/origins
 94 |     eg: iGoogle combines frames from many developers in the same page
 95 |     terminology: the whole site is a "mashup"
 96 |     iGoogle is an "integrator"
 97 |     all the little boxes that are included in the page are "gadgets"
 98 |     what are the problems that we run into?
 99 |     one site's code in one frame can navigate another site's frame
100 |     window policy is of no help
101 |     why does it matter? UI for login, again
102 | 
103 | better policy: descendant/child policy
104 |     why do they argue the descendant policy is just as good as child?
105 |     in theory, parent can cover up any descendant with a floating object
106 |     when is child a better choice?
107 |     later examples where site wants to know it's talking to the right child
108 |     i.e. cases when the worry isn't the UI issues
109 |     origin propagation:
110 |     what's the reasoning for this?
111 |     would this occur in real sites?  frames used for side-by-side structure
112 | 
113 | cross-origin frame communication
114 |     when would you need it?  mashups where origins interact
115 |     why do origins need to interact on the client? can we push interactions to
116 |     server-side?
117 |         cleaner design => easier to implement
118 |         avoid extra round trips => more responsive app
119 |         better integration => better user experience
120 |     nice example: yelp wants to use google maps
121 |     mutually distrustful (in theory, at least)
122 |     alternative 1: map in another frame (open it to some location), no feedback
123 |     alternative 2: map in the same frame (script src=), no protection
124 |         yelp does this today
125 |     alternative 3: map in one frame, yelp in another frame, communication btwn
126 | 
127 |     threat model: in addition to threat model described above, we assume:
128 |         attacker's gadget can load honest gadget in a subframe
129 |         attacker's gadget can communicate with integrator and honest gadget
130 | 
131 |     goal: secure, reliable communication between origins
132 | 
133 |     how does frame communication work?
134 |     plan 1: exploiting a covert channel!  (fragment channel)
135 |         problem: no authentication (where did a message come from?)
136 |         workaround: treat as a network, run authentication protocol
137 |         all 3 impls these guys looked at had the same bug
138 | 
139 |         protocol: nonces, include sender's ID (rcpt doesn't know sender)
140 | 	idea: each side generates a nonce, gives it to the other side
141 | 	if someone gives you a message w/ nonce, it came from other side
142 | 
143 |         what's the possible attack?
144 | 	attacker can impersonate integrator when talking to gadget
145 | 
146 |         why does it matter?  gadget might have policies for diff. sites
147 | 	OK to add your contacts list gadget into facebook, access it
148 | 	not OK to access your contact list gadget by other sites
149 | 
150 |         how does the attack work?
151 | 	relay initial message to the gadget
152 | 	gadget replies back to the integrator
153 | 	integrator sends gadget's nonce to attacker,
154 | 	    to prove it's the integrator sending the msg
155 | 	now the attacker has both nonces, can impersonate in both dir'n
156 | 	might not be able to intercept msgs from gadget, though
157 | 	    they're sent directly to integrator's URI
158 |         fix is well-known: include URI (name) in second response too
159 | 
160 |     plan 2: browser developers designed a special mechanism for it
161 |         frame.postMessage("hello")
162 |         paper claims this provides authentication but not privacy; how come?
163 | 	frame can re-navigate without sender's knowledge
164 |         how can this happen?
165 | 	sender was itself in a sub-frame of attacker's site
166 | 	descendant policy allows attacker to access all sub*-frames
167 |         why didn't the fragment channel have this problem?
168 | 	tight binding between message and recipient (url#msg)
169 |         solution: make the binding explicit
170 | 
171 | protected resource: cookie
172 |     how does HTTP authentication work?
173 |     browser keeps track of a session "cookie" -- arbitrary blob from server
174 |     sends the cookie along with every request to that server
175 |     cookie often includes username and authentication proof
176 |     inside browser, same-origin policy protects cookies like frames
177 |     cookie stored in document.cookie
178 |     can only access cookies for your own origin
179 | 
180 |     possible attack: generate requests to xfer money from attacker.com
181 |     <img src="http://bank.com/xfer?amt=1000&rcpt=XYZ">
182 | 
183 |     solution: spaghetti-rules
184 |     hard to prevent GET requests, so allow those (e.g. img tags)
185 |     protect from malicious ops: include some non-cookie token in the request
186 |     protect bank account balance: only see responses from the same origin
187 |     except that's not quite true either
188 |         script src= tags run code
189 |         style src= tags load CSS style-sheets, also visible
190 |     so, to protect sensitive data, make sure it doesn't parse as JS or CSS?
191 | 
192 | another mechanism to secure mashups: safe subset of javascript
193 |     eg: FBJS, ADSafe, Caja
194 |     Facebook javascript (FBJS): compiles gadget down to a safe subset of JS
195 |         per gadget name space
196 |         accesses to global name space through secure wrappers
197 |         intercepts all events and proxies AJAX requests thru FB
198 |         gadget is embedded into FB and needs to trust FB
199 |   
200 | takeaways
201 |     web security lacks unifying set of principles
202 |         policies such as SOP have many exceptions
203 |         different browsers / runtimes (e.g. Flash) implement different policies
204 |         confusing to web developers
205 |     supporting existing web sites makes deploying fundamental fixes difficult
206 |     lesson: think about security early on in the design
207 | 
208 | 


--------------------------------------------------------------------------------
/l08-my-web-security.md:
--------------------------------------------------------------------------------
  1 | Web security
  2 | ============
  3 | 
  4 | Web security for a long time meant looking at what the server was doing, since the client-side was very simple. On the server, CGI scripts were executed and they interfaced with DBs, etc.
  5 | 
  6 | These days, browsers are very complicated:
  7 | 
  8 |  * JavaScript: pages execute client-side code
  9 |  * The Document Object Model (DOM) 
 10 |  * XMLHttpRequests: a way for JavaScript client-side code to fetch content from the web-server asynchronously
 11 |     - a.k.a AJAX
 12 |  * Web Sockets
 13 |  * Multimedia support (the `<video>` tag)
 14 |  * Geolocation (webpages can determine physically where you are)
 15 |  * Native Client, for Google Chrome
 16 | 
 17 | For web-security, this means we're screwed: huge attack surface (See Figure 1)
 18 | 
 19 |     likelihood
 20 |     of correct
 21 |     ness
 22 |     ^
 23 |     |--\
 24 |     |   --\                --- we are here
 25 |     |      --\            /
 26 |     |         \          /
 27 |     |          \      <--
 28 |     |           -----*----
 29 |     |----------------------->
 30 |       # of features
 31 | 
 32 | Problems of composition: many layers
 33 | 
 34 | One problem with the web is the _parsing contexts_ problem
 35 |     
 36 |     <script>var = "UNTRUSTED CONTENT FROM USER";</script>
 37 | 
 38 | If the _untrusted_ content had a quote in it, perhaps the attacker could modify the code into:
 39 | 
 40 |     <script>var = "UNTRUSTED CONTENT"</script> 
 41 |     <script> /* bad stuff from attacker here */ </script>
 42 | 
 43 | Web specifications are long, tedios, boring, inconsistent, the size of the EU consistution (CSS, HTML) => they are vague aspirational documents that are never implemented.
 44 | 
 45 | This lecture we'll focus on client-side web-security.
 46 | 
 47 | Desktop applications come from a single principal (Microsoft, Google, etc)
 48 | Web applications come from a bunch of principals.
 49 | 
 50 | `http://foo.com/index.html` (see Figure 2)
 51 | 
 52 |  * Can analytics code access the facebook frame content?
 53 |  * Can analytics code interact with the text inputs? Can it declare event handlers?
 54 |  * What's the relationship beteen the Facebook frame (https) and the foo.com frame (http)?
 55 | 
 56 | To answer these questions browsers use a security model called the _same origin policy_
 57 | 
 58 | *Goal:* Two websites should not be able to tamper with each other, unless they want to.
 59 | 
 60 | Defining what _tampering_ means has gotten more complicated since the web first started.
 61 | 
 62 | *Strategy:* Each _resource_ is assigned an origin. JS code (a resource itself) can only access resources from its own origin.
 63 | 
 64 | What is an origin? An origin is a network protocol scheme + hostname + port. 
 65 | Example: 
 66 | 
 67 |  * `https://facebook.com:8181`
 68 |  * `http://foo.com/index.html`, implicit port 80
 69 |  * `https://foo.com/index.html`, implicit port 443
 70 | 
 71 | Loosely speaking, you can think of an origin as an UID in UNIX, with a frame being a _process_.
 72 | 
 73 | Four ideas in implementation of origins:
 74 | 
 75 |  1. Each origin has client side resources
 76 |      * Cookies, to implement state across different HTTP requests
 77 |      * DOM storage, a fairly new interface, a key-value store
 78 |      * A JavaScript namespace, defines what functions and interface are available to the origin (like the String class)
 79 |      * The DOM tree: a JavaScript reflection of the HTML in a page
 80 | 
 81 |                   [  HTML ]     
 82 |                   /       \     
 83 |             [ HEAD ]     [ BODY ]
 84 | 
 85 |      * A visual display area
 86 |  2. Each frame gets the origin of its URL 
 87 |  3. Scripts execute with the authority of their frame origin
 88 |  4. Passive content (images, CSS files) gets **zero** authority from the browser
 89 |     * Content sniffing attacks
 90 | 
 91 | Going back to our example: 
 92 | 
 93 |  * Google analytics and jQuery can do all sorts of stuff on the foo.com frame
 94 |  * The Facebook frame's inline JS cannot do anything to the foo.com frame
 95 |    - but it can talk to the foo.com frame using the `postMessage()` API
 96 |  * The JS code in the FB frame cannot issue an AJAX request to the foo.com webserver
 97 | 
 98 | MIME types: text/html. All version of IE in the past would look at the first 256 bytes of an object and ignore the `Content-Type` header. As a result, IE would misinterpret the type of files (due to bugs). Attacker can put JS code in a .jpg file. IE coerces it into text/html and then executes the JS code in the page.
 99 | 
100 | Frames and window objects
101 | -------------------------
102 | Frames represent these sort of separate JS universes
103 | 
104 | A frame, w.r.t. to JS is an instance of a DOM node. Frames and window objects in JS point to each other. The window object acts like a namespace via which you can access any variable `x`.
105 | 
106 | Frames get the origin of the frame's URL `OR` a suffix of the original domain name.
107 | 
108 | `x.y.z.com` can say "I want to set my origin to" `y.z.com` by assigning `document.domain` to `y.z.com`. This only works (or should) with suffixes of `x.y.z.com`. So it cannot do `document.domain = a.y.z.com`. Also, cannot set `document.domain = .com` because the site would be able to impact cookies in any .com website.
109 | 
110 | Browsers distinguish between frames that assigned a value to document.domain and frames that did not.
111 | 
112 | Two frames can access each if:
113 | 
114 |  1. Both frames set `document.domain` to the same value
115 |  2. Neither of the frames has changed `document.domain` and both values match
116 | 
117 | You have `x.y.z.com` (buggy or evil) trying to attack `y.z.com`, by shortening its domain. The browser will not allow this because y.z.com will have NOT changed its document.domain while x.y.z.com has.
118 | 
119 | DOM nodes
120 | ---------
121 | 
122 | Cookies
123 | -------
124 | Cookies have a _domain_ and a _path_.
125 | 
126 |     *.mit.edu/6.858
127 | 
128 | If path is `/` then all paths in the domain have access to the cookie.
129 | 
130 | On the client side there's `document.cookie`.
131 | 
132 | Cookies have a `secure flag` which means HTTP content should not be able to access that cookie.
133 | 
134 | When the browser generates a request, it's going to include all the matching cookies in that request (ambient authority).
135 | 
136 | How can different frames access other frames' cookies? If other frames can write cookies for other frames, then an attacker could log the victim into the attacker's gmail account and possibly read emails sent by the user.
137 | 
138 | Should `foo.co.uk` be allowed to set a cookie for `co.uk`? https://publicsuffix.org contains a list of all the top-level domains so that browsers do not allow cooking setting for domains like `co.uk`.
139 | 
140 | XMLHttpRequest
141 | --------------
142 | By default JS can only generate an AJAX request if it's going to its origin.
143 | 
144 | There's a new paradigm called Cross Origin Request S. (CORS), where the server can use an ACL to allow other domains to access it. Server returns a header `Access-Control-Allow-Origin: foo.com` to indicate foo.com is allowed.
145 | 
146 | Images, CSS
147 | ------
148 | A frame can load images from any origin it desires but it cannot actually inspect the bits. But it can infer the size of the image via the placement of other nodes in the DOM. 
149 | 
150 | Same for CSS.
151 | 
152 | JavaScript
153 | ----------
154 | If you do a cross-origin fetch of JS, that is allowed, but the frame cannot look at the source code. But the JS architecture kind of lets you because you can call the `toString` method on any public function `f`. The frame can also ask the web-server to fetch the JS for it and send it.
155 | 
156 | JS code is often obfuscated.
157 | 
158 | Plugins
159 | -------
160 | Java, Flash.
161 | 
162 | A frame can run a plugin from any origin. HTML5 might make them obsolete.
163 | 
164 | Cross Site Request Forgery (CSRF)
165 | ---------------------------------
166 | An attacker can setup a page and embed a frame with the following source in it:
167 |     
168 |     http://bank.com/xfer?amount=500&to=attacker
169 | 
170 | The frame is set to be of size zero (invisible), Then the attacker gets the user to visit the page. Thus, he can steal money from the user.
171 | 
172 | This is because the URL can be guessed and is not random.
173 | 
174 | Solution: add some randomness to the URL.
175 | 
176 | The server can generate a random token and embed it in the "Transfer Money" page sent to the user.
177 | 
178 |     <form action="/transfer.cgi" ...>
179 |         <input type="hiddne" name="csrf" value="a72fedb2129985bdc">
180 | 
181 | Now the attacker has to guess the token.
182 | 
183 | Network addresses
184 | -----------------
185 | A frame can send HTTP and HTTPS requests to a host that matches its origin. The security of the same origin policy is tied to DNS security. Because origin names are DNS names, DNS rebinding attacks can work against you.
186 | 
187 | Goal: Run attacker controlled JS with the authority of some victim website `victim.com`
188 | 
189 | Approach:  
190 | 
191 |   1. Register a domain name `attacker.com`
192 |   2. Attacker sets up a DNS server to respond to requests for `*.attacker.com`
193 |   3. Attacker gets user to visit `*.attacker.com`
194 |   4. Browser generates a DNS request to `attacker.com`
195 |   5. Attacker response has a small time-to-live (TTL)
196 |   6. Meanwhile, the attacker configures the DNS server to bind `attacker.com` name to `victim.com`'s IP address
197 |   7. Now if the user asks for a DNS resolution on attacker.com, he gets an address of victim.com
198 |   8. The loaded attacker.com website wants to fetch a new object via AJAX. This request will now go to victim.com
199 |     * Bad because attacker.com website just issued an AJAX request outside its origin.
200 | 
201 | How can you fix this? 
202 | 
203 |  * Modify your DNS resolver to check that outside domains are not resolved to internal addresses.
204 |  * Enforce TTL to be 30 minutes
205 | 
206 | Pixels
207 | ------
208 | Each frame gets its own bounding box and can draw wherever it wants there. Specifically, a parent frame can draw over a child frame (see Figure 3).
209 | 
210 | Solutions:  
211 |  1. Use frame busting code (JS to figure out if you've been put in a frame by someone else)
212 | 
213 |   ```
214 |   if (self != top)  
215 |       alert("I'm a child frame, so won't load")  
216 |   ```  
217 |  2. Web server can send an HTTP response header called `X-Frame-Options` which tells the browser to not allow anyone to put its content into a frame.
218 | 
219 | Naming issues
220 | -------------
221 | `c` in ASCII versus `c` in Cyrillic allows attacker to register a `cats.com` domain that immitates the real `cats.com`
222 | 
223 | Plugins
224 | -------
225 | Subtle incompatibilites with the rest of the browser.
226 | 
227 | Java assumes different hostnames with the same IP address have the same origin (deviation from the SOP policy)
228 | 
229 | x.y.com will be in the same origin as z.y.com if they share the same IP address.
230 | 
231 | HTML5 screen sharing
232 | --------------------
233 | If you have a page that have multiple frames, a frame can take a screenshot of the entire browser.
234 | 
235 | 


--------------------------------------------------------------------------------
/previous-years/l22-usability.txt:
--------------------------------------------------------------------------------
  1 | Usable security
  2 | ===============
  3 | 
  4 | What problem does this paper try to address?
  5 | 
  6 | Is the problem of usability in security real, important?
  7 |   Concrete examples of things that go wrong?
  8 | 
  9 | Is usable security the same as general usability of your system?
 10 |   Not quite: some differences.
 11 |     General usability problems only rarely lead to disaster.
 12 |     Security usability problems often result in disaster, without user knowing.
 13 |   Security is a secondary task: user wants to do something other than security.
 14 |   Negative goal / weakest link: must consider entire system.
 15 |   Abstract, hard to reason about; little feedback: security often not tangible.
 16 |     Hard to know if you're secure or not.
 17 |     Easy to know if you got your work done.
 18 |   Users don't fully understand threats, mechanisms they're using.
 19 | 
 20 | Why do we need users in the loop?
 21 |   Good reasons: users should be ultimately in control of their security.
 22 |   Bad reasons: programmers didn't know what to do, so they asked the user.
 23 |   Backwards compatibility.
 24 | 
 25 | Good principle: safe defaults.
 26 | 
 27 | Good principle: avoid asking users to make security decisions when possible.
 28 |   Rely on safe defaults.
 29 | 
 30 | Example from the paper: security in wireless pairing.
 31 |   What's the problem?  Why does the user have to be involved?
 32 |     MITM attacks.
 33 |     Adversary assumed to control the wireless channel, can't trust messages.
 34 |     Need to verify authenticity in a way that adversary can't tamper with.
 35 |   What happens today?  E.g., pairing Bluetooth devices?
 36 |     Many different schemes.
 37 |     User can configure certificates (described in paper).
 38 |     User can enter PIN on both devices.
 39 |       Does it matter that the PIN is short (4 digits)?
 40 |       If chosen by user, might not be random enough.
 41 |       Maybe better to have one device choose the PIN.
 42 |     User can verify hashes of keys.
 43 |       "Yes" vs "Choose the correct hash" vs "Enter the hash from other device"?
 44 |     User can use some out-of-band channel: IR, audio, accelerometer, camera, ..
 45 |   What's wrong with these schemes?
 46 |   What do the authors propose?
 47 |     Users go into a physically isolated room to perform pairing.
 48 |   Any other ideas?  TEP?
 49 | 
 50 | Good principle: use active warnings/questions, not passive.
 51 |   What does active/passive mean?
 52 |     Active: user forced to make choice, perhaps taken out of previous workflow.
 53 |       Forces user to think about the choice, consider what it means.
 54 |     Passive: user asked whether they want to accept something and continue.
 55 |       Most users just want to continue, so they'll look for easiest way there.
 56 |       "OK" or "Cancel" buttons on a security dialog are likely a bad design.
 57 |   Example: "Does the hash match?" vs "Please choose the right hash".
 58 | 
 59 | Usability of SSL certificates.
 60 |   What steps do you have to go through in order to set up SSL certs w/ MIT?
 61 |     Installing a server CA: download the MIT CA certificate.
 62 |       User should do so securely: downloading via http is a bad idea.
 63 |       No indication whether you have done this securely.
 64 |     Specifying flags for server CA.
 65 |       Trust CA to identify web sites, email users, or software developers?
 66 |       No indication which are appropriate.
 67 |     Obtaining a client certificate.
 68 |       Enter Kerberos principal, password, MIT ID.
 69 |       Choose key size, certificate lifetime in days.  Why?
 70 |   Using SSL certificates with https://student.mit.edu?
 71 |     Browser prompts user when the client certificate is used.
 72 |   Using SSL certificates with https://libraries.mit.edu/ "Your account"?
 73 |     Instead of using certificates directly, web site uses some higher-level
 74 |       identity provider (similar to what we talked about in identity lecture).
 75 |     Intermediate site asks what account provider to use.
 76 |     User chooses "MIT Kerberos account (or MIT web certificate)".
 77 |     Next page: enter Kerberos user/password, or click "Use Certificate".
 78 |       [ User must know what origin is OK for entering password? ]
 79 |     Finally, browser prompts user to select the client cert to use.
 80 | 
 81 | Regardless of whether user needs client certs, how do they know they're secure?
 82 |   Say, still visiting WebSIS: when is it "secure"?
 83 |     Depends on the definition of "secure".
 84 |     One definition: browser got the page from the right server,
 85 |       where right has to do with what the user believes.
 86 |     Matters when typing some sensitive info to a page (e.g., password).
 87 |     Matters when relying on some important info from page (e.g., grades).
 88 |   Browser tries to provide a UI to indicate the origin of the site.
 89 |   User is assumed to interpret the UI's meaning correctly.
 90 |   What are the rules for interpreting the UI?
 91 |     Securely visiting a site from origin foo if the URL starts with https://foo/
 92 |       and the browser displays a lock icon next to the address bar.
 93 |     What happens in the UI when you accept a self-signed certificate?
 94 |     What happens in the UI if the certificate is expired?
 95 |     What happens in the UI if the origin name doesn't match certificate name?
 96 |       User prompted to make a decision about whether to trust certificate.
 97 |       Screen looks the same after user confirms an exception.
 98 |       Presumably the assumption is, user is aware of the approved exceptions.
 99 |   Phishing attacks often work despite SSL, HTTPS, and this UI.
100 | 
101 | Simplest attack: adversary's phishing site doesn't use SSL or HTTPS.
102 |   Users might not even know to look for SSL certificate and lock icon.
103 | 
104 | Visual deception.
105 |   Copy logos, site layout.
106 |   Inject look-alike security indicators.
107 |   Create new windows that look like other dialog boxes.
108 |   User intuitively trained to authenticate based on page contents, not URL bar.
109 | 
110 | How do you know if the browser's lock icon is authentic?
111 |   Clever attack: set the favicon to a lock icon, appears near the address bar.
112 | 
113 | Look-alike domains.
114 |   Visually similar (use a 0 instead of an o, etc).
115 |   Exploit incorrect user intuition (ebay-security.com).
116 |   Unicode domain names: difficult/impossible to visually distinguish origins.
117 | 
118 | URLs that make it hard to visually identify origin.
119 |   https://www.paypal.com@attacker.com/...
120 | 
121 | Good principle: make security indicators, identifiers explicit, unambiguous.
122 |   Avoid user having to figure out origin.
123 |   Maybe explicitly have an "insecure" indicator for non-SSL sites?
124 | 
125 | What are "extended validation" certificates?
126 |   Browser displays green box containing company name corresponding to cert.
127 |   What problem are they solving?  User validating origin name.
128 |   How would the user know to look for an extended validation certificate?
129 |     No way to know ahead of time if you should see a regular or EV cert.
130 |     Similar to the bootstrapping problem in ForceHTTPS.
131 | 
132 | How do you know if a browser's UI is authentic?
133 |   If it's taking up the entire screen, outermost window is probably OK.
134 |   Must assume browser is not in full-screen mode.
135 |   Otherwise, one web page can draw a legitimate-looking browser sub-window.
136 |   Easy to guess what window would look like for common systems (e.g., Windows).
137 | 
138 | Confusing indicators: link target in the status bar.
139 |   Can be spoofed with javascript.
140 | 
141 | Mixed content: many browsers prompt user.
142 |   "This page contains both secure and insecure items.  Load insecure items?"
143 |   Users want the page to work, so they click yes.
144 |   Again, principles of active vs passive, safe defaults.
145 | 
146 | Why is phishing such a big problem?  What UI security problems contribute to it?
147 |   Novice users don't understand the threats they are facing.
148 |   Users don't have a clear mental model of the browser's security policy.
149 |     Users don't understand technical details of what constitutes an origin.
150 |       Some browsers (recent Firefox) copy origin into separate UI box.
151 |       Helps save user from visually parsing origin out of the URL.
152 |     Users don't understand what to look for in an SSL certificate / EV certs.
153 |     Users don't understand implications of security decisions.
154 |       Allow cookie?  Allow non-SSL content?
155 |   Browsers have complex security indicators.
156 |     Need to look at origin in URL bar, SSL certificate.
157 |     Security indicators can be absent instead of indicating a warning/error.
158 |     E.g., if site is non-SSL, nothing out-of-the-ordinary appears to the user.
159 |   Web applications don't use the indicators correctly.
160 |     Even legitimate companies often outsource some services!
161 |       E.g., URLs like "ebay.surveysite.com"
162 | 
163 | How to combat phishing?
164 |   Most common: maintain a database of known phishing sites.
165 |     Why isn't this fully effective?
166 |   Active vs passive warnings.
167 |     Habituation: users accustomed to warnings/errors.
168 |     Users focused on getting their work done.
169 |     If the warning gives an option to continue, users may think it's OK.
170 |   More intrusive measures are often more effective here.
171 |     Replace passwords with some other form of auth (smartcard, PAKE, etc).
172 |       Only works for credentials; attackers might still steal DOB, SSN, ..
173 |       Smartphone may be a marginally better device than desktop browser.
174 |       Bank of America now embeds small chip in ATM card, displays 6-digit PIN.
175 |         "SafePass": PIN required to log into online banking account.
176 |     Turn phishing into online attack.
177 |       Site must display an agreed-upon image before user enters password.
178 |       E.g., "SiteKey" used by Bank of America.
179 |       Can be hard for users to comprehend how and what this defends from.
180 |       As with EV certs, how does user know when to expect it?
181 | 
182 | Other human factors in system security
183 |   Social engineering attacks
184 |   Least privilege conflicts with allowing users to do their work
185 |   Trust in user vs trust in user's machine
186 |     Could go both ways.
187 |     Sometimes user is more trusted than machine (e.g., logging into bank site).
188 |     Other times, user is less trusted (e.g., DRM, online games, etc).
189 | 
190 | How to design usable systems?
191 |   Avoid generalizing too much (e.g., global PKI vs. app-specific key mgmt).
192 |     Users may not understand abstract general concepts.
193 |     Easier to explain application-specific issues, decisions.
194 |   Avoid false positives in security warnings (and then make them errors).
195 |     Active security warnings force user to make a choice (cannot ignore).
196 |     Present users with useful choices when possible.
197 |       Users want to perform their task, don't want to choose "stop" option.
198 |       E.g., search google for an authentic web site vs phishing attack?
199 |   Secure defaults; secure by design; "invisible security".
200 |     When does this work?
201 |     When is this insufficient?
202 |   Intuitive security mechanisms that make sense to the user.
203 |     Some of the Windows "privacy" knobs or wizards that give few options.
204 |   Train users (problem: attacks are rare, users will not learn on their own).
205 |     Users unlikely to spend time to learn on their own.
206 |     Interesting idea: try to train users as part of normal workflow.
207 |       Try to mount phishing attacks on user by sending spam to them.
208 |       If they fall for an attack, tell them what they should've looked for.
209 |       Can get tiresome after a while, if not done properly..
210 |     Security training games.
211 |   System designers, developers are not representative users.
212 | 
213 | 


--------------------------------------------------------------------------------
/previous-years/l19-cryptdb.txt:
--------------------------------------------------------------------------------
  1 | Database security and CryptDB
  2 | =============================
  3 | 
  4 | 1. Quick SQL recap
  5 | ------------------
  6 | 
  7 | Databases are structured storage; a database consists of tables (or relations). For example:
  8 | 
  9 | table emp:
 10 | 
 11 | name  SSN  salary
 12 | 
 13 | Alice  2   200
 14 | Bob    3   50
 15 | Chris  4   50
 16 | 
 17 | table jobs:
 18 | name   job
 19 | 
 20 | Alice  boss
 21 | 
 22 | Here are some basic SQL queries:
 23 | 
 24 | SELECT *           FROM emp WHERE name = "Alice"    => selects all rows with name being "Alice" (this is the first row in emp)
 25 |        name        FROM emp WHERE salary = 50       => selects all rows with salary being 50 (these are the last two rows in emp)
 26 |                             WHERE salary > 50       => selects all rows with salary > 50 (this is the first row in emp) 
 27 |                             ORDER BY name           => sorts rows by name (they are already sorted by name!)
 28 |                             GROUP BY salary         => groups together rows that have the same salary value (they are already grouped by salary!)
 29 |        sum(salary) FROM emp                         => selects the sum of the salaries (300)
 30 | 
 31 | other operations: !=, DISTINCT, +, MAX, MIN, GREATEST, COUNT   
 32 | 
 33 | How do you count the number of people with a salary of 100?
 34 | SELECT count(*) FROM emp WHERE salary = 100;
 35 | 
 36 | JOIN: puts together two tables by a common value
 37 | - for example, I may want to view together the fields for Alice from emp and jobs
 38 | SELECT * FROM emp, jobs WHERE emp.name = jobs.name
 39 | Result is 
 40 | name  SSN salary name   job
 41 | Alice 2   200    Alice   boss
 42 | 
 43 | - another example:  what is the salary of someone with the job boss:
 44 | SELECT salary FROM emp, jobs WHERE emp.name = jobs.name AND job = "boss"
 45 | 
 46 | Other types of queries:
 47 | INSERT : inserts rows in a table
 48 | DELETE : deletes rows
 49 | UPDATE : updates rows
 50 | 
 51 | 
 52 | 2. Traditional DB security
 53 | --------------------------
 54 | 
 55 | Databases can store confidential information such as medical or financial information, 
 56 | so we need ways to prevent data leakage or tampering.
 57 | 
 58 | Most common DB security measures today are permissions and encryption of data at rest.
 59 | 
 60 | 1. Permissions: 
 61 |   Users have accounts with the database.
 62 |   
 63 |   An administrator can set what each user can access.
 64 |   GRANT <privilege list> ON <table name/view name> TO <user/role>
 65 |          SELECT                                        e.g., Alice
 66 |          INSERT
 67 |          DELETE
 68 |          UPDATE
 69 |  
 70 |   There are also permissions for creating, dropping, and altering tables.
 71 | 
 72 | 2. Encrypting data at rest:
 73 |    Protects against people stealing disks.
 74 |    Key is usually at a centralized server, or provided by the client when connecting 
 75 |    to the DB server.
 76 |    The DB server keeps the key for the duration of a connection. 
 77 |    To process a SQL query, the DB server decrypts the data and processes the query 
 78 |    on the unencrypted data. Therefore, the DB server has access to unencrypted data.
 79 | 
 80 |    Common mistakes to avoid: don't store the key on disk with encrypted data and don't 
 81 |    backup unencrypted data :)
 82 | 
 83 | 
 84 | Why are these measures not enough? What kind of attacks are still possible?
 85 | - an attacker with access to the DB server can get the key or unencrypted data. 
 86 | This is what CryptDB aims to prevent.
 87 | 
 88 | 
 89 | 3. CryptDB
 90 | ----------
 91 | 
 92 | A database management system (DBMS) for protecting data confidentiality against
 93 | attacks to the DB server.
 94 | 
 95 | Use cases:
 96 | -- consider a company that places the DB in the cloud but keeps the application
 97 | in the local cluster. This happens for query analytics (e.g., some company has
 98 | a lot of data about user activity and wants to learn what is the average response
 99 | time of users, how popular some item was, etc.). Cloud administrators have root
100 | access to the DB servers and can see plaintext data at the server.
101 | -- also in the local cluster, one might want to remove the DB server from the
102 | trust perimeter.
103 | 
104 | 
105 | According to the Verizon Data Breach Investigations Report, internal agents were 
106 | involved in almost half of data breaches.
107 | 
108 | Also, you heard the Snowden case..
109 | 
110 | 
111 | Concrete threat model:
112 | - DB server is under attack by a passive adversary (adversary can read all data, but does not 
113 | modify the DB or the query results)
114 | - Application is in a trusted location
115 | 
116 | Approach:
117 | DB server sees *only* encrypted data
118 | DB server never gets the decryption key
119 | DB server processes SQL queries over encrypted data to provide DB functionality
120 | 
121 | Therefore, even an adversary with root access to the DB server can only see encrypted data.
122 | 
123 | Setup:
124 | CryptDB adds proxy on the application side.
125 | Proxy is:
126 | - trusted, stores a master key
127 | - does not store actual data, but only metadata such as the schema (e.g., table/column names)
128 | - encrypts queries from the application and sends them to the server
129 | - decrypts results from the DB server and gives them to the application
130 | - does not do any query execution -- the DB server executes the SQL queries
131 | 
132 | How to compute on encrypted data?
133 |   One possibility: FHE (fully homomorphic encryption): an encryption schemes that supports
134 |   any function, has strong security. However, it is prohibitively impractical, currently 10^9 times slower
135 |   than regular computation => FHE not an option for DBs
136 | 
137 |   
138 |   Instead we won't strive for general functionality. If we look at SQL operations, even though
139 |   there are more than 100, most of them can be implemented with a small set of basic operations:
140 |   get/put, +, >, =.
141 | 
142 |   We will use a fast encryption scheme for each one of these. Turns out this can support almost
143 |   all SQL queries in practice.
144 | 
145 | Set of encryption schemes
146 | --------------------------
147 | 
148 | Scheme  | Functionality   |   SQL operation examples     |     Security
149 | -------------------------------------------------------------------------------------------------------------- 
150 | RND        get/put           select, insert, delete, count     semantic security (strong security notion, randomized)
151 |                                                                (roughly: given ciphertext, any two plaintexts are equally likely)
152 | HOM         +                       sum, +                     semantic
153 | SEARCH    keyword search        restricted LIKE                ~semantic
154 | DET         =                 GROUP BY, =, !=, DISTINCT        deterministic
155 | JOIN      = across columns          join by =                  deterministic
156 | OPE        >                ORDER BY, >=, MAX, MIN, GREATEST   only reveals order
157 | 
158 | 
159 | HOM has property that HOM(x) * HOM(y) mod n^2 = HOM(x+y), where n^2 is some value from the public key. So
160 | the DB server can perform this operation without needing to know some secret key.
161 | 
162 | DET allows equality on encrypted data as on unencrypted data.
163 | 
164 | OPE allows order on encrypted data as on unencrypted data.
165 | 
166 | How to encrypt data with these?
167 | -----------------------------------
168 | - different queries perform different operations over the same data, 
169 | so need different encryption schemes
170 | - we don't know queries ahead of time
171 | 
172 | Strawman solution: encrypt each data with each one of these separately
173 |   - main problem: security: reveal order for every column when an application may
174 |     never perform order on a certain column
175 |   - also wastes space 
176 | 
177 | Solution: onions of encryptions
178 |   - stack encryption schemes:
179 |     - onion equality: RND(DET(JOIN(value)))
180 |     - onion order:    RND(OPE(value)) 
181 |     - onion add for integers: HOM(value)
182 |     - onion search for text: SEARCH(text)
183 | 
184 |  - note that functionality increases as you go more towards the core of the onion
185 |  - security increases towards the outside of the onion: topmost schemes are (~) semantically secure
186 |  - different keys for different levels of the onion
187 |  
188 | To execute "SELECT * from emp where salary = 100", what onion level do we need for salary? 
189 | - need DET, so remove RND
190 | 
191 |  - adjust onion level by peeling off a level: proxy gives key to server for specific level; 
192 |    key can only decrypt that level
193 |  - lowest onion level not removed
194 |  - do not put back onion level so all future queries with same operations do not need to do peeling
195 | 
196 | After removing RND, proxy rewrites query "SELECT * from emp where salary = 100" into
197 | SELECT * from salary where salary-eq = DET(JOIN(100)), 
198 | where salary-eq is the column containing the equality onion for salary.
199 | 
200 | In fact, the proxy anonymizes names of tables and columns so the query looks like this:
201 | SELECT * from table1 where C3-eq = DET(JOIN(100))
202 | 
203 | 
204 | Why do we need three onions? Can we stack HOM on top of OPE?
205 | - not all encryption schemes are stackable; e.g. cannot use HOM on top of OPE for addition any more
206 | because it adds underlying OPE values
207 | 
208 | 
209 | Since encryption schemes cannot be combined arbitrarily, the proxy needs to rewrite queries
210 | carefully. A nontrivial example:
211 | 
212 | SELECT sum(greatest(salary, 150)) FROM emp;
213 | 
214 | gets rewritten by the proxy into:
215 | - adjust query: remove RND on onion order because we need OPE
216 | 
217 | - new query:
218 | SELECT cryptdb_add(
219 |       if C3-ord >= OPE(150)   
220 |          return C3-add
221 |       else
222 | 	 return HOM(150)
223 | ) FROM table1;
224 | 
225 | Basically, to implement the GREATEST operator, the proxy does the comparison based on OPE encryptions
226 | and then returns the corresponding homomorphic encryption so that cryptdb_add can do summation.
227 | 
228 | 
229 | Performance:
230 | -----------
231 | 
232 | - practical because most operations happen on unencrypted data as on encrypted data
233 | - onion adjustment is infrequent, only when seeing a new operation
234 | - 26% throughput overhead on an industry benchmark as compared to vanilla MySQL
235 |   
236 | What queries it cannot support
237 | ------------------------------
238 | - supports all queries on an industry benchmark and 99.5% of queries at a popular MySQL server
239 | 
240 | - cannot support fancy math such as sine: SELECT * FROM .. WHERE sin(x) > y;
241 | 
242 | - also, it cannot support simpler operations if they are nested in certain ways, 
243 | e.g. a+b>c: need HOM for a+b, but cannot compare HOM, need OPE for compare, but don't have OPE
244 | encryption of (a+b)
245 | 
246 | Is CryptDB getting used?
247 | - new system (2011), starting to get used:
248 |   -- Google uses CryptDB's design to build the Encrypted BigQuery service (Enquirer) 
249 |   -- Lincoln Labs
250 |   -- SAP, sql.mit.edu, some startup
251 | 
252 | 
253 | DEMO
254 | ----
255 | If interested, checkout the code and play with it: 
256 | http://css.csail.mit.edu/cryptdb/
257 | 
258 | 
259 | 
260 | Multi-principal CryptDB
261 | -----------------------
262 | [this section was only discussed at a high level]
263 | - Multi-principal CryptDB tries to limit damage from proxy attacks
264 | - instead of having one master key at the proxy, each user has a user key that can 
265 | only decrypt that user's data
266 | - when logged in, each user provides their key to the proxy and query processing happens
267 | similarly to before
268 | - confidentiality guarantee: if a user is logged out during an attack, his/her data is safe
269 | - provides no guarantee for a logged in user during an attack 
270 | 


--------------------------------------------------------------------------------
/previous-years/l06-java.txt:
--------------------------------------------------------------------------------
  1 | Java Security
  2 | =============
  3 | 
  4 | Confinement of mobile code
  5 |     What does this paper mean by mobile code?
  6 | 	Java, Javascript, ActiveX, Flash, ...
  7 | 	Code you download and execute on your machine
  8 |     How is this different from downloading a standard desktop application?
  9 | 	E.g. you might download an entire Ubuntu CD, or just 'vi', or 'Notepad'
 10 | 	Main difference: want the 'mobile code' to run without user input
 11 | 	Need a better security plan, since arbitrary code may run at any time
 12 |     Challenge: still need to allow mobile code to do some privileged operations
 13 | 	E.g. write its own files, some network access, ..
 14 | 
 15 | Why not existing techniques?
 16 |     OS isolation (KeyKOS, Unix ala OKWS)
 17 | 	Portability: Each OS has different mech (Windows, Linux, KeyKOS, ..)
 18 | 	Performance: OS mechanisms rely on hardware isolation.
 19 | 	    Expensive to switch between protection domains.
 20 | 	    Why?  Often need to do a context switch, change page tables.
 21 | 	    Causes invalidation of hardware caches, incurring a perf. penalty.
 22 | 	Didn't KeyKOS, OKWS perform OK?
 23 | 	    This paper argues it was at the expense of ease-of-programming.
 24 | 	    Requires programmer to worry about protection domain crossings.
 25 | 	    Language-level protection domain crossings might be 1000x cheaper.
 26 |     Software isolation like XFI
 27 | 	XFI +: can use XFI with any language (not just Java)
 28 | 	XFI -: XFI mainly addresses memory safety, not other privileges
 29 | 
 30 | Two kinds of protection / security in Java
 31 |     Memory protection
 32 | 	Making sure one program does not corrupt another program's state.
 33 | 	How does java achieve this?
 34 | 	    Verify bytecode before running it.
 35 | 		Bytecode can only access data, invoke code safely.
 36 | 		No way to directly access memory, invoke syscall, ..
 37 | 	    Safety rules: type safety and private/public methods/fields.
 38 | 		Java: all code in classes, all data in objects.
 39 | 		Object fields public or private (to code in that class).
 40 | 	    Would range checking suffice?  e.g. object is of a given size?
 41 | 		probably not: can synthesize pointers
 42 |     Secure services
 43 | 	Application might want to do more things than just access memory.
 44 | 	    Access files, network
 45 | 	Many examples from past papers:
 46 | 	    Compiler in 'confused deputy'
 47 | 	    Services in OKWS
 48 | 	    setuid programs in Unix
 49 | 	Requires more than just memory safety, but policy is more varied.
 50 | 	Java: provides mechanisms to enforce some dynamic security policy.
 51 | 
 52 | Basic Java sandbox model
 53 |     Two kinds of code: trusted and untrusted
 54 | 	Differentiated by their ClassLoaders
 55 | 	ClassLoader: turns class name (string) into a Java class object
 56 | 	Resulting Java classes keep a pointer to their original ClassLoader
 57 | 	    One ClassLoader used to load java code from the local disk
 58 | 	    Another ClassLoader used to load java code from the network
 59 |     Both kinds of classes co-exist in the same JVM
 60 | 	Why?
 61 | 	Presumably system classes (trusted) invoke applet code (untrusted).
 62 | 	Want to allow system classes to, e.g., load applet code over the net.
 63 |     Operations: file system access, network access
 64 |     Who can invoke operations?
 65 | 	Trusted context: any operation
 66 | 	Untrusted context: communicate with applet's origin server
 67 |     Why do we need stack inspection to determine trusted vs untrusted context?
 68 | 	Rule: context is untrusted if any stack frame is running untrusted code
 69 | 	Why not just check the immediate caller?
 70 | 	    There may be trusted (local) code that invokes sensitive ops
 71 | 	    Untrusted code may thus perform sensitive operations indirectly
 72 | 	    E.g.: trusted void countLines(String filename)
 73 | 		  should it be allowed to access the specified file?
 74 |     Who performs this check?
 75 | 	There's a SecurityManager in Java
 76 | 	Whenever trusted code is about to perform privileged op, calls SecMgr
 77 |     Paper argues this model is too restrictive.  Why?
 78 | 	Want trusted code to do sensitive ops, even if called from untrusted.
 79 | 	Want to grant limited privileges to not-fully-trusted (remote) code.
 80 | 	    Grant screen-drawing privs to youtube.com
 81 | 	    Grant access to local storage to gmail.com
 82 | 	    => Need to differentiate between diff untrusted code (principals)
 83 |     Is this model sufficiently restrictive?  Does it always guarantee safety?
 84 | 	Trusted code can be tricked by untrusted code, without it being on stack
 85 | 	E.g., in trusted code:
 86 | 	    open(x.getFileName()), where x is an untrusted object.
 87 | 	Or, register an Swing/AWT callback with specially-crafted arguments?
 88 | 	All stack frames will be trusted, but untrusted code "caused" the op
 89 |     Is this ClassLoader-based separation reliable?
 90 | 	Assumes adversary can't get local ClassLoader to load his/her code
 91 | 	But lots of untrusted data is reachable by local ClassLoader
 92 | 	E.g. browser cache, network paths, temporary files, ..
 93 | 
 94 | Pre-requisites for more fine-grained access control in Java
 95 |     Principal
 96 | 	Some party that signs a piece of code
 97 | 	Very general: any principal we might want can be a signing key
 98 | 	Expected use: software developer signs.
 99 | 	Is this a good right model?
100 | 	What doesn't this work for?
101 | 	    Applet may want to priv-separate itself.
102 | 	    Could perhaps use many keys to sign different components.
103 |     Objects ("targets") and operations
104 | 	Some well-defined ones: network, file system, ..
105 | 	Application may want to define its own
106 |     Policy
107 | 	Not very clearly spelled out in this paper what the policy looks like
108 | 	Rules that grant access to operations on objects to diff. principals
109 | 
110 | Capabilities
111 |     Key idea: rely on Java pointers/references being unforgeable
112 |     How does this help vs. sandbox policy?
113 | 	Code can invoke methods on an object if it has an object reference.
114 | 	Also subject to Java's private/restricted method access control.
115 | 	Can grant privileges to applets (by giving them object reference).
116 |     Example: Figure 2: SubFS, kind-of like chroot
117 | 	Can attacker get fs from SubFS?
118 | 	Can attacker change rootPath?
119 | 	Can attacker make a new SubFS?
120 |     Why isn't Java pointers good enough for capability-based isolation?
121 | 	Global namespaces inside Java
122 | 	    Can access things using class names
123 | 	    - invoke class constructor
124 | 	    - access global static objects
125 | 	    - access static class methods
126 | 	Pure capability systems have no global names (e.g., KeyKOS)
127 | 
128 | Stack introspection
129 |     What's the strawman design?
130 | 	enablePrivilege(target)
131 | 	disablePrivilege(target)
132 | 	checkPrivilege(target)
133 |     Who makes these calls?
134 | 	checkPrivilege invoked before performing privileged operation, as before
135 | 	enable/disablePrivilege invoked by code that's about to call above func
136 |     WHat are these targets?
137 | 	Free-form: just need to match up between check and enable (& policy)
138 | 	In reality it's a hierarchical namespace of targets
139 | 	    e.g., { java.io.FilePermission, "read,write", "/home/alice/*" }
140 |     Who is allowed to call enablePrivilege()?
141 | 	Trusted code can call enablePrivilege() on any target
142 | 	Untrusted (applet) code: depends on policy
143 |     What does this policy look like?
144 | 	Not well-specified, but probably a set of (principal, target) pairs
145 | 	User sometimes involved in specifying policy.
146 | 	    Java runtime asks 'grant somePrivilege to someApplet'?
147 |     How does this help, vs. simple sandbox policy?
148 | 	Trusted code can call enablePrivilege() to perform trusted op.
149 | 	.. even if it was called by untrusted code higher in the stack.
150 |     What goes wrong with the strawman?
151 | 	Simple: need to ensure privileges are local to a thread.
152 | 	    Solution: keep track of privileges per thread.
153 | 	Simple: might forget to disablePrivilege()
154 | 	    Solution: disable privileges on function return.
155 | 	Tricky: trusted code invokes untrusted code ("luring" attack)
156 | 	    Untrusted code might be able to use all enablePrivilege()'s
157 | 		When might this happen?  Callbacks like AWT, ..
158 | 	    Solution: verify that entire stack from checkPrivilege() to
159 | 		enablePrivilege() consists of principals allowed that privilege
160 | 	    Is this safe?  Yes, any stack frame could call enablePrivilege().
161 |     Example: Figure 4 (SubFS using stack introspection)
162 | 	Some traces of capability design: ref to FileInputStream allows access
163 | 	Why is it OK for FS constructor to be public now?
164 | 	Why does an applet going through a TrustedServce object work?
165 | 	FS object remembers if it was constructed from trusted ctxt (usePrivs)
166 |     Who gets to define the targets?  Why does it matter?
167 | 	Might want third-party code to control its own privileges.
168 | 	What does it mean to define a target?
169 | 	    Really about who can call enable (and who can specify policy rules).
170 | 	    Anyone can always call checkPriv() on any target they want.
171 | 	Netscape design: add extra level of hierarchy to target names: princial
172 | 	    (System, "FileAccess", "/home/alice/*")
173 | 	    (google.com, "GmailFolderAccess", "Spam")
174 | 	    gmail code might grant the latter priv to client-side spam filter
175 |     How does code signing work?
176 | 	Microsoft and Netscape have different schemes
177 | 	Netscape scheme:
178 | 	    Principals sign code.
179 | 	    Code asks for privileges to targets at runtime (call some function).
180 | 	    Policy says what principals should be granted what targets.
181 | 	    If policy doesn't know, ask the user.
182 | 	    Multiple principals can sign code (why is this a good idea?).
183 | 		What happens with policy?  Need one positive, no negatives.
184 | 	Microsoft scheme:
185 | 	    One principal signs (code, targets)
186 | 	    Policy says what principals should be granted what targets
187 | 	    More explicit: principal never grants unintended targets
188 | 		Suppose Google has one signing key
189 | 		Doesn't want to grant all privileges to all of their code
190 | 		Can specify specific subsets of privileges for each signed class
191 | 		Good idea: always be explicit
192 | 
193 | Namespace management
194 |     Effectively a way to make capabilities work with existing Java libraries
195 |     No more global namespace of classes
196 |     Instead, separate namespace for every principal
197 | 	Can eliminate some classes entirely (will get undefined class exception)
198 | 	Can provide specialized, restricted implementation
199 |     Implemented using a ClassLoader that looks at caller's principal
200 |     Pitfalls?
201 | 	Might be able to abuse code that _does_ have access to protected classes
202 | 	Confused deputy problem: ambient authority (access to classes by name)
203 | 	No analog of enablePrivilege(), luring attack is possible
204 | 
205 | Mediation
206 |     Why do they think it's a bad idea to grant privileges?
207 |     In KeyKOS this seemed helpful in constructing least-privilege domains.
208 |     Worry: you might be wrong about granting privileges to some applet
209 | 	Want to revoke privileges and not worry about where they leaked.
210 | 	No way to revoke granting a capability, in a basic capability design.
211 | 	Typical trick: wrapper that checks who called (stack insp?)
212 |     Can stack inspection leak privileges?
213 | 	Avoids luring attack by checking every frame along the path to enableP
214 |     Can namespace management leak privileges?
215 | 	Trusted code can accidentally return a 'privileged object'
216 | 	Untrutsed code might invoke it, if it has access to some superclass
217 | 	    (e.g. InputStream vs FileInputStream)
218 | 
219 | Accountability
220 |     Capabilities: know who got the capability at first, but not who used it
221 |     Stack inspection: know who called enablePrivilege()
222 |     Name space: no idea what happened, aside from what the namespace policy was
223 | 
224 | Least privilege
225 |     How do the systems stack up?
226 |     Easiest to reduce privileges in capabilities
227 |     Easy to reduce some privileges in stack inspection (call disablePrivilege)
228 |     Hard to disable privileges in namespace management
229 | 
230 | Performance
231 |     Stack introspection has higher runtime cost (inspect the stack)
232 | 
233 | In practice, stack inspection seems to be the prevalent Java security model now
234 | 
235 | 


--------------------------------------------------------------------------------
/previous-years/l10-memauth.html:
--------------------------------------------------------------------------------
  1 | 
  2 | <!-- saved from url=(0056)http://css.csail.mit.edu/6.858/2009/lec/l10-memauth.html -->
  3 | <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><style type="text/css"></style></head><body dir="ltr" style="max-width:8.5in" class=" hasGoogleVoiceExt">
  4 | 
  5 | <h1>Memory Authentication (notes by Marten van Dijk)</h1>
  6 | 
  7 | <b>Read:</b> R. Elbaz, D. Champagne, C. Gebotys, R.B. Lee,
  8 | N. Potlapally, and L. Torres, "Hardware Mechanisms for Memory
  9 | Authentication: A Survey of Existing Techniques and Engines", Trans. on
 10 | Comput. Sci., pp. 1-22, 2009.
 11 | 
 12 | <p>
 13 | <b>Model (assumptions, security requirements, possible attacks):</b>
 14 | 
 15 | </p><p>
 16 | 	What is memory authentication? The ability to verify that the data
 17 | 	read from memory by the processor at a given address is the data
 18 | 	it last wrote at this address.
 19 | 
 20 | </p><p>
 21 | 	Why memory authentication? An adversary able to corrupt the
 22 | 	memory space can affect the computations performed by a trusted
 23 | 	computing platform. Assumptions: Too costly to provide a tamper
 24 | 	evident environment that includes memory. The trusted computing
 25 | 	platform is a single chip secure processor with limited on-chip
 26 | 	storage. It is resistant to all physical attacks, including
 27 | 	invasive ones. Whenever a sensitive computation is finalized,
 28 | 	the authenticity of the sequence of off-chip memory operations
 29 | 	during the computation needs to be verified. Notice that,
 30 | 	depending on the application, this happens all the time, or
 31 | 	sometimes. This leads to different authentication strategies
 32 | 	(tree based or not tree based).
 33 | 
 34 | </p><p>
 35 | 	Is it realistic to have a single chip processor which is
 36 | 	resistant to all physical attacks? It may be acceptable to
 37 | 	be only resistant to all physical attacks that cost not too
 38 | 	much. This excludes attacks performed by the common hacker. We
 39 | 	do not try to protect against attacks performed by engineers
 40 | 	with access to expensive laboratories. The required security
 41 | 	(and its costs) is part of a business model.
 42 | 
 43 | </p><p>
 44 | 	Data integrity refers to the ability to detect any adversarial
 45 | 	corruption or tampering of data. What is its relation to memory
 46 | 	authentication? Memory authentication implies data integrity.
 47 | 
 48 | </p><p>
 49 | 	The objective of memory authentication is to thwart active
 50 | 	attacks that tamper with off-chip memory contents. What are
 51 | 	these active attacks? Spoofing: existing memory block is
 52 | 	exchanged with an arbitrary fake one. Splicing/Relocation:
 53 | 	replace a memory block at address A with a block at address
 54 | 	B. Replay: a memory block located at a given address is
 55 | 	recorded and inserted at the same address at a later point
 56 | 	in time.
 57 | 
 58 | </p><p>
 59 | 	What is a software attack? A software attack is an
 60 | 	active attack performed by a compromised OS or malicious
 61 | 	application. We cannot trust the OS, even if the OS kernel
 62 | 	is designed to isolate sensitive applications from malicious
 63 | 	software. For now, we will not consider software attacks.
 64 | 
 65 | </p><p>
 66 | 	What other security requirements are of interest? Access
 67 | 	control; applications should not be able to access one
 68 | 	another's application specific data. Data confidentiality;
 69 | 	encrypt sensitive data to ensure privacy.
 70 | 
 71 | </p><p>
 72 | 	Why are these requirements not sufficient? An attacker
 73 | 	may manipulate encrypted data. So, we also need memory
 74 | 	authentication.
 75 | 
 76 | </p><p>
 77 | 	Does encryption of sensitive data really ensure privacy? What
 78 | 	about correlating memory access patterns? This may leak
 79 | 	information about the nature of the application and to which
 80 | 	other applications, with which it shares memory, it is related.
 81 | 
 82 | </p><p>
 83 | <b>Solutions (tree based solutions and a non-tree based solution):</b>
 84 | 
 85 | </p><p>
 86 | 	What is a naive solution for memory authentication? Store
 87 | 	digest of entire memory in on-chip storage. Unacceptable.
 88 | 
 89 | </p><p>
 90 | 	What is a next best solution? Store a digest of every memory
 91 | 	block (cache block), see Figure 3(a). Reduces memory bandwidth
 92 | 	overhead, but needs too much (expensive) on-chip memory.
 93 | 
 94 | </p><p>
 95 | 	What is a slightly better solution? Using nonces costs less
 96 | 	on-chip memory, see Figure 3(b). If the nonce generator runs
 97 | 	out of range, need to reset key k and update all memory (this
 98 | 	can be done in idle time). Does the nonce generator need to
 99 | 	output unique nonces, do nonces for different addresses need to
100 | 	be different? No, if we compute each MAC as a function of the
101 | 	key k, nonce N, and address A. So, we can use smaller nonces,
102 | 	leading to less on-chip storage. When a smaller nonce specific
103 | 	to a certain address runs out of range, this address cannot
104 | 	be used until key k is reset. If we use smaller nonces of say
105 | 	16 bits, can we use random nonces? No, with probability 1/
106 | 	2^16 this leads to a collision and possibly to an attack. We
107 | 	need deterministic nonces: for each update, just increment
108 | 	the corresponding nonce by 1.
109 | 
110 | </p><p>
111 | 	How can we add data confidentiality? Replace MAC by an
112 | 	encryption E, see Figure 3(c).
113 | 
114 | </p><p>
115 | 	What trick improves the current solutions? Integrity trees
116 | 	with Hash (Merkle tree), MAC + nonces, or E + nonces. See
117 | 	Figure 5. How does it work? Merkle tree update procedure is
118 | 	sequential: computation of a new hash node in a branch must
119 | 	be completed before the update of the next branch node can
120 | 	start. The PAT read and update procedures are parallelizable,
121 | 	how does this work? An intermediate node stores the MAC of
122 | 	the nonces stored in its children and these are all known
123 | 	beforehand. An intermediate node does not store the MAC
124 | 	of the MACs stored in its children (does would be similar
125 | 	to the Merkle tree). The TEC-Tree uses encryption, and
126 | 	does not store nonces. How are nonces retrieved during an
127 | 	update? Use decryption. Is a mixture of reads and updates
128 | 	parallelizable? No.
129 | 
130 | </p><p>
131 | 	How do we find the address of a parent node? Use tree
132 | 	traversal, which leads to (eq. 1).
133 | 
134 | </p><p>
135 | 	How can we make use of the cache to improve performance? Tree
136 | 	read and update procedures are terminated as soon as a cached
137 | 	hash or the root is encountered.
138 | 
139 | </p><p>
140 | 	What is a Bonsai Merkle Tree? Use many smaller trees using
141 | 	counters/nonces. Store their roots. Protect the counters
142 | 	using a special tree and store its intermediate nodes and
143 | 	leaves. See Figure 6. Notice the statement "a full page
144 | 	needs to be cryptographically processed every time a local
145 | 	counter rolls over". Don't worry about the details of
146 | 	this method. Just notice the similarity to our discussion of
147 | 	the solution presented in Figure 3(b).
148 | 
149 | </p><p>
150 | 	What do we need to do if the OS cannot be trusted? Build
151 | 	an integrity tree which covers only pages belonging to the
152 | 	application and which can only be updated when the application
153 | 	itself is running. Page table maps a page's virtual address
154 | 	to its physical address. Branch splicing attack corrupts
155 | 	the physical address corresponding to the virtual address
156 | 	of a given memory block, see Figure 7. Therefore, we need
157 | 	to build a tree over the virtual address space. The virtual
158 | 	address generated by the protected application is used to
159 | 	traverse the tree. What are the shortcomings? Not scalable:
160 | 	need a large memory capacity (allocation of physical page
161 | 	frames for the 2^64 bytes of leaf nodes that are defined
162 | 	during initialization, as well as allocation of memory for
163 | 	the non-leaf tree nodes) and a large initialization overhead
164 | 	(takes too much time to initialize such a tree). We also
165 | 	require one full-blown tree for each application requiring
166 | 	protection, rather than a single tree protecting all software
167 | 	in physical memory. How can we try to partially overcome
168 | 	these problems? Introduce a new hardware unit that builds an
169 | 	integrity tree over a reduced address space, which contains
170 | 	only those pages needed for the application's execution
171 | 	(it grows dynamically as the application memory footprint
172 | 	increases).
173 | 
174 | </p><p>
175 | 	Memory authentication without a tree structure: Lhash is
176 | 	designed for applications requiring integrity checking after
177 | 	a sequence of memory operations (as opposed to checking every
178 | 	memory operation as in tree schemes). It uses a multiset
179 | 	hash function to maintain at runtime a write and read log of
180 | 	memory locations, called WriteHash and ReadHash. These logs are
181 | 	stored on-chip. At initialization WriteHash is computed over
182 | 	the memory chunks belonging to the memory region that needs
183 | 	to be authenticated. WriteHash is updated at runtime when an
184 | 	off-chip write is performed or when a dirty cache block is
185 | 	evicted from cache. WriteHash reflects the off-chip memory
186 | 	state at any time. ReadHash is updated whenever an off-chip
187 | 	read is performed or a chunk is brought in cache. To check
188 | 	the integrity of a sequence of operations, all the blocks
189 | 	belonging to the memory region that are not present in cache
190 | 	are read, after which the ReadHash should equal the WriteHash.
191 | 
192 | </p><p>
193 | 	Requirements of multiset hash functions: compression
194 | 	(guarantees that we can store hashes in a small bounded amount
195 | 	of memory), comparability (a probabilistic algorithm that
196 | 	compares hashes, needed since a multiset not always hashes to
197 | 	the same value), incrementality (adding hashes of multisets
198 | 	together results in a hash of the union of the multisets),
199 | 	and multiset collision resistance (computationally infeasible
200 | 	to find two distinct multisets that hash to comparable hashes).
201 | 
202 | </p><p>
203 | <b>Related topics:</b>
204 | 
205 | </p><p>
206 | 	Related topics: Data authentication symmetric multi-processors:
207 | 	need to consider bus transaction authentication on cache
208 | 	to cache transfers required in cache coherency protocols,
209 | 	see Figure 8.
210 | 
211 | </p><p>
212 | 	Related topics: How can we use an untrusted server to provide
213 | 	trusted storage for a large number of directories, where the
214 | 	files in each directory may be accessed and updated by several
215 | 	different devices that may be offline at different times and
216 | 	may not be able to communicate with each other except through
217 | 	an untrusted server (over an untrusted network). The multi-user
218 | 	network file system SUNDR offers protection against forking
219 | 	attacks, a form of attack where a server uses a replay attack
220 | 	to give different users a different view of the current state
221 | 	of the system. However, it does not <i>immediately</i> detect
222 | 	forking attacks. Instead, it offers <i>fork consistency</i>,
223 | 	which essentially ensures that the system server either behaves
224 | 	correctly, or that its failure or malicious behavior will be
225 | 	detected at a later moment when users are able to communicate
226 | 	with each other (for example, once a day during night time).
227 | 	If the untrusted server has a time-stamping device that can
228 | 	be trusted (for example, by using an embedded Trusted Platform
229 | 	Module), then immediate detection of forking and replay attacks
230 | 	is possible.
231 | 
232 | </p><p>
233 | 	Related topics: Cloud storage. A client wants to make sure
234 | 	that a replica of their data exists. A proof of retrievability
235 | 	(POF) is a short message generated by a server that contains
236 | 	evidence that a correct version of the client's data is
237 | 	stored at the server. This gives the possibility for clients
238 | 	to efficiently check that their data is sufficiently backed up
239 | 	(check out the latest news on the loss of T-mobile data!).
240 | 
241 | </p><p>
242 | 	Related topics: Sparse memory. By using an authenticated search
243 | 	tree, a proof of non-existence of a leaf that corresponds to
244 | 	an encrypted address can be generated. This protects against
245 | 	denial of existence of stored values (required for sparse
246 | 	memory), replay attacks, and unauthorized access.
247 | 
248 | 
249 | </p></body></html>


--------------------------------------------------------------------------------