├── README.md ├── papers ├── baggy.pdf ├── brop.pdf ├── klee.pdf ├── nacl.pdf ├── okws.pdf ├── urweb.pdf ├── android.pdf ├── capsicum.pdf ├── kerberos.pdf ├── forcehttps.pdf ├── medical-sw.pdf ├── passwords.pdf ├── taintdroid.pdf ├── tor-design.pdf ├── owasp-top-10.pdf ├── trajectories.pdf ├── brumley-timing.pdf ├── confused-deputy.pdf ├── lookback-tcpip.pdf ├── private-browsing.pdf ├── passwords-extended.pdf └── .htaccess ├── Makefile ├── .htaccess ├── old-quizzes.md ├── old-quizzes.html ├── quiz2-tor.md ├── quiz2-tor.html ├── README.html ├── quiz2-medical-dev.md ├── index.md ├── previous-years ├── l12-resin.txt ├── l14-resin.txt ├── l22-usability-2.txt ├── l21-captcha.txt ├── l20-bots.txt ├── l23-voting.txt ├── l21-dropbox.txt ├── l18-dealloc.txt ├── l19-backtracker.txt ├── l20-traceback.txt ├── l17-vanish.txt ├── l07-xfi.txt ├── l11-spins.html ├── l08-browser-security.txt ├── l22-usability.txt ├── l19-cryptdb.txt ├── l06-java.txt └── l10-memauth.html ├── quiz2-medical-dev.html ├── index.html └── l08-my-web-security.md /README.md: -------------------------------------------------------------------------------- 1 | index.md -------------------------------------------------------------------------------- /papers/baggy.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/baggy.pdf -------------------------------------------------------------------------------- /papers/brop.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/brop.pdf -------------------------------------------------------------------------------- /papers/klee.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/klee.pdf -------------------------------------------------------------------------------- /papers/nacl.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/nacl.pdf -------------------------------------------------------------------------------- /papers/okws.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/okws.pdf -------------------------------------------------------------------------------- /papers/urweb.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/urweb.pdf -------------------------------------------------------------------------------- /papers/android.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/android.pdf -------------------------------------------------------------------------------- /papers/capsicum.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/capsicum.pdf -------------------------------------------------------------------------------- /papers/kerberos.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/kerberos.pdf 
-------------------------------------------------------------------------------- /papers/forcehttps.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/forcehttps.pdf -------------------------------------------------------------------------------- /papers/medical-sw.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/medical-sw.pdf -------------------------------------------------------------------------------- /papers/passwords.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/passwords.pdf -------------------------------------------------------------------------------- /papers/taintdroid.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/taintdroid.pdf -------------------------------------------------------------------------------- /papers/tor-design.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/tor-design.pdf -------------------------------------------------------------------------------- /papers/owasp-top-10.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/owasp-top-10.pdf -------------------------------------------------------------------------------- /papers/trajectories.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/trajectories.pdf -------------------------------------------------------------------------------- /papers/brumley-timing.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/brumley-timing.pdf -------------------------------------------------------------------------------- /papers/confused-deputy.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/confused-deputy.pdf -------------------------------------------------------------------------------- /papers/lookback-tcpip.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/lookback-tcpip.pdf -------------------------------------------------------------------------------- /papers/private-browsing.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/private-browsing.pdf -------------------------------------------------------------------------------- /papers/passwords-extended.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/7h3rAm/6.858-lecture-notes/master/papers/passwords-extended.pdf -------------------------------------------------------------------------------- /papers/.htaccess: -------------------------------------------------------------------------------- 1 
| # Protect the htaccess file 2 | 3 | Order Allow,Deny 4 | Deny from all 5 | 6 | 7 | # Enable directory browsing 8 | Options All Indexes 9 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | SRCS=$(wildcard *.md) 2 | 3 | HTMLS=$(SRCS:.md=.html) 4 | 5 | %.html: %.md 6 | @echo "Compiling $< -> $*" 7 | markdown $< >$*.html 8 | 9 | all: $(HTMLS) 10 | @echo "HTMLs: $(HTMLS)" 11 | @echo "MDs: $(SRCS)" 12 | -------------------------------------------------------------------------------- /.htaccess: -------------------------------------------------------------------------------- 1 | # Protect the htaccess file 2 | 3 | Order Allow,Deny 4 | Deny from all 5 | 6 | 7 | # Protect .git/ 8 | 9 | Order Allow,Deny 10 | Deny from all 11 | 12 | 13 | 14 | Order Allow,Deny 15 | Deny from all 16 | 17 | 18 | 19 | Order Allow,Deny 20 | Deny from all 21 | 22 | 23 | 24 | Order Allow,Deny 25 | Deny from all 26 | 27 | 28 | 29 | Order Allow,Deny 30 | Deny from all 31 | 32 | 33 | # Disable directory browsing 34 | Options All -Indexes 35 | -------------------------------------------------------------------------------- /old-quizzes.md: -------------------------------------------------------------------------------- 1 | Some questions may already be [here](http://css.csail.mit.edu/6.858/2014/quiz.html) 2 | 3 | Quiz 2 2011 4 | ----------- 5 | 6 | Q8: An "Occupy Northbridge" protestor has set up a Twitter 7 | account to broadcast messages under an assumed name. In 8 | order to remain anonymous, he decides to use Tor to log into 9 | the account. He installs Tor on his computer (from a 10 | trusted source) and enables it, launches Firefox, types in 11 | www.twitter.com into his browser, and proceeds to log in. 12 | What adversaries may be able to now compromise the protestor 13 | in some way as a result of him using Tor? Ignore security 14 | bugs in the Tor client itself. 15 | 16 | A8: The protestor is vulnerable to a malicious exit node 17 | intercepting his non-HTTPS-protected connection. (Since Tor 18 | involves explicitly proxying through an exit node, this is 19 | easier than intercepting HTTP over the public internet.) 20 | 21 | 22 | Q9: The protestor now uses the same Firefox browser to 23 | connect to another web site that hosts a discussion forum, 24 | also via Tor (but only after building a fresh Tor circuit). 25 | His goal is to ensure that Twitter and the forum cannot 26 | collude to determine that the same person accessed Twitter 27 | and the forum. To avoid third-party tracking, he deletes all 28 | cookies, HTML5 client-side storage, history, etc. from his 29 | browser between visits to different sites. How could an 30 | adversary correlate his original visit to Twitter and his 31 | visit to the forum, assuming no software bugs, and a large 32 | volume of other traffic to both sites? 33 | 34 | A9: An adversary can fingerprint the protestor's browser, 35 | using the user-agent string, the plug-ins installed on that 36 | browser, window dimensions, etc., which may be enough to 37 | strongly correlate the two visits. 38 | 39 | --- 40 | 41 | Quiz 2, 2012 42 | ------------ 43 | 44 | Q2: Alyssa wants to learn the identity of a hidden service 45 | running on Tor. She plans to set up a malicious Tor OR, set 46 | up a rendezvous point on that malicious Tor OR, and send 47 | this rendezvous point's address to the introduction point of 48 | the hidden service. 
Then, when the hidden service connects 49 | to the malicious rendezvous point, the malicious Tor OR will 50 | record where the connection is coming from. 51 | 52 | Will Alyssa's plan work? Why or why not? 53 | 54 | A2: Will not work. A new Tor circuit is constructed between 55 | the hidden service and the rendezvous point, so the malicious Tor OR only learns the previous hop of that circuit, not the hidden service's own address. -------------------------------------------------------------------------------- /old-quizzes.html: -------------------------------------------------------------------------------- 1 |
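To make the browser-fingerprinting answer in A9 above concrete, here is a minimal Python sketch (not from the notes) of how two colluding sites could hash the attributes A9 mentions into a matching identifier. The attribute values and the browser_fingerprint helper are illustrative assumptions, not a real tracking API.

    # Illustrative sketch for A9: hash observable browser attributes into a
    # fingerprint that colluding sites can compare. Values below are made up;
    # real fingerprinting uses many more signals.
    import hashlib

    def browser_fingerprint(user_agent, plugins, window_size, timezone):
        # Serialize the attributes in a fixed order so identical browser
        # configurations hash to identical digests on both sites.
        material = "|".join([
            user_agent,
            ",".join(sorted(plugins)),
            "x".join(str(d) for d in window_size),
            timezone,
        ])
        return hashlib.sha256(material.encode()).hexdigest()

    # The same Tor user visiting Twitter and the forum presents the same
    # attributes, so the digests match even with cookies and storage cleared.
    fp_twitter = browser_fingerprint("Mozilla/5.0 (X11; Linux)", ["Flash 11.2", "Java 7"], (1280, 778), "UTC-05:00")
    fp_forum   = browser_fingerprint("Mozilla/5.0 (X11; Linux)", ["Flash 11.2", "Java 7"], (1280, 778), "UTC-05:00")
    assert fp_twitter == fp_forum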

52 | -------------------------------------------------------------------------------- /quiz2-tor.md: -------------------------------------------------------------------------------- 1 | Tor 2 | === 3 | --- 4 | ## Resources 5 | 6 | * [Paper](http://css.csail.mit.edu/6.858/2014/readings/tor-design.pdf) 7 | * Blog posts: [1](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-1), [2](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-2), [3](https://blog.torproject.org/blog/top-changes-tor-2004-design-paper-part-3) 8 | * [Lecture note from 2012](http://css.csail.mit.edu/6.858/2012/lec/l16-tor.txt) 9 | * [Old quizzes](http://css.csail.mit.edu/6.858/2014/quiz.html) 10 | 11 | --- 12 | 13 | ## Overview 14 | 15 | - Goals 16 | - Mechanisms 17 | * Streams/Circuits 18 | * Rendezvous Points & Hidden services 19 | - Directory Servers 20 | - Attacks & Defenses 21 | - Practice Problems 22 | 23 | --- 24 | 25 | ## Goals 26 | 27 | - Anonymous communication 28 | - Responder anonymity 29 | * If I run a service like "mylittleponey.com" I don't want anyone 30 | associating me with that service 31 | - Deployability / usability 32 | * Why a security goal? 33 | + Because it increases the # of people using Tor, i.e. the _anonimity set_ 34 | - ...which in turn increases security 35 | * (adversary has more people to distinguish you amongst) 36 | - TCP layer (Why? See explanations in lecture notes above) 37 | - **NOT** P2P (because more vulnerable?) 38 | 39 | --- 40 | 41 | ## Circuit creation 42 | 43 | TODO: Define circuit 44 | 45 | Alice multiplexes many TCP streams onto a few _circuits_. Why? Low-latency system, expensive to make new circuit. 46 | 47 | TODO: Define Onion Router (OR) 48 | 49 | _Directory server_: State of network, OR public keys, OR IPs 50 | 51 | ORs: 52 | 53 | - All connected to one another with TLS 54 | - See blog post 1: Authorities vote on consensus directory document 55 | 56 | Example: 57 | 58 | [ Draw example of Alice building a new circuit ] 59 | [ and connecting to Twitter. ] 60 | 61 | --- 62 | 63 | ## Rendezvous Points & Hidden services 64 | 65 | Example: 66 | 67 | [ Add an example of Alice connecting to Bob's ] 68 | [ hidden service on Tor ] 69 | 70 | Bob runs hidden service (HS): 71 | 72 | - Decides on long term PK/SK pair 73 | - Publish introduction points, advertises on lookup service 74 | - Builds a circuit to _Intro Points_, waits for messages 75 | 76 | Alice wants to connect to Bob's HS: 77 | 78 | - Build circuit to new _Rendezvous Point (RP)_ (any OR) 79 | * Gives _cookie_ to RP 80 | - Builds circuit to one of Bob's intro points and sends message 81 | * with `{RP, Cookie, g^x}_PK(Bob)` 82 | - Bob builds circuit to RP, sends `{ cookie, g^y, H(K)}` 83 | - RP connects Alice and Bob 84 | -------------------------------------------------------------------------------- /quiz2-tor.html: -------------------------------------------------------------------------------- 1 |
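A minimal Python sketch of the rendezvous message flow listed above, under simplifying assumptions: the Diffie-Hellman parameters, helper names, and tuple payloads are toy stand-ins, not Tor's actual cell formats or handshake crypto.

    # Toy model of the hidden-service rendezvous messages described above.
    import hashlib, os

    P = 2 ** 127 - 1   # toy modulus (a Mersenne prime), not a real Tor group
    G = 5

    def dh_keypair():
        # Stand-in for a Diffie-Hellman keypair (x, g^x).
        x = int.from_bytes(os.urandom(16), "big")
        return x, pow(G, x, P)

    # Alice -> rendezvous point (RP), over a circuit: a fresh cookie.
    cookie = os.urandom(20)

    # Alice -> one of Bob's introduction points; in the real protocol this is
    # encrypted to Bob's public key: {RP, cookie, g^x}_PK(Bob).
    x, g_x = dh_keypair()
    intro_message = ("RP-address", cookie, g_x)

    # Bob -> RP, over a fresh circuit of his own: {cookie, g^y, H(K)}.
    y, g_y = dh_keypair()
    K_bob = pow(g_x, y, P)
    rendezvous_message = (cookie, g_y, hashlib.sha1(str(K_bob).encode()).digest())

    # The RP matches the cookie and splices the two circuits; Alice derives
    # the same key from g^y and checks H(K) to confirm she reached Bob.
    K_alice = pow(g_y, x, P)
    assert K_alice == K_bob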

100 | 101 | 113 | -------------------------------------------------------------------------------- /README.html: -------------------------------------------------------------------------------- 1 |

4 | 5 | 33 | -------------------------------------------------------------------------------- /quiz2-medical-dev.md: -------------------------------------------------------------------------------- 1 | 6.858 Quiz 2 Review 2 | =================== 3 | 4 | Medical Device Security 5 | ----------------------- 6 | 7 | FDA standards: Semmelweis e.g. `=>` Should wash hands 8 | 9 | Defirbillator: 10 | 11 | - 2003: Implanted defibrillator use WiFi. What could 12 | possibly go wrong? 13 | - Inside: battery, radio, hermetically sealed 14 | 15 | Why wireless? 16 | 17 | - Old way: Inject a needle into arm to twist dial, risk of infection :( 18 | 19 | **Q:** What are security risks of wireless? 20 | 21 | - Unsafe practices - implementation errors. 22 | - Manufacturer and User Facility Device Experience (MAUDE) database 23 | * Cause of death: buffer overflow in infusion pump. 24 | * Error detected, but brought to safe mode, turn off pump. 25 | * Patient died after increase in brain pressure because 26 | no pump, because of buffer overflow. 27 | 28 | #### Human factors and software 29 | 30 | Why unique? 31 | 32 | 500+ deaths 33 | 34 | E.g. User interface for delivering dosage to patients did not properly indicate 35 | whether it expected hours or minutes as input (hh:mm:ss). Led to order of 36 | magnitude error: 20 min vs. the intended 20 hrs. 37 | 38 | #### Managerial issues 39 | 40 | Medical devices also need to take software updates. 41 | 42 | E.g. McAffee classified DLL as malicious, quarantines, 43 | messed up hospital services. 44 | 45 | E.g. hospitals using Windows XP: 46 | - There are no more security updates from Microsoft for XP, but still new medical products shipping Windows XP. 47 | 48 | 49 | #### FDA Cybersecurity Guidance 50 | 51 | What is expected to be seen from manufacturers? How they 52 | have thought through the security problems / risks / 53 | mitigation strategies / residual risks? 54 | 55 | 56 | #### Adversary stuff 57 | 58 | Defibrillator & Implants 59 | 60 | This section of the notes refers to the discussion of attacks on implanted defibrillators from Kevin Fu's lecture. In one example he gave, the implanted devices are wirelessly programmed with another device called a "wand", which uses a proprietary (non-public, non-standardized) protocol. Also, the wand transmits (and the device listens) on specially licensed EM spectrum (e.g. not WiFI or bluetooth). The next two lines describe the surgical process by which the defibrillator is implanted in the patient. 61 | 62 | - Device programmed w/ wand, speaking proprietary protocol 63 | over specially licensed spectrum. (good idea w.r.t. 64 | security?) 65 | - Patient awake but numbed and sedated 66 | - Six people weave electrodes through blood vessel.... 67 | 68 | - Patient given a base station, looks like AP, speaks proprietary RF to implant, 69 | data sent via Internet to healthcare company 70 | 71 | - Communication between device and programmer: no crypto / auth, data sent in plaintext 72 | - Device stores: Patient name, DOB, make & model, serial no., more... 73 | 74 | - ???????? Use a software radio (USRP/GNU Radio Software) 75 | 76 | **Q:** Can you wirelessly induce a fatal heart rhythm 77 | **A:** Yes. Device emitted 500V shock in 1 msec. E.g. get kicked in chest by horse. 78 | 79 | Devices fixed through software updates? 80 | 81 | #### Healthcare Providers 82 | 83 | Screenshot of "Hospitals Stuck with Windows XP": 600 Service Pack 0 Windows XP devices in the hospital! 
84 | 85 | Average time to infection for healthcare devices: 86 | - 12 days w/o protection 87 | - 1 year w/ antivirus 88 | 89 | #### Vendors are a common source of infection 90 | 91 | USB drive is a common vector for infection. 92 | 93 | #### Medical device signatures over download 94 | 95 | "Click here to download software update" 96 | 97 | - Website appears to contain malware 98 | - Chrome: Safe web browsing service detected "ventilator" malware 99 | 100 | "Drug Compounder" example: 101 | 102 | - Runs Windows XP embedded 103 | - **FDA expects manufacturers to keep SW up to date** 104 | - **Manufacturers claim cannot update because of FDA** 105 | * _double you tea f?_ 106 | 107 | #### How significant intentional malicious SW malfunctions? 108 | 109 | E.g. 1: Chicago 1982: Somebody inserts cyanide into Tylenol 110 | E.g. 2: Somebody posted flashing images on epillepsy support group website. 111 | 112 | 113 | #### Why do you trust sensors? 114 | 115 | E.g. smartphones. Batteryless sensors demo. Running on an MSP430. uC believes 116 | anything coming from ADC to uC. Possible to do something related to resonant 117 | freq. of wire there? 118 | 119 | Inject interference into the baseband 120 | 121 | - Hard to filter in the analog 122 | - `=>` Higher quality audio w/ interference than microphone 123 | 124 | Send a signal that matches resonant frequency of the wire. 125 | 126 | Treat circuit as unintentional demodulator 127 | 128 | - Can use high frequency signal to trick uC into thinking 129 | - there is a low frequency signal due to knowing interrupt 130 | frequency of uC and related properties. 131 | 132 | Cardiac devices vulnerable to baseband EMI 133 | 134 | - Insert intentional EM interference in baseband 135 | 136 | Send pulsed sinewave to trick defibrilator into thinking heart beating correctly 137 | 138 | - ????? Works in vitro 139 | - Hard to replicate in a body or saline solution 140 | 141 | Any defenses? 142 | 143 | - Send an extra pacing pulse right after a beat 144 | * a real heart shouldn't send a response 145 | 146 | #### Detecting malware at power outlets 147 | 148 | Embedded system `<-->` WattsUpDoc `<-->` Power outlet 149 | 150 | #### Bigger problems than security? 151 | 152 | **Q:** True or false: Hackers breaking into medical devices is 153 | the biggest risk at the moment. 154 | 155 | **A:** False. Wide scale unavailability of patient care and integrity of 156 | medical sensors are more important. 157 | 158 | Security cannot be bolted on 159 | 160 | - E.g. MRI on windows 95 161 | - E.g. Pacemaker programmer running on OS/2 162 | 163 | Check gmail on medical devices, etc. 164 | 165 | Run pandora on medical machine. 166 | 167 | Keep clinical workflow predictable. 168 | 169 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | Computer systems security notes (6.858, Fall 2014) 2 | ================================================== 3 | 4 | Lecture notes from 6.858, taught by [Prof. Nickolai Zeldovich](http://people.csail.mit.edu/nickolai/) and [Prof. James Mickens](http://research.microsoft.com/en-us/people/mickens/) in 2014. These lecture notes are slightly modified from the ones posted on the 6.858 [course website](http://css.csail.mit.edu/6.858/2014/schedule.html). 
5 | 6 | * Lecture **1**: [Introduction](l01-intro.html): what is security, what's the point, no perfect security, policy, threat models, assumptions, mechanism, buffer overflows 7 | * Lecture **2**: [Control hijacking attacks](l02-baggy.html): buffer overflows, stack canaries, bounds checking, electric fences, fat pointers, shadow data structure, Jones & Kelly, baggy bounds checking 8 | * Lecture **3**: [More baggy bounds and return oriented programming](l03-brop.html): costs of bounds checking, non-executable memory, address-space layout randomization (ASLR), return-oriented programming (ROP), stack reading, blind ROP, gadgets 9 | * Lecture **4**: [OKWS](l04-okws.html): privilege separation, Linux discretionary access control (DAC), UIDs, GIDs, setuid/setgid, file descriptors, processes, the Apache webserver, chroot jails, remote procedure calls (RPC) 10 | * Lecture **5**: **Penetration testing** _guest lecture_ by Paul Youn, iSEC Partners 11 | * Lecture **6**: [Capsicum](l06-capsicum.html): confused deputy problem, ambient authority, capabilities, sandboxing, discretionary access control (DAC), mandatory access control (MAC), Capsicum 12 | * Lecture **7**: [Native Client (NaCl)](l07-nacl.html): sandboxing x86 native code, software fault isolation, reliable disassembly, x86 segmentation 13 | * Lecture **8**: [Web Security, Part I](l08-web-security.html): modern web browsers, same-origin policy, frames, DOM nodes, cookies, cross-site request forgery (CSRF) attacks, DNS rebinding attacks, browser plugins 14 | * Lecture **9**: [Web Security, Part II](l09-web-defenses.html): cross-site scripting (XSS) attacks, XSS defenses, SQL injection atacks, Django, session management, cookies, HTML5 local storage, HTTP protocol ambiguities, covert channels 15 | * Lecture **10**: **Symbolic execution** _guest lecture_ by Prof. Armando Solar-Lezama, MIT CSAIL 16 | * Lecture **11**: **Ur/Web** _guest lecture_ by Prof. Adam Chlipala, MIT, CSAIL 17 | * Lecture **12**: [TCP/IP security](l12-tcpip.html): threat model, sequence numbers and attacks, connection hijacking attacks, SYN flooding, bandwidth amplification attacks, routing 18 | * Lecture **13**: [Kerberos](l13-kerberos.html): Kerberos architecture and trust model, tickets, authenticators, ticket granting servers, password-changing, replication, network attacks, forward secrecy 19 | * Lecture **14**: [ForceHTTPS](l14-forcehttps.html): certificates, HTTPS, Online Certificate Status Protocol (OCSP), ForceHTTPS 20 | * Lecture **15**: **Medical software** _guest lecture_ by Prof. Kevin Fu, U. 
Michigan 21 | * Lecture **16**: [Timing attacks](l16-timing-attacks.html): side-channel attacks, RSA encryption, RSA implementation, modular exponentiation, Chinese remainder theorem (CRT), repeated squaring, Montgomery representation, Karatsuba multiplication, RSA blinding, other timing attacks 22 | * Lecture **17**: [User authentication](l17-authentication.html): what you have, what you know, what you are, passwords, challenge-response, usability, deployability, security, biometrics, multi-factor authentication (MFA), MasterCard's CAP reader 23 | * Lecture **18**: [Private browsing](l18-priv-browsing.html): private browsing mode, local and web attackers, VM-level privacy, OS-level privacy, OS-level privacy, what browsers implement, browser extensions 24 | * Lecture **19**: **Tor** _guest lecture_ by Nick Mathewson, Tor Project 25 | + 6.858 notes from 2012 on [Anonymous communication](l19-tor.html): onion routing, Tor design, Tor circuits, Tor streams, Tor hidden services, blocking Tor, dining cryptographers networks (DC-nets) 26 | * Lecture **20**: [Mobile phone security](l20-android.html): Android applications, activities, services, content providers, broadcast receivers, intents, permissions, labels, reference monitor, broadcast intents 27 | * Lecture **21**: [Information flow tracking](l21-taintdroid.html): TaintDroid, Android data leaks, information flow control, taint tracking, taint flags, implicit flows, x86 taint tracking, TightLip 28 | * Lecture **22**: **MIT's IS&T** _guest lecture_ by Mark Silis and David LaPorte, MIT IS&T 29 | * Lecture **23**: [Security economics](l23-click-trajectories.html): economics of cyber-attacks, the spam value chain, advertising, click-support, realization, CAPTCHAs, botnets, payment protocols, ethics 30 | 31 | Papers 32 | ------ 33 | 34 | List of papers we read ([papers/](papers/)): 35 | 36 | - [Baggy bounds checking](papers/baggy.pdf) 37 | - [Hacking blind](papers/brop.pdf) 38 | - [OKWS](papers/okws.pdf) 39 | - [The confused deputy](papers/confused-deputy.pdf) (or why capabilities might have been invented) 40 | - [Capsicum](papers/capsicum.pdf) (capabilities) 41 | - [Native Client](papers/nacl.pdf) (sandboxing x86 code) 42 | - [OWASP Top 10](papers/owasp-top-10.pdf), the most critical web application security risks 43 | - [KLEE](papers/klee.pdf) (symbolic execution) 44 | - [Ur/Web](papers/urweb.pdf) (functional programming for the web) 45 | - [A look back at "Security problems in the TCP/IP protocol suite"](papers/lookback-tcpip.pdf) 46 | - [Kerberos](papers/kerberos.pdf): An authentication service for open network systems 47 | - [ForceHTTPs](papers/forcehttps.pdf) 48 | - [Trustworthy Medical Device Software](papers/medical-sw.pdf) 49 | - [Remote timing attacks are practical](papers/brumley-timing.pdf) 50 | - [The quest to replace passwords](papers/passwords.pdf) 51 | - [Private browsing modes](papers/private-browsing.pdf) 52 | - [Tor](papers/tor-design.pdf): the second-generation onion router 53 | - [Understanding android security](papers/android.pdf) 54 | - [TaintDroid](papers/taintdroid.pdf): an information-flow tracking system for realtime privacy monitoring on smartphones 55 | - [Click trajectories](papers/trajectories.pdf): End-to-end analysis of the spam value chain 56 | -------------------------------------------------------------------------------- /previous-years/l12-resin.txt: -------------------------------------------------------------------------------- 1 | Resin 2 | ===== 3 | 4 | administrivia: 5 | quiz 1 on Wednesday 6 | Xi: office 
hours for quiz review questions? 7 | lab 3 out today, first part due in ~1.5 weeks 8 | 9 | what kinds of problems is this paper trying to address? 10 | missing security checks in application code 11 | sanitizing user inputs for SQL injection or cross-site scripting 12 | calling access control functions for sensitive data 13 | protected wiki page; user's password 14 | checking where code came from before running it 15 | 16 | one such problem: cross-site scripting 17 | setting: one web server, multiple users 18 | users interact with each other (e.g. get a list of online users) 19 | attacker's plan: inject JS code in a script tag as part of user name 20 | victim's browser sees this code in the HTML page, runs it 21 | what kind of code could attacker inject? 22 | maybe steal the user's HTTP cookie 23 | how? create an image tag containing document.cookie 24 | why doesn't the browser's same-origin policy protect the cookie? 25 | as far as the browser is concerned, code came from server's origin 26 | lab 1's web server was vulnerable, as it turns out! 27 | http://.../ 28 | returns: File not found: / 29 | 30 | a similar problem: SQL injection 31 | saw examples in previous lectures 32 | problems arise if programmer forgets to quote user inputs 33 | 34 | different kind of a problem: access control checks 35 | might have protected pages in a wiki, forget to call ACL function 36 | concrete example: hotcrp's password disclosure 37 | typical web site, sends password reminders 38 | email preview mode displays emails instead of sending 39 | turns out to display pw reminders in the requesting user's browser 40 | kind-of like the confused deputy prob: no module is really at fault? 41 | 42 | why are the checks missing? 43 | lots of places in the code where they need to be performed 44 | think of application as a black box; lots of inputs and outputs 45 | suppose that for a given output, only some inputs were OK 46 | e.g. sanitize user inputs in a SQL query, but not app's own data 47 | hard to tell where the output's data came from 48 | so, programmers try to do checks on all possible paths 49 | programmer forgets them on some paths from input to output 50 | 51 | what's the plan to prevent these? 52 | think of the checks as being associated with data flows input->output 53 | associate checks with data objects like user input or password strings 54 | perform checks whenever data gets used in some interesting way 55 | 56 | what does resin provide? 57 | [ diagram from figure 1 ] 58 | data tracking 59 | how does this work? assumes a language runtime 60 | python, php have a byte code representation, sort-of like java 61 | resin tags strings, integers with a policy object 62 | changes the implementation of operations that manipulate data 63 | why only tag strings and integers? what about other things? 64 | what kinds of operations propagate? 65 | why not propagate across "covert" or "implicit" channels? 66 | why byte-level tracking? 67 | what happens when data items are combined? 68 | concat two strings 69 | add two integers 70 | take a substring 71 | what happens for sha1sum() or touppercase() [which uses array lookups]? 72 | policy objects 73 | contains code to implement policy for its data 74 | what methods does the programmer have to implement in a policy object? 
75 | export_check 76 | merge [optional] 77 | filter objects 78 | provided by default by resin for most external channels 79 | context information: combination of resin- and programmer-supplied 80 | how much synchronization does there need to be between filters & policies? 81 | 82 | what are all of the uses for filter objects? 83 | default filters for external boundaries 84 | persistent serialization 85 | files: extended attributes 86 | database: extra columns for policies, SQL rewriting 87 | code imports 88 | interpreter's input is yet another kind of channel 89 | write access control 90 | persistent filters on FS objects like files, directories 91 | almost a different kind of check: tied to an external object, not data 92 | propagation rules for functions 93 | sha1sum(), touppercase(), .. 94 | 95 | how would you use resin to prevent missing checks? 96 | hotcrp 97 | cross-site scripting 98 | 99 | does this system actually work? 100 | two versions of resin, one for python and one for php 101 | prevented known bugs in real apps 102 | prevented unknown bugs in real apps too 103 | few different kinds of bugs (ACL, XSS, SQL inj, directory traversal, ..) 104 | is it possible to forget checks with resin? 105 | what does resin provide/guarantee? 106 | are there potential pitfalls with resin's assertions? 107 | how much code is required to write these assertions? why? 108 | how specific are the assertions to the bug you want to prevent? why? 109 | how did they prevent the myphpscripts login library bug? 110 | 111 | what's the cost? 112 | need to deploy a new php/python interpreter 113 | need to write some assertions (policy objects?) 114 | runtime overheads: memory to store policies, CPU time to track them 115 | major cost: serializing policies to SQL, file system 116 | could that be less? 117 | 118 | how else can you avoid these missing check problems? 119 | IFC does data tracking in some logical sense 120 | trade-off: redesign/rewrite your app around some checks 121 | hard to redesign around multiple checks or to add a check later 122 | java stack inspection 123 | can't automatically perform checks for things that are off the stack 124 | can check if file is being read through a sanitizing/ACL-check function 125 | crimps programmer's style, but in theory possible 126 | express some of these checks in the type system 127 | maybe have a special kind of UntrustedString vs SafeString 128 | and conversely SQLString and HTMLString which get used for output 129 | special conversion rules for them 130 | could even do static checks for these data flows 131 | for password disclosure, ACL checks: maybe a delayed-check string? 132 | when about to send out the string, tell it where you're sending it 133 | almost like resin design 134 | problem with using the type system: 135 | policies intertwined with code throughout the app 136 | to add a new check, need to change types everywhere 137 | resin is almost like a shadow type system 138 | 139 | could you apply resin to other applications, or other environments? 140 | different languages? 141 | different machines (cluster of web servers)? 142 | no language runtime? 143 | untrusted/malicious code? 144 | 145 | -------------------------------------------------------------------------------- /quiz2-medical-dev.html: -------------------------------------------------------------------------------- 1 |
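A minimal Python sketch of the policy-object idea from the Resin notes above. Resin implements data tagging and boundary filters inside modified PHP/Python runtimes; here only export_check and merge come from the notes, while TaintedStr, the context keys, and email_filter are illustrative assumptions.

    class PasswordPolicy:
        """Policy attached to a HotCRP-style password string."""
        def __init__(self, owner_email):
            self.owner = owner_email

        def export_check(self, context):
            # Allow export only on the email channel, to the password's owner.
            if context.get("channel") == "email" and context.get("recipient") == self.owner:
                return
            raise PermissionError("policy violation: password export blocked")

        def merge(self, other):
            # Illustrative: when two tagged values are combined, keep both policies.
            return [self, other]

    class TaintedStr(str):
        """Stand-in for Resin's runtime tagging of strings with policies."""
        def __new__(cls, value, policies):
            obj = super().__new__(cls, value)
            obj.policies = list(policies)
            return obj

    def email_filter(data, recipient):
        # Default boundary filter: run export_check on every attached policy
        # before the data crosses the email channel.
        for policy in getattr(data, "policies", []):
            policy.export_check({"channel": "email", "recipient": recipient})
        print("sending to", recipient, ":", data)

    pw = TaintedStr("s3cret", [PasswordPolicy("alice@example.com")])
    email_filter(pw, "alice@example.com")    # allowed: owner receives the reminder
    # email_filter(pw, "eve@example.com")    # would raise PermissionError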

190 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 |

37 | 38 | 60 | -------------------------------------------------------------------------------- /previous-years/l14-resin.txt: -------------------------------------------------------------------------------- 1 | Resin 2 | ===== 3 | 4 | what kinds of problems is this paper trying to address? 5 | threat model 6 | trusted: hardware/os/language runtime/db/app code 7 | untrusted: external inputs (users/whois servers) 8 | non-goals: buffer overflows, malicious apps 9 | programming errors: missing security checks in application code 10 | sanitizing user inputs for code injection 11 | calling access control functions for sensitive data 12 | protected wiki page; user's password 13 | 14 | Example: one web server, multiple users 15 | users interact with each other 16 | reading posts in a web forum 17 | avatar url / upload 18 | post content 19 | profile / signature 20 | attacker's plan: inject JS code / forge requests 21 | victim's browser sees this code in the HTML page, runs it 22 | what kind of code could attacker inject? 23 | steal the cookie 24 | transfer credits 25 | acl 26 | privileged operations (for admin) 27 | why doesn't the browser's same-origin policy protect the cookie? 28 | as far as the browser is concerned, code came from server's origin 29 | lower level: the zookws web server was vulnerable 30 | http://.../ 31 | returns: File not found: / 32 | 33 | a similar problem: whois injection 34 | admin views logs: user, ip, domain 35 | malicious whois server 36 | problems arise if programmer forgets to quote external inputs 37 | 38 | different kind of a problem: access control checks 39 | might have protected pages in a wiki, forget to call ACL function 40 | example: hotcrp's password disclosure 41 | typical web site, sends password reminders 42 | email preview mode displays emails instead of sending 43 | turns out to display pw reminders in the requesting user's browser 44 | kind-of like the confused deputy prob: no module is really at fault? 45 | 46 | why are the checks missing? 47 | lots of places in the code where they need to be performed 48 | think of application as a black box; lots of inputs and outputs 49 | suppose that for a given output, only some inputs were OK 50 | e.g. sanitize user inputs in a SQL query, but not app's own data 51 | hard to tell where the output's data came from 52 | so, programmers try to do checks on all possible paths 53 | programmer forgets them on some paths from input to output 54 | plug-in developers may be unaware of security plan 55 | 56 | what's the plan to prevent these? 57 | think of the checks as being associated with data flows input->output 58 | associate checks with data objects like user input or password strings 59 | perform checks whenever data gets used in some interesting way 60 | 61 | what does resin provide? 62 | hotcrp data: password 63 | [ diagram from figure 1 ] 64 | policy objects 65 | contains code to implement policy for its data 66 | hotcrp: only email password to the user or the pc chair 67 | what methods does the programmer have to implement in a policy object? 68 | export_check(context) 69 | merge [optional] 70 | filter objects 71 | data flow boundaries 72 | channels with contexts: http, email, ... 73 | provided by default by resin for most external channels 74 | invoke export_check if possible 75 | data tracking 76 | how does this work? 
assumes a language runtime 77 | python, php have a byte code representation, sort-of like java 78 | resin tags strings, integers with a policy object 79 | changes the implementation of operations that manipulate data 80 | why only tag strings and integers? what about other things? 81 | what kinds of operations propagate? 82 | why not propagate across "covert" or "implicit" channels? 83 | why byte-level tracking? 84 | what happens when data items are combined? 85 | common: concat strings (automatic via byte-level tracking) 86 | rare: add integers 87 | 88 | what are all of the uses for filter objects? 89 | default filters for external boundaries: sockets, pipes, http, email 90 | persistent serialization 91 | files: extended attributes 92 | database: extra columns for policies, SQL rewriting 93 | example: write password to file/db 94 | code imports 95 | interpreter's input is yet another kind of channel 96 | write access control 97 | persistent filters on FS objects like files, directories 98 | almost a different kind of check: tied to an external object, not data 99 | propagation rules for functions 100 | sha1(), strtoupper(), .. 101 | 102 | how would you use resin to prevent missing checks? 103 | hotcrp 104 | cross-site scripting: profile 105 | UntrustedData & XFilter calls strip and removes the policy? 106 | define UntrustedData and JSSantitized, empty export_check 107 | input tagged UntrustedData 108 | strip function attach JSSantitized 109 | output filter checks strings must contain JSSantitized if UntrustedData exists 110 | alternative: UntrustedData policy only; filter parses and sanitizes strings 111 | 112 | does this system actually work? 113 | two versions of resin, one for python and one for php 114 | prevented known bugs in real apps 115 | prevented unknown bugs in real apps too 116 | few different kinds of bugs (ACL, XSS, SQL inj, directory traversal, ..) 117 | is it possible to forget checks with resin? 118 | what does resin provide/guarantee? 119 | are there potential pitfalls with resin's assertions? 120 | how much code is required to write these assertions? why? 121 | how specific are the assertions to the bug you want to prevent? why? 122 | how did they prevent the myphpscripts login library bug? 123 | 124 | what's the cost? 125 | need to deploy a new php/python interpreter 126 | need to write some assertions (policy objects?) 127 | runtime overheads: memory to store policies, CPU time to track them 128 | major cost: serializing policies to SQL, file system 129 | could that be less? e.g. avoid storing email twice in hotcrp? 130 | 131 | how else can you avoid these missing check problems? 132 | IFC does data tracking in some logical sense 133 | trade-off: redesign/rewrite your app around some checks 134 | hard to redesign around multiple checks or to add a check later 135 | java stack inspection 136 | can't automatically perform checks for things that are off the stack 137 | can check if file is being read through a sanitizing/ACL-check function 138 | crimps programmer's style, but in theory possible 139 | express some of these checks in the type system 140 | maybe have a special kind of UntrustedString vs SafeString 141 | and conversely SQLString and HTMLString which get used for output 142 | special conversion rules for them 143 | could even do static checks for these data flows 144 | for password disclosure, ACL checks: maybe a delayed-check string? 
145 | when about to send out the string, tell it where you're sending it 146 | almost like resin design 147 | problem with using the type system: 148 | policies intertwined with code throughout the app 149 | to add a new check, need to change types everywhere 150 | resin is almost like a shadow type system 151 | 152 | could you apply resin to other applications, or other environments? 153 | different languages? 154 | different machines (cluster of web servers)? 155 | no language runtime? 156 | untrusted/malicious code? 157 | 158 | -------------------------------------------------------------------------------- /previous-years/l22-usability-2.txt: -------------------------------------------------------------------------------- 1 | Security Usability 2 | ================== 3 | 4 | is this problem real? concrete examples of things that go wrong? 5 | 6 | why is usable security a big problem? 7 | secondary tasks: users concerned with something other than security 8 | negative goal / weakest link: must consider entire system 9 | abstract, hard to reason about; little feedback: security often not tangible 10 | users don't fully understand threats, mechanisms they're using 11 | 12 | why do we need users in the loop? 13 | good reasons: users should be ultimately in control of their security 14 | bad reasons: programmers didn't know what to do, so they asked the user 15 | backwards compatibility 16 | 17 | what does the paper think constitutes usability for PGP? 18 | encrypt/decrypt, sign/verify signatures 19 | generate and distribute public key for encryption 20 | generate and publish public key for signing 21 | obtain other users' keys for verifying signatures 22 | obtain other users' keys for encrypting 23 | avoid errors (trusting wrong keys, accidentally not encrypting, ..) 24 | 25 | how do they evaluate it? 26 | 27 | cognitive walkthrough 28 | inspection by a developer trying to simulate a user's mindset 29 | overly-simplistic metaphors 30 | physical keys are similar to symmetric crypto, not public-key crypto 31 | quill pens lack the idea of a key being involved; key vs signature 32 | leads to faulty intuition 33 | not exposing key type information more explicitly 34 | good principle: if user needs to worry about something, expose it well 35 | users had to decide how to encrypt and sign a particular message 36 | old vs new key type icons not well documented 37 | figure 3: recipient dialog box talks about users, not keys 38 | implicit trust policy that might not be obvious to users 39 | web-of-trust model, keys can be trusted through multiple marginal sigs 40 | user might not realize what's going on 41 | not making key server operations explicit? unclear what's the precise risk 42 | failing to upload revocations to the key server 43 | publicizing or revoking keys unintentionally 44 | irreversible operations not well described 45 | deleting private key: should tell user they won't be able to decrypt, .. 46 | publicizing/revoking keys: warn the user it's a permanent change 47 | too much info 48 | UI focused on exposing what's technically hard: key trust management 49 | maybe a good model would be to ask the user to specify a threat model 50 | beginner: worried about opportunistic attackers stealing plaintext 51 | medium: worried about attacker injecting malicious keys? 52 | advanced: worried about attacker compromising some friends? 53 | more advanced: worried about cryptographic attack on small key sizes 54 | worry: users not good at estimating risk 55 | e.g. 
a worm might easily compromise friends' machines and sign keys 56 | 57 | lab experiment 58 | users confused about how the keys fit into the security model 59 | is something a key or a message? 60 | maybe extract as much info as possible from supplied data? 61 | could tell the user it's a key vs message based on headers etc 62 | where do keys come from? who generates them? 63 | need to use recipient's key rather than my own (sender's) 64 | key icons confusing because they don't differentiate public vs private 65 | noone managed to handle mixed key types in a single message 66 | practical solution was to send separate messages to each recipient 67 | perhaps sacrifice generality for usability? 68 | key trust questions were not prominent 69 | some users concerned about why they should trust keys 70 | one user assumed keys were OK because signed by campaign manager 71 | (but is campaign manager key's OK?) 72 | noone used PGP's key trust model 73 | overall results 74 | 4/12 managed to send an encrypted, signed email 75 | 3/12 disclosed the secret message in plaintext 76 | what does this mean? 77 | how effective is PGP in practice? 78 | maybe not so dismal for users that learn to use it over time 79 | on the other hand, easy to make dangerous mistakes 80 | all users disinclined to use PGP further 81 | what other experiments would be valuable? 82 | no attackers in the experiment 83 | would users notice a bad signature? 84 | 85 | phishing attacks 86 | look-alike domains 87 | visually similar (bankofthevvest.com) 88 | exploit incorrect user intuition (ebay-security.com) 89 | unfortunately even legitimate companies often outsource some services! 90 | e.g. URLs like "ebay.somesurveysite.com" 91 | visual deception 92 | copy logos, site layout 93 | inject look-alike security indicators 94 | create new windows that look like other dialog boxes 95 | 96 | why is phishing such a big problem? what UI security problems contribute to it? 97 | novice users don't understand the threats they are facing 98 | users don't have a clear mental model of the browser's security policy 99 | users don't understand technical details of what constitutes an origin 100 | users don't understand what to look for in an SSL certificate / EV certs 101 | users don't understand implications of security decisions 102 | allow cookie? allow non-SSL content? 103 | java security model: grant code from developer X access to FS/net? 104 | browsers have complex security indicators 105 | need to look at origin in URL bar, SSL certificate 106 | security indicators can be absent instead of indicating a warning/error 107 | e.g. if site is non-SSL, nothing out-of-the-ordinary appears to the user 108 | 109 | techniques to combat phishing? 110 | most common: maintain a database of known phishing sites 111 | why isn't this fully effective? 112 | active vs passive warnings 113 | habituation: users accustomed to warnings/errors 114 | users focused on getting their work done 115 | if the warning gives an option to continue, users may think it's OK 116 | 117 | more intrusive measures are often more effective here 118 | replace passwords with some other form of auth (smartcard, PAKE, etc) 119 | only works for credentials; attackers might still steal DOB, SSN, .. 120 | turn phishing into online attack 121 | site must display an agreed-upon image before user enters password 122 | can be hard for users to comprehend how and what this defends from 123 | 124 | other human factors in system security? 
125 | social engineering attacks 126 | least privilege can conflict with allowing users to do their work 127 | differentiating between trust in users vs trust in users' machines 128 | 129 | principles for designing usable secure systems? 130 | avoid false positives in security warnings (can make them errors then?) 131 | active security warnings to force user to make a choice (cannot ignore) 132 | present users with useful choices when possible 133 | users want to perform their task, don't want to choose "stop" option 134 | e.g. try to look up the correct key in a PGP key server? 135 | search google for an authentic web site vs phishing attack? 136 | secure defaults; secure by design; "invisible security" 137 | when does this work? 138 | when is this insufficient? 139 | intuitive security mechanisms that make sense to the user 140 | some of the windows "privacy" knobs or wizards that give a few options 141 | train users 142 | users unlikely to spend time to learn on their own 143 | interesting idea: try to train users as part of normal workflow 144 | try to mount phishing attacks on user by sending spam to them 145 | if they fall for an attack, tell them what they should've looked for 146 | can get tiresome after a while, if not done properly.. 147 | security training games 148 | 149 | -------------------------------------------------------------------------------- /previous-years/l21-captcha.txt: -------------------------------------------------------------------------------- 1 | CAPTCHAs 2 | ======== 3 | 4 | Administrivia. 5 | This week, Wed: in-lecture quiz. 6 | Next week, Mon + Wed: in-lecture final project presentations. 7 | 10 minutes per group. 8 | We will have a projector set up if you want to use one. 9 | Feel free to do a demo (e.g., 5 minute talk + 5 minute demo). 10 | Volunteers for Monday? If not, we will just pick at random. 11 | Turn in code + writeup by Friday next week (i.e., Dec 10th). 12 | 13 | Goal of this paper: better understand the economics of security. 14 | Context: earlier paper, "Spamalytics", studied economics of botnets, spam. 15 | Adversaries profitably send spam, mount denial-of-service attacks, etc. 16 | The bulk of botnet activity is work like this (spam, DoS). 17 | Botnet operators sell access to botnets, so there's a real market for this. 18 | 19 | What web sites would use CAPTCHAs? 20 | Open services that allow any user to interact with their site. 21 | Applications that have user accounts but allow anyone to sign up. 22 | 23 | Why would a web site want to use a CAPTCHA? 24 | Prevent adversary from causing DoS (e.g., too many Google searches). 25 | Prevent adversary from spamming users. 26 | Many examples: email spam, social network spam, blog comments. 27 | Prevent adversary from signing up for many accounts? 28 | Harness humans for some task. 29 | reCAPTCHA: OCR books. 30 | Solve CAPTCHAs from other sites? Interesting but probably not worth it. 31 | What if a user legitimately signs up for an account and sends spam? 32 | What if adversary bypasses CAPTCHA and signs up for account? 33 | Can probably detect an adversary sending spam relatively fast. 34 | Still want CAPTCHA to prevent those first few messages before detection. 35 | 36 | Why do sites care if users are humans or software? 37 | Maintain some form of per-person fairness, + hope good users outnumber bad. 38 | Advertising revenue. 39 | What about ad-blocking software? 40 | 41 | If a site doesn't want to implement CAPTCHAs, what are the alternatives? 42 | Track based on IPs. 
43 | IPs are cheap for botnet operators. 44 | False positives due to large NATs. 45 | Implement stronger authentication. 46 | Rely on some other authentication mechanism. 47 | Email address, Google account. 48 | At extreme end, bank account, even if no money is charged. 49 | How does Wikipedia work with no CAPTCHAs? 50 | Strong logging, auditing, recovery. 51 | Selective mechanisms to require long-lived accounts. 52 | Measure account life in time, or in number of un-reverted edits? 53 | 54 | Bypassing CAPTCHAs. 55 | Plan 1: write software to recognize characters / challenges in images. 56 | Plan 2: use humans to solve CAPTCHAs. 57 | 58 | Why does the paper argue the technical approach (plan 1) is not effective? 59 | Up-front cost: about $10k to implement solver for CAPTCHA. 60 | CPU cost: a few seconds of CPU time per CAPTCHA solved. 61 | Amazon EC2 prices, order-of-magnitude: $0.10 for an hour of CPU. 62 | CPU cost for solving a CAPTCHA is ~$10^-4 ($0.0001), could be less. 63 | Using humans: $1 for 1,000 CAPTCHA solutions, or $0.001 per CAPTCHA. 64 | Break-even point: solve order-of-magnitude 10M CAPTCHAs. 65 | Worse yet, accuracy rate of automated solver is poor (e.g., 30%). 66 | Thus, break-even point for plan 1 might be higher by 3x. 67 | How do we tell if this break-even point is too high? 68 | Can CAPTCHA developers switch algorithms faster than this? 69 | Experimentally, paper says reCAPTCHA can change fast enough. 70 | Thus, investment not worth it. 71 | 72 | Human-based CAPTCHA solving: Figure 3. 73 | Well-defined API between application and CAPTCHA-solving site. 74 | Back-end site for workers, with a web-based UI. 75 | Some internal protocol between the front- and back-end sites. 76 | How do the authors find out these things? 77 | Looks like a lot of manual work finding these sites. 78 | Interviewed an operator of one such site. 79 | How reliable are these sites? 80 | 80-90% availability (Table 1). 81 | 10-20% error rate (Fig. 4). 82 | What's the cost range? 83 | $0.50 -- $20.00 per 1,000 CAPTCHAs solved. 84 | Wide variance in adaptability, accuracy, latency, capacity. 85 | 86 | Does low accuracy rate matter? 87 | Service provider could detect many incorrect CAPTCHAs. 88 | What would a service provider do in this case? 89 | Can blacklist an IP address after several incorrect answers. 90 | If overall rate across IPs goes down, deploy new CAPTCHA scheme? 91 | Even humans have a 75-90% accuracy rate, depending on the CAPTCHA. 92 | Assuming the humans are similar, service shouldn't blacklist. 93 | 94 | Does latency matter? 95 | CAPTCHA solver cannot be significantly slower than human. 96 | Service would be able to tell the real human & adversary apart. 97 | Regular humans can solve CAPTCHAs in ~10 seconds. 98 | Software can solve CAPTCHAs in several seconds: fast enough. 99 | CAPTCHA-solving services seem to add little latency (Fig. 7). 100 | 101 | How scalable is this? 102 | One service appears to have 400+ workers. 103 | Measured much like network analysis: watch for queueing. 104 | 105 | How much are the workers getting paid? 106 | Quite little: $2-4 per day! 107 | Workers get ~quarter of front-end cost. 108 | Many workers seem to be in China, India, Russia. 109 | Cute tricks for identifying workers: 110 | Ask to decode 3-digit numbers in specific language. 111 | Ask to write down the current time, to find timezone. 112 | 113 | How much profit does an adversary get from abusing an open service? 114 | Email spam: relatively little, but non-zero. 
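(Aside: a quick sketch making the solver-vs-human break-even arithmetic above concrete. The dollar figures are the order-of-magnitude estimates quoted in these notes; the 85% human accuracy and the "cost per correct solve" framing are illustrative assumptions, not numbers from the paper.)

    # Rough economics of automated vs. human CAPTCHA solving.
    SOLVER_UPFRONT = 10_000.0   # ~$10k to build an automated solver
    SOLVER_CPU_COST = 0.0001    # ~$1e-4 of rented CPU time per attempt
    SOLVER_ACCURACY = 0.30      # ~30% of attempts yield a correct solve
    HUMAN_COST = 0.001          # ~$1 per 1,000 CAPTCHAs via solving services
    HUMAN_ACCURACY = 0.85       # humans get roughly 75-90% right

    def cost_per_correct(cost_per_attempt, accuracy):
        # Charge for failed attempts too: expected attempts per success = 1/accuracy.
        return cost_per_attempt / accuracy

    solver = cost_per_correct(SOLVER_CPU_COST, SOLVER_ACCURACY)   # ~$0.00033
    human = cost_per_correct(HUMAN_COST, HUMAN_ACCURACY)          # ~$0.00118
    break_even = SOLVER_UPFRONT / (human - solver)                # ~12 million solves
    print(round(break_even))

At these rates the automated solver only pays off after roughly ten million correct solves, which is why the up-front investment is argued not to be worth it if the CAPTCHA scheme can change faster than that.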
115 | Earlier work suggests a rough estimate of $0.00001 (10^-5) per msg. 116 | How do we measure the profit from sending spam? 117 | Comment spam: not known, might be higher? 118 | Is it possible to quantify or estimate? 119 | Possibly look at the ad costs for the page hosting the comments. 120 | Vandalism, DoS attacks: hard to quantify, externalities. 121 | 122 | Are CAPTCHAs still useful, worthwhile? 123 | An easy way to impose some non-zero cost on potential adversaries. 124 | Why do adversaries sign up for Gmail accounts to send spam? 125 | Gmail's servers unlikely to be marked as spam senders. 126 | Botnet IP addresses are, on the other hand, likely marked as spam. 127 | At $0.001, 1 CAPTCHA is worth 100 emails (at $0.0001 profit per msg). 128 | Borderline-profitable. 129 | Bad place to be in terms of security parameters. 130 | 131 | Users seem to have become more-or-less OK with solving CAPTCHAs. 132 | Can we provide better forms of CAPTCHAs? 133 | Example in paper: Microsoft's Asirra, solvers adapted within days. 134 | Can sites make the cost of solving a CAPTCHA high? 135 | 136 | How to protect more valuable services? 137 | Gmail: SMS-based verification after a few signups from an IP address. 138 | Interesting: gmail accounts went from $8 per 1,000 to unavailable! 139 | Trade-off between defense mechanism usability and security. 140 | Apparently, users do go away from a site if they must solve CAPTCHAs. 141 | Do computational puzzles help? Micropayments? 142 | Can TPMs help, perhaps on the client machines? 143 | 144 | Is it ethical to do the kind of research in this paper? 145 | Authors argue they don't significantly change what's going on. 146 | They don't solve any additional CAPTCHAs by hand. 147 | Instead, they re-submit CAPTCHAs back into the system to be solved. 148 | They don't use the solutions they purchased for any adversarial activity. 149 | They do inject money into the market, but perhaps not significant. 150 | 151 | Other courses, if you're interested in security. 152 | 6.857: Computer and Network Security, in the spring. 153 | 6.875: Cryptography and Cryptanalysis, in the spring. 154 | 155 | -------------------------------------------------------------------------------- /previous-years/l20-bots.txt: -------------------------------------------------------------------------------- 1 | Botnets 2 | ======= 3 | 4 | botnet: network of many machines under someone's control 5 | 6 | what are botnets good for? 7 | using the resources of bot nodes: 8 | IP addrs (spam, click fraud), bandwidth (DoS), maybe CPU (??) 9 | steal sensitive user data (bank account info, credit cards, etc) 10 | impersonate user (inject requests to transfer money on user's behalf) 11 | extortion (encrypt user's data, demand payment for decryption) 12 | attackers might be able to extract a lot of benefit from high-value machines 13 | one botnet had control of machines of officials of diff governments 14 | could enable audio, video and stream it out of important meetings? 15 | other candiates: stealing secret designs from competitor company? 16 | what sorts of attacks are counter-productive for attacker? 17 | making the machine unusable for end-user (unless trying extortion) 18 | 19 | how does the botnet grow? (largely orthogonal from botnet operation) 20 | this particular botnet (Torpig): drive-by downloads 21 | user's browser loads a malicious page (e.g. 
attacker purchased adspace) 22 | malicious page looks for vulnerabilities in browser or plug-ins 23 | if it finds a way to execute native code, downloads bot code 24 | can we prevent or detect this? maybe look for unusual new processes? 25 | botnet in paper: injects DLLs into existing processes 26 | can use a debugging interface to modify existing process 27 | some processes support plugins/modules (IE, windows explorer) 28 | once DLL running in some other process, looks less suspicious? 29 | 30 | other schemes: worms (self-replicating attack malware) 31 | why worms? 32 | harder to detect (no single attack source) 33 | compromise more machines (attacker now behind firewalls) 34 | faster (much less than an hour for every internet-connected machine) 35 | usually exploit a few wide-spread vulnerabilities 36 | simple worms: exploit some vulnerability in network-facing service 37 | easy strategy: try to spread to other machines at random 38 | e.g. guessing random IPs works (but inefficient) 39 | use user's machine as source of other victims 40 | for worms that spread via email, try user's email address book 41 | try other victims in the same network as the current machine 42 | try machines in user's ssh known_hosts file 43 | use other databases to find candidate victims 44 | google for "powered by phpBB" 45 | try to propagate to any servers that the user connects to 46 | hides communication patterns! 47 | more complex worms possible (from web server to browser and back) 48 | requires finding wide-spread bugs in multiple apps at once 49 | less common as a result? 50 | can we prevent or detect this? 51 | prevent: could try to isolate machines after you've detected it 52 | worm fingerprinting in the network (traffic patterns) 53 | monitor unused machines, email addresses, etc for suspicious traffic 54 | in theory shouldn't be getting anything legitimate 55 | what would show up if you monitored traffic to unused subnet? 56 | network mapping by researchers? 57 | random probes by worms poking at IP addresses 58 | "backscatter" from source-spoofing 59 | could use these to infer what's happening "out there" 60 | detect by planting honeypots 61 | if machine starts generating traffic, probably infected 62 | 63 | once some machine is infected, how does the botnet operate? 64 | bot master, command and control (C&C) server(s), bots talk to C&C servers 65 | bots receive commands from C&C servers 66 | some bots accept commands from the network (e.g. run an open proxy server) 67 | upload stolen data either to the same C&C servers or some other server 68 | 69 | how do bot masters try to avoid being taken down? 70 | change the C&C server's IP address ("fast flux") 71 | can move from one ISP to another after getting abuse complaints 72 | how to inform your bots that your IP address changed? DNS 73 | domain name is a single point of failure for bot master 74 | dynamic domain names ("domain flux") 75 | how does this work? 76 | how do you take down access to a botnet using this? 77 | is there still a single point of failure here? 78 | currently many different domain registrars, little cooperation 79 | conficker generated many more dynamic domain names than torpig 80 | makes it impractical to register all of these names ahead of time 81 | peer-to-peer control networks (Storm botnet) 82 | harder for someone else to take down: no single server 83 | harder for botmaster to hide botnet internals: no protected central srvr 84 | 85 | how did torpig work? 
86 | mebroot installs itself into the MBR, so gets to inject itself early on 87 | loads modules from mebroot C&C server 88 | mebroot C&C server responds with torpig DLL to inject into various apps 89 | torpig DLL collects any data that matches pre-defined patterns 90 | usernames and passwords; credit card numbers; ... 91 | torpig DLL contacts torpig's C&C server for info about what sites to target 92 | torpig's C&C server using domain flux: weekly and daily domains 93 | "injection server" responsible for stealing credentials for a specific site 94 | redirects visits to bank login page to fake login page 95 | in-browser DLL subverts any browser protections (SSL, lock icon) 96 | lots of "outsourcing" going on: mebroot, torpig, torpig build customers? 97 | 98 | all traffic encrypted 99 | but these bots implement their own crypto: bad plan, can get broken 100 | conficker used well-known crypto, and was thus much harder to break 101 | 102 | how did these guys take over the botnet? 103 | attackers did not register every torpig dynamic domain name ahead of time 104 | bots did not properly authenticate responses from C&C server 105 | (torpig "owners" eventually took back control through mebroot's C&C) 106 | 107 | how big is the torpig botnet? 108 | 1.2 million IPs 109 | each bot has a "nid" that reflects its hardware config (disk serial number) 110 | ~180k unique nid's 111 | ~182k unique (nid+os+...)'s 112 | 40 VMs (nid's match a standard configuration of vmware or qemu) 113 | lots of IP reuse 114 | aggregate bandwidth is likely over 17 Gbps 115 | 116 | how effective is torpig? 117 | authors collected all data during the 10 days they had control of torpig 118 | collected lots of account information: millions of passwords 119 | many users reuse passwords across sites 120 | 8310 accounts at financial institutions 121 | 1660 credit/debit card numbers 122 | 30 came from a single compromised at-home call center node 123 | pattern-matching works well: don't have to know app ahead of time 124 | kept producing a steady stream of new financial data throughout the 10 days 125 | what's going on? 126 | probably users don't enter their CC#, bank password every day 127 | 128 | how effective is spam? 129 | separate paper looked at the economics of sending spam 130 | about 0.005% users visit URLs in spam messages (1 out of 20,000) 131 | less than 10% of those users "bought" whatever the site was selling 132 | so send ~200,000 spam messages for one real customer 133 | unclear if it's cost-effective (esp. if bots are nearly-free) 134 | 135 | how to defend against bots? 136 | are TPMs of any help? 137 | maybe a way to keep your credentials safe (and avoid simple passwords) 138 | resource abuse: annoying because it gets your machine blacklisted 139 | VMM-level scheme to track user activity? 140 | make their operation not cost-effective 141 | need to get a good idea of what's most profitable for botmasters 142 | 143 | did these guys make it more difficult to mount similar attacks in the future? 
144 | probably torpig will get fixed 145 | other papers written about takeovers on different bot nets 146 | other bots employ much stronger security measures to prevent takeover 147 | 148 | -------------------------------------------------------------------------------- /previous-years/l23-voting.txt: -------------------------------------------------------------------------------- 1 | Electronic voting 2 | ================= 3 | 4 | final projects reminder 5 | 10-minute presentations about your projects on Wednesday 6 | we will have a projector that you can use 7 | code and write-up describing your project due on Friday 8 | will only start grading on monday morning, if you need extension.. 9 | 10 | quiz solutions posted on course web site 11 | HKN course eval link posted on course web site 12 | 13 | --- 14 | 15 | what are the security goals in elections? 16 | availability: voters can vote 17 | integrity: votes cannot be changed; results reflect all votes 18 | registration: voters should vote at most once 19 | privacy: voters should not be able to prove how they voted (eg to sell vote) 20 | 21 | what's the threat model? 22 | lots of potential attackers 23 | officials, vendors, candidates themselves, activists, governments, .. 24 | may be interested in obtaining a particular outcome 25 | voters may want to sell votes 26 | real world: anything is fair game 27 | intimidation 28 | impersonation (incl. dead people) 29 | denial of service 30 | ballot box stuffing, miscounting, .. 31 | electronic voting machine attacks 32 | buffer overflows 33 | logic bugs 34 | insider attacks 35 | physical attacks 36 | crashing / corrupting 37 | .. 38 | ideal designs focus on making the attack cost high 39 | auditing with penalties if detected 40 | 41 | what are the alternatives? 42 | vote in public: raise hands, .. 43 | written paper ballots 44 | optical-scan paper ballots 45 | punched paper ballots 46 | DRE (what this paper is about): direct-recording electronic machine 47 | absentee voting by mail 48 | vote-selling potential 49 | internet voting 50 | greater voter turnout 51 | vote-selling potential 52 | more practical problem: worms/viruses voting? 53 | 54 | why DRE? 55 | partly a response to voting problems in florida in the 2000 election 56 | hoped: easier-to-use UI, faster results, more accurate counting, .. 57 | interesting set of constraints from a research point of view 58 | high integrity, ideally verifiable 59 | most of the process should be transparent and auditable 60 | cannot expose individual voter's choices 61 | cannot allow individual voters to prove their vote 62 | 63 | how does the machine work? 64 | 133MHz CPU 65 | 32MB RAM 66 | 67 | on-board flash memory 68 | EPROM socket 69 | "ext flash" socket 70 | boot selector switches, determine which of above 3 device is used to boot 71 | 72 | internal speaker 73 | 74 | external devices: 75 | touch-sensitive LCD panel, keypad, headphones 76 | printer -- why? 77 | smart card reader/writer -- why? 78 | irda transmitter/receiver -- why? 79 | 80 | power switch, keyboard port, PC-card slots (behind locked metal door) 81 | 82 | what does the boot sequence look like? 83 | bootloader runs from selected source 84 | internal flash contains a gzip'ed OS image that gets loaded into RAM 85 | includes image for root file system 86 | internal flash contains file system that stores votes, among other things 87 | 88 | what's on the memory card? 
89 | machine state configured via election.brs file 90 | votes stored on memory card (and in the built-in flash) in election.brs 91 | data encrypted using a fixed DES key (hard-coded in the software) 92 | 93 | machine states: pre-download, pre-election testing, election, post-election 94 | what's the point of L&A testing? 95 | want to distinguish test votes from real votes 96 | want to make it difficult to erase existing votes 97 | also tips off the software that it's being tested! 98 | 99 | why smartcards? 100 | contain a secure token from sign-in desk to the voting machine 101 | ideal property: cannot fake a token, cannot duplicate token 102 | how to implement? faking is easy, duplication is harder 103 | can give each token a unique ID, store used tokens on machine 104 | potentially vulnerable to multiple votes on different machines 105 | can have smartcard destroy the token after use, no read/write API 106 | in practice, turned out the machine was not using any smartcard crypto 107 | attacker can easily manufacture fake smartcards and vote many times 108 | (attacker can also manufacture an "admin" smartcard and manage the machine) 109 | 110 | what's the point of printing out receipt tapes post-election? 111 | in theory can do a recount based on these tapes; compare with check-in data 112 | assumes the attack is mounted after the election happens 113 | corrupt or lost memory cards, compromised tabulation, .. 114 | 115 | what attacks did the authors explore? 116 | exploiting physical access to inject malicious code 117 | vote stealing 118 | denial of service 119 | viruses/worms 120 | 121 | specific bugs 122 | unauthenticated smartcards 123 | unauthenticated firmware updates (fboot.nb0) 124 | unauthenticated OS updates (nk.bin) 125 | unauthenticated debug mode flag (explorer.glb) 126 | unauthenticated wipe command (EraseFFX.bsq) 127 | unauthenticated code injection (.ins files with buffer overflows) 128 | poor physical security (cheap lock) 129 | easy to change boot source 130 | easy to change components like EPROM 131 | insufficient audit logs (no integrity; election.adt just has "Ballot cast") 132 | sound when machine reboots, but can be prevented with headphones 133 | 134 | what to do when an audit shows an error? 135 | with this machine: denial of service attack, effectively 136 | ideally would be able to reconstruct what happened or recount manually? 137 | 138 | how to scrub a machine after a potential compromise? 139 | can't trust anything: all memory/code easily changed by attacker 140 | need to install a known-good EPROM, use that to overwrite bootloader, OS 141 | can take a long time, esp. if problem spread to many machines 142 | 143 | how to prevent these attacks? 144 | TPM / secure boot? 145 | would signed files be enough? 146 | attacker can get a hold of signed "debug mode" file and he's done? 147 | signed software updates might not be the latest version 148 | attacker installs old version, exploits bug 149 | might want to prevent rollbacks (but may want to allow, too?) 150 | read-only memory for software? 151 | physical switches to allow updates 152 | could make it more difficult to write a fast-spreading virus/worm 153 | physical access control 154 | probably a good idea, to some extent 155 | auditing physical access leads to easy DoS attacks 156 | need a strong audit mechanism to prevent DoS (i.e., can recount) 157 | append-only memory for auditing? 158 | disable the "flash" (rewrite) circuitry from flash memory? 
159 | or just have a dedicated "audit" controller 160 | system already has a separate battery-management PIC 161 | OS-level protection? 162 | language security? 163 | operating / setup procedures? 164 | who has access to the machine, chain of custody, ... 165 | parallel testing? 166 | 167 | what is software-independence? 168 | malicious software alone cannot change election results (undetectably) 169 | e.g. software helps print out ballot, voter makes sure ballot is OK 170 | or prints out a paper tape with all votes, which is counted by hand 171 | 172 | usability for voters? 173 | paper doesn't describe the UI, unfortunately.. 174 | "machine ate my vote"? 175 | could invalidate smartcard and crash? 176 | 177 | usability for officials? 178 | potentially same problems as PGP 179 | do officials have the right mental model to worry about potential attacks? 180 | 181 | end-to-end integrity 182 | voting integrity has 3 parts: 183 | cast-as-intended 184 | collected-as-cast 185 | counted-as-collected 186 | above techniques only help ensure cast-as-intended 187 | need more end-to-end security to ensure other 2 properties 188 | Twin scheme by Rivest and Smith 189 | 190 | -------------------------------------------------------------------------------- /previous-years/l21-dropbox.txt: -------------------------------------------------------------------------------- 1 | Looking inside the (drop)box 2 | ================== 3 | 4 | why are we reading this paper? 5 | code obfuscation is a common goal in the real world 6 | skype, dropbox 7 | gmail 8 | malware 9 | closed versus open design 10 | contrast bitlocker and dropbox client 11 | 12 | this paper has several aspects 13 | code obfuscation weaknesses 14 | focus of this lecture 15 | user authentication weaknesses 16 | not our focus, technically less interesting 17 | automatic login without user credentials fixed (i think) 18 | aside: etiquette with finding security flaws 19 | report before you publish 20 | 21 | what is Dropbox's goal for obfuscation? 22 | don't know, but ... 23 | no open-source client 24 | dropbox e.g., can change the wire protocol at will 25 | make it difficult to for competitors to develop a client 26 | portable fs client is tricky 27 | 28 | what is the threat model? 29 | adversary has access to obfuscated code and can run it 30 | adversary reverse re-engineers client to avoid the above goals 31 | sidenote: malware may have additional threats to protect against 32 | e.g., make it difficult to fingerprint so that anti-virus application cannot remove malware 33 | 34 | challenging threat, because: 35 | code must run correctly on adversary's processor 36 | code may have to make systems calls 37 | code may have to be linked dynamically with host libraries 38 | adversary can observe processor and systems calls 39 | 40 | general approach: code obfuscation 41 | Given a program P, produce O(P) 42 | O(P) has same functions as P but a black box 43 | there is nothing substantial one can learn from O(P) 44 | O(P) isn't much slower than P 45 | 46 | minimum requirement: adversary cannot reconstruct P 47 | ignore programs that are trivially learnable from excuting w. 
different inputs 48 | easy to avoid complete failure 49 | execute only if an input matches some SHA hash 50 | hash is embedded in program, but difficult to compute inverse 51 | difficult to succeed completely 52 | program prints itself 53 | in general: impossible (see references) 54 | there is a family of interesting programs for which O(P) will fail [see references] 55 | but, perhaps you could do well on a particular program 56 | difficult to state precise requirements for an obfuscator 57 | should be skeptical that it can work in practice against a skilled adversary 58 | 59 | code obfuscation in practice 60 | write C programs from which it is difficult to tell what they do 61 | down-side: hard on developer 62 | but makes for great contests (e.g., International Obfuscated C Code Contest) 63 | use an obfuscator 64 | Takes a program as input and produces intermediate code 65 | You don't want to ship the original source code 66 | Ship the program in intermediate form, along with an interpreter, to the computer 67 | You don't want to ship the actual assembly 68 | Can cook up your own intermediate language that nobody knows 69 | Computer runs the interpreter, which interprets the intermediate code 70 | Interpreter reads input and outputs values 71 | The interpreter can try to hide what it is actually computing 72 | Fake instructions, fake control flow 73 | Use inputs as an index into a finite state machine and spit out values 74 | Etc. 75 | 76 | dropbox's approach 77 | all code is written in python 78 | compiles programs to bytecode 79 | interpreter executes bytecode 80 | dropbox application 81 | contains encrypted python bytecode 82 | encryption method is changed often 83 | bytecode opcodes are different from standard Python's 84 | contains a special interpreter 85 | application is built/packaged in a non-standard way 86 | special "linker" 87 | 88 | dynamic linking 89 | what are the .so files in the downloaded dropbox directory? 90 | dynamically-linkable libraries 91 | modern applications are not a single file 92 | when the application runs, unresolved references are resolved at runtime 93 | e.g., application makes a system call 94 | dynamic linker links the application with the library containing the system call stubs 95 | advantage: library is in memory only once 96 | with static linking, the library would be in memory N times 97 | once with each application 98 | LD_PRELOAD: insert your own library in front of others 99 | dropbox ships its app with several libraries that are dynamically linked 100 | but the interpreter and SSL are statically linked 101 | 102 | goal of paper: *automatically* break obfuscation (dedrop) 103 | another goal: break user authentication 104 | demo: 105 | look at dropbox binary 106 | ls: no pyc files 107 | gdb binary 108 | nm binary 109 | objdump -S binary 110 | 111 | run dropboxd with LD_PRELOAD 112 | extracts pyc_decrypted 113 | cd pyc_decrypted/client_api 114 | python 115 | import hashing 116 | dir(hashing) 117 | run uncompyle2 hashing.pyc 118 | 119 | Paper: how to decrypt pyc files? 120 | study the modified python interpreter 121 | diffed Python27.dll from dropbox with the standard one 122 | r_object is patched 123 | decrypt decrypts bytecode 124 | how to extract encrypted bytecode?
125 | inject code into dropbox binary using LD_PRELOAD 126 | injected code overwrites strlen 127 | when strlen is called by dropbox, injected code runs 128 | inject Python code using PyRun_SimpleString 129 | not patched 130 | can run arbitrary python code in dropbox context 131 | GIL must be acquired by injected code 132 | call PyMarshal_ReadLastObjectFromFile() 133 | reads encrypted pyc into memory 134 | but, co_code is not exposed to Python! 135 | linear memory search to find co_code 136 | serialize it back to a file 137 | but, marshal.dumps is NOP 138 | inject PyPy's _marshal.py 139 | written in python! 140 | 141 | How to remap opcodes? 142 | manual reconstruct opcode mapping 143 | time intensive, but opcode hasn't changed since 1.6.0 144 | frequency analysis for common modules 145 | decrypted dropbox bytecode 146 | standard bytecode 147 | 148 | How to get user credentials? 149 | hostid are used for authentication 150 | established during registration 151 | not affected by changing password! 152 | stored in encrypted sql database 153 | components of decryption key are stored on device 154 | linux: custom obfuscator 155 | except host_int comes from server 156 | Can also be extracted from dropbox client logs 157 | enable logging based MD5 checksum of "DBDEV" 158 | md5("a2y6shaya") = "c3da6009e4" 159 | patched now. 160 | Snooping on objects, looking for host_id and host_int 161 | Login to web site for logintray is based only on host_id and host_int 162 | Dropbox uses now "better" logintray ... 163 | Dropbox should probably use SRP (or something else good) 164 | 165 | How to learn what dropbox internal APIs are? 166 | Patch all SSL objects, every second 167 | "monkey patch" == dynamic modifications of a class at runtime without 168 | modifying the original source code 169 | maybe derived from guerrilla (as in an sneaky attack) patch? 170 | No two-factor authentication for access to drop-box account 171 | One use: open-source client 172 | 173 | Is the dropbox obfuscation the best you can do? 174 | No. 175 | How could you do better? 176 | Hide instructions much better 177 | Obscure control flow 178 | But, is it worth it? 179 | 180 | Closed versus open design 181 | Downside of closed designs 182 | easy to miss assumptions because right eyes don't look at it 183 | Downside of open design 184 | you competitor has access to it too 185 | Ideal case: minimal secret, make most of design open 186 | maybe not always possible to make the secret small? 187 | 188 | References 189 | http://www.math.ias.edu/~boaz/Papers/obfuscate.ps 190 | http://www.math.ias.edu/~boaz/Papers/obf_informal.html 191 | https://github.com/kholia/dedrop 192 | uncompyle2 https://github.com/wibiti/uncompyle2 193 | -------------------------------------------------------------------------------- /previous-years/l18-dealloc.txt: -------------------------------------------------------------------------------- 1 | Secure deallocation 2 | =================== 3 | 4 | Aside: some recent reverse-engineering of Stuxnet by Symantec. 5 | http://www.symantec.com/connect/blogs/stuxnet-breakthrough 6 | Stuxnet targets specific frequency converters. 7 | Manufactured by companies headquartered in either Finland or Tehran. 8 | Used to drive motors at high speeds. 9 | Stuxnet watches for a specific frequency band. 10 | When detected, changes frequencies to low or high for short periods. 11 | 12 | Problem: disclosure of sensitive data. 13 | 1. Many kinds of sensitive data in applications. 14 | 2. 
Copies of sensitive data exist for a long time in running system. 15 | 3. Many ways for data to be disclosed (often unintentionally). 16 | 17 | What kinds of sensitive data are these authors concerned about? 18 | Passwords, crypto keys, etc. 19 | Small amounts of data that can be devastating if disclosed. 20 | Bulk data, such as files in a file system. 21 | Sensitive, but not as acute. 22 | Hard to reduce data lifetime (the only knob this paper is using). 23 | Small leaks might not be a disaster (unlike with a private key). 24 | 25 | Where could copies of sensitive data exist in a running system? 26 | Example applications: typing password into Firefox; Zoobar web server. 27 | Process memory: heap, stack. 28 | IO buffers, X event queues, string processing libraries. 29 | Language runtime makes copies (immutable strings, Lisp objects, ..) 30 | Thread registers. 31 | Files, backups of files, ... 32 | Swapped memory, hibernate for laptops. 33 | Kernel memory. 34 | IO buffers: keyboard, mouse inputs. 35 | Kernel stack, freed pages, saved thread registers. 36 | Network packet buffers. 37 | Pipe buffers contain data sent between processes. 38 | Random number generator inputs. 39 | 40 | How does data get disclosed? 41 | Any vulnerability that allows code execution. 42 | Logging / debugging statements. 43 | Core dumps. 44 | DRAM cold-boot attacks. 45 | Stolen disks, or just disposing of old disks. 46 | Revealing uninitialized memory. 47 | Applications with memory management bugs. 48 | Linux kernel didn't zero net buffers, sent "garbage" data in packets. 49 | Same with directories, "garbage" data was written to disk upon mkdir. 50 | MS Word (used to?) contain "garbage" in saved files, such as old text. 51 | 52 | How serious is it? 53 | What data copies might persist for a long time? 54 | Process memory: Looks like yes. 55 | How do they figure this out? 56 | Use valgrind -- could do something similar in DynamoRIO. 57 | Track all memory allocs, reads, writes, frees. 58 | Process registers: Maybe floating-point? Still, probably not that bad. 59 | Files, backups: lives on disk, long-term. 60 | Swap: lives on disk, possibly long-term, expensive to erase. 61 | Kernel memory. 62 | Experiments in paper show live data after many weeks (Sec 3.2). 63 | How do they figure this out? 64 | Place many random 20-byte "stamps" in memory. 65 | Periodically read all phys. memory in kernel, look for stamps. 66 | How can data continue to persist for so long? 67 | Memory should be getting reused? 68 | To some extent, depends on the workload. 69 | Even with an expensive workload, may not eliminate all stamps. 70 | Holes in long-lived kernel data structures, slab allocators. 71 | Persistence across reboots, even. 72 | Are there really that many data disclosure bugs? 73 | Some examples of past bugs. 74 | Worse yet: data disclosure bugs not treated with much urgency? 75 | 76 | Paper's goal: 77 | Try to minimize the amount of time that sensitive data exists. 78 | Not focusing on fixing data disclosure mechanisms (hard to generalize). 79 | 80 | How do we reduce/avoid data copies? 81 | Process memory: need application's help. Mostly what this paper is about. 82 | Process registers: not really needed. 83 | Swap: mlock(), mlockall() on Unix. Encrypted swap. 84 | File system: Bitlocker. Vanish, if the application is involved. 85 | Kernel memory: need to modify the kernel. Partly discussed in paper. 86 | 87 | Paper's model for thinking about data lifetime in memory. 88 | Interesting operations: allocation, write, read, free. 
89 | Conceptually applies to any memory. 90 | malloc(), stack allocation on function call, global variables, .. 91 | Ideal lifetime for data: from first write to last read (before write/free). 92 | Can't do any better: data must stay around. 93 | Natural lifetime: from first write to next write 94 | (potentially after free and re-alloc). 95 | Natural lifetime is what most systems do today. 96 | Data lives until overwritten by something else re-using that memory. 97 | 98 | Why is natural lifetime too long? 99 | Bursty memory allocation: memory freed, never allocated again. 100 | "Holes": not every byte of an allocation might be written to. 101 | Holes in the stack. 102 | Unused members in structs. 103 | Padding in structs. 104 | Variable-length data (e.g., packets or path names). 105 | 106 | How can we do better than natural lifetime? 107 | "Secure deallocation": erase data from memory when region is freed. 108 | Safe: programs should not rely on data living past free. 109 | How close to ideal is this? 110 | Depends on program, experiments show usually good (except for GUIs). 111 | Can we do better? 112 | Might be able to figure out last read through program analysis. 113 | Seems tricky to do in a general-purpose way. 114 | Programmers can manually annotate, or manually clear data. 115 | 116 | Secure deallocation in a process. 117 | Heap: zero out the memory in free(). 118 | What about memory leaks? Rely on OS to clean up on process exit. 119 | Private allocators? Modify, or rely on reuse or returning memory to OS. 120 | Stack: two plans. 121 | 1. Augment the compiler to zero out stack frames on function return. 122 | 2. Periodically zero out memory below stack pointer, from the OS. 123 | Advantages / disadvantages: 124 | 1 is precise, but maybe expensive (CPU time, memory bandwidth). 125 | 2 is cheaper, but may not clear right away, or delete everything. 126 | 1 requires re-compiling code; 2 works with unmodified binaries. 127 | Static data in process memory: rely on OS to clean up on exit. 128 | 129 | Secure deallocation in the kernel. 130 | Can we apply the same plan as in the applications? Why or why not? 131 | Vague argument about kernel being performance-sensitive. 132 | Not clear exactly why this is (applications are also perf-sensitive?). 133 | What kinds of data do we want to clear in the kernel? 134 | Data that applications are processing: IO buffers, anon process memory. 135 | Not internal kernel data (e.g., pointers). 136 | Not application data that lives on disk (files, directories). 137 | Page allocation: track pages that contain sensitive data ("polluted"). 138 | Three lists of free pages: 139 | - Zeroed pages. 140 | - Polluted non-zero pages. 141 | - Unpolluated non-zero pages. 142 | How is the polluted bit updated? 143 | Manually set in kernel code when page is used for process memory. 144 | Cleared when polluted free page is zeroed or overwritten. 145 | Smaller kernel objects: caller of kfree() must say if object is polluted. 146 | Objects presumably include network buffers, pipes, user IO, .. 147 | Memory allocator then erases data just like free() in user-space. 148 | Circular queues: semi-static allocation / specialized allocator. 149 | E.g., terminal buffers, PRNG inputs. 150 | Erase data when elements removed from queue. 151 | 152 | More efficient clearing of kernel memory. 153 | No numbers to explain why optimizations are needed, or which ones matter.. 154 | Page zeroing: return different pages depending on callers to alloc. 
155 | Insight: zeroed pages are "expensive", polluted pages are "cheap". 156 | 1. Can return polluted page if caller will overwrite entire page. 157 | E.g., new page to be used to read an entire page from disk. 158 | 2. Avoid returning zeroed pages if caller doesn't care about contents. 159 | If not enough memory, return zeroed page, or zero a polluted page. 160 | Cannot simply return polluted page: sensitive data may persist. 161 | Batch page zeroing: why? 162 | Allows the optimization of caller overwriting page to take place. 163 | May improve interactive performance, by deferring the cost of zeroing. 164 | Specialized zeroing strategies. 165 | Variable-length buffers: packets (implemented), path names (not). 166 | Clear out just the used part (e.g., 64 byte pkt in 1500-byte buffer). 167 | 168 | Side-effects of secure deallocation. 169 | Might make some bugs more predictable, or make bugs go away. 170 | Periodic stack clearing may make uninitialized stack bugs less predictable. 171 | 172 | Performance impact? 173 | Seems to be low, but a bit hard to tell what's going on in the kernel. 174 | 175 | What happens in a higher-level language (PHP, Javascript, ..)? 176 | May need to modify language runtime to erase stack. 177 | If runtime uses own allocator (typical), need to modify that as well. 178 | Otherwise, free() may be sufficient. 179 | 180 | How does garbage collection interact with secure deallocation? 181 | Reference-counting GC can free, erase objects fast in most cases. 182 | Periodic garbage collection may unnecessarily prolong data lifetime. 183 | 184 | -------------------------------------------------------------------------------- /previous-years/l19-backtracker.txt: -------------------------------------------------------------------------------- 1 | Backtracking intrusions 2 | ======================= 3 | 4 | Overall problem: intrusions are a fact of life. 5 | Will this ever change? 6 | Buggy code, weak passwords, wrong policies / permissions.. 7 | 8 | What should an administrator do when the system is compromised? 9 | Detect the intrusion ("detection point"). 10 | Result of this stage is a file, network conn, file name, or process. 11 | Find how the attacker got access ("entry point"). 12 | This is what Backtracker helps with. 13 | Fix the problem that allowed the compromise 14 | (e.g., weak password, buggy program). 15 | Identify and revert any damage caused by intrusion 16 | (e.g., modified files, trojaned binaries, their side-effects, etc). 17 | 18 | How would an administrator detect the intrusion? 19 | Modified, missing, or unexpected file; unexpected or missing process. 20 | Could be manual (found extra process or corrupted file). 21 | Tripwire could point out unexpected changes to system files. 22 | Network traffic analysis could point out unexpected / suspicious packets. 23 | False positives is often a problem with intrusion detection. 24 | 25 | What good is finding the attacker's entry point? 26 | Curious administrator. 27 | In some cases, might be able to fix the problem that allowed compromise. 28 | User with a weak / compromised password. 29 | Bad permissions or missing firewall rules. 30 | Maybe remove or disable buggy program or service. 31 | Backtracker itself will not produce fix for buggy code. 32 | Can we tell what vulnerability the attacker exploited? 33 | Not necessarily: all we know is object name (process, socket, etc). 34 | Might not have binary for process, or data for packets. 
35 | Probably a good first step if we want to figure out the extent of damage. 36 | Initial intrusion detection might only find a subset of changes. 37 | Might be able to track forward in the graph to find affected files. 38 | 39 | Do we need Backtracker to find out how the attacker gained access? 40 | Can look at disk state: files, system logs, network traffic logs, .. 41 | Files might not contain enough history to figure out what happened. 42 | System logs (e.g., Apache's log) might only contain network actions. 43 | System logs can be deleted, unless otherwise protected. 44 | Of course, this is also a problem for Backtracker. 45 | Network traffic logs may contain encrypted packets (SSL, SSH). 46 | If we have forward-secrecy, cannot decrypt packets after the fact. 47 | 48 | Backtracker objects 49 | Processes, files (including pipes and sockets), file names. 50 | How does Backtracker name objects? 51 | File name: pathname string. 52 | Canonical: no ".." or "." components. 53 | Unclear what happens to symlinks. 54 | File: device, inode, version#. 55 | Why track files and file names separately? 56 | Where does the version# come from? 57 | Why track pipes as an object, and not as dependency event? 58 | Process: pid, version#. 59 | Where does the version# come from? 60 | How long does Backtracker have to track the version# for? 61 | 62 | Backtracker events 63 | Process -> process: fork, exec, signals, debug. 64 | Process -> file: write, chmod, chown, utime, mmap'ed files, .. 65 | Process -> filename: create, unlink, rename, .. 66 | File -> process: read, exec, stat, open. 67 | Filename -> process: open, readdir, anything that takes a pathname. 68 | File -> filename, filename -> file: none. 69 | How does Backtracker name events? 70 | Not named explicitly. 71 | Event is a tuple (source-obj, sink-obj, time-start, time-end). 72 | What happens to memory-mapped files? 73 | Cannot intercept every memory read or write operation. 74 | Event for mmap starts at mmap time, ends at exit or exec. 75 | Implemented: process fork/exec, file read/write/mmap, network recv. 76 | In particular, none of the filename stuff. 77 | 78 | How does Backtracker avoid changing the system to record its log? 79 | Runs in a virtual machine monitor, intercept system calls. 80 | Extracts state from guest virtual machine: 81 | Event (look at system call registers). 82 | Currently running process (look at kernel memory for current PID). 83 | Object being accessed (look at syscall args, FD state, inode state). 84 | Logger has access to guest kernel's symbols for this purpose. 85 | How to track version# for inodes or pids? 86 | Might be able to use NFS generation numbers for inodes. 87 | Need to keep a shadow data structure for PIDs. 88 | Bump generation number when a PID is reused (exit, fork, clone). 89 | 90 | What do we have to trust? 91 | Virtual machine monitor trusted to keep the log safe. 92 | Kernel trusted to keep different objects isolated except for syscalls. 93 | What happens if kernel is compromised? 94 | Adversary gets to run arbitrary code in kernel. 95 | Might not know about some dependencies between objects. 96 | Can we detect kernel compromises? 97 | If accessed via certain routes (/dev/kmem, kernel module), then yes. 98 | More generally, kernel could have buffer overflow: hard to detect. 99 | 100 | Given the log, how does Backtracker find the entry point? 101 | Present the resulting dependency graph to the administrator. 102 | Ask administrator to find the entry point. 
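(A rough sketch of how the backward dependency graph could be assembled from the logged events described above, i.e. (source object, sink object, start time, end time) tuples. The log format, the worklist traversal, and the "started before the sink's cutoff time" trimming rule are simplifications for illustration, not BackTracker's exact algorithm.)

    from collections import defaultdict

    def backtrack(events, detection_obj, detection_time):
        """Collect objects and events that could have affected detection_obj."""
        by_sink = defaultdict(list)               # index events by the object they affect
        for ev in events:
            by_sink[ev[1]].append(ev)             # ev = (src, sink, t_start, t_end)

        cutoff = {detection_obj: detection_time}  # latest time each object still matters
        graph = set()
        worklist = [detection_obj]
        while worklist:
            obj = worklist.pop()
            for (src, sink, t_start, t_end) in by_sink[obj]:
                if t_start >= cutoff[obj]:
                    continue                      # event began too late to matter: trim it
                graph.add((src, sink, t_start, t_end))
                # src's state before this event ended (and before obj stopped
                # mattering) could have flowed into obj, so extend src's cutoff.
                new_cut = min(t_end, cutoff[obj])
                if new_cut > cutoff.get(src, float("-inf")):
                    cutoff[src] = new_cut
                    worklist.append(src)
        return set(cutoff), graph

Feeding this the implemented event types (fork/exec, file read/write/mmap, socket receives) and a detection point such as a suspicious process would yield the graph the administrator then inspects for the entry point.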
103 | 104 | Optimizations to make the graph manageable. 105 | Distinction: affecting vs. controlling an object. 106 | Many ways to affect execution (timing channels, etc). 107 | Adversary interested in controlling (causing specific code to execute). 108 | High-control vs. low-control events. 109 | Prototype does not track file names, file metadata, etc. 110 | Trim any events, objects that do not lead to detection point. 111 | Use event times to trim events that happened too late for detection point. 112 | Hide read-only files. 113 | Seems like an instance of a more general principle. 114 | Let's assume adversary came from the network. 115 | Then, can filter out any objects with no (transitive) socket deps. 116 | Hide nodes that do not provide any additional sources. 117 | Ultimate goal of graph: help administrator track down entry point. 118 | Some nodes add no new sources to the graph. 119 | More general than read-only files (above): 120 | Can have socket sources, as long as they're not new socket sources. 121 | E.g., shell spawning a helper process. 122 | Could probably extend to temporary files created by shell. 123 | Use several detection point. 124 | Sounds promising, but not really evaluated. 125 | Potentially unsound heuristics: 126 | Filter out low-control events. 127 | Filter out well-known objects that cause false positives. 128 | E.g., /var/log/utmp, /etc/mtab, .. 129 | 130 | How can an adversary elude Backtracker? 131 | Avoid detection. 132 | Use low-control events. 133 | Use events not monitored by Backtracker (e.g., ptrace). 134 | Log in over the network a second time. 135 | If using a newly-created account or back door, will probably be found. 136 | If using a password stolen via first compromise, might not be found. 137 | Compromise OS kernel. 138 | Compromise the event logger (in VM monitor). 139 | Intertwine attack actions with other normal events. 140 | Exploit heuristics: write attack code to /var/log/utmp and exec it. 141 | Read many files that were recently modified by others. 142 | Other recent modifications become candidate entry points for admin. 143 | Prolong intrusion. 144 | Backtracker stores fixed amount of log data (paper suggests months). 145 | Even before that, there may be changes that cause many dependencies. 146 | Legitimate software upgrades. 147 | Legitimate users being added to /etc/passwd. 148 | Much more difficult to track down intrusions across such changes. 149 | 150 | Can we fix file name handling? 151 | What to do with symbolic links? 152 | Is it sufficient to track file names? 153 | Renaming top-level directory loses deps for individual file names. 154 | More accurate model: file names in each directory; dir named by inode. 155 | Presumably not addressed in the paper because they don't implement it. 156 | 157 | How useful is Backtracker? 158 | Easy to use? 159 | Administrator needs to know a fair amount about system, Backtracker. 160 | After filtering, graphs look reasonably small. 161 | Reliable / secure? 162 | Probably works fine for current attacks. 163 | Determined attacker can likely bypass. 164 | Practical? 165 | Overheads probably low enough. 166 | Depends on VM monitor knowing specific OS version, symbols, .. 167 | Not clear what to do with kernel compromises 168 | Probably still OK for current attacks / malware. 169 | Would a Backtracker-like system help with Stuxnet? 170 | Need to track back across a ~year of logs. 171 | Need to track back across many machines, USB devices, .. 
172 | Within a single server, may be able to find source (USB drive or net). 173 | Stuxnet did compromise the kernel, so hard to rely on log. 174 | 175 | Do we really need a VM? 176 | Authors used VM to do deterministic replay of attacks. 177 | Didn't know exactly what to log yet, so tried different logging techniques. 178 | In the end, mostly need an append-only log. 179 | Once kernel compromised, no reliable events anyway. 180 | Can send log entries over the network. 181 | Can provide an append-only log storage service in VM (simpler). 182 | 183 | -------------------------------------------------------------------------------- /previous-years/l20-traceback.txt: -------------------------------------------------------------------------------- 1 | Denial of service attacks 2 | ========================= 3 | 4 | What kinds of DoS attacks can an adversary mount? 5 | Exhaust resources of some service. 6 | Network bandwidth. 7 | CPU time (e.g., image processing, text searching, etc). 8 | Disk bandwidth (e.g., complex SQL queries touching a lot of data). 9 | Disk space, memory. 10 | Deny service by exploiting some vulnerability in protocol, application. 11 | In TCP, if adversary can guess TCP sequence numbers, can send RST. 12 | Terminates TCP connection. 13 | In 802.11, deauthenticate packets were (still?) not authenticated. 14 | Adversary can forge deauthenticate packets, disconnect client. 15 | In BGP, routers perform little authentication on route announcements. 16 | A year or so ago, Pakistan announced BGP route for Youtube. 17 | In April, China announced BGP routes for many addresses. 18 | Poorly-designed or poorly-implemented protocols or apps can be fixed. 19 | Resource exhaustion attacks are often harder to fix. 20 | 21 | Why do attackers mount DoS attacks? 22 | "Spite", but increasingly less so. 23 | Extortion. Force victim to incur cost of defense or downtime. 24 | Extortion (used to be?) relatively common for online gambling sites. 25 | High-value, time-sensitive, downtime is very costly. 26 | 27 | Network bandwidth DoS attacks. 28 | Adversary unlikely to directly have overwhelming network bandwidth. 29 | Thus, key goal for an adversary is amplification. 30 | One way to amplify bandwidth: reflection. 31 | Early trick: "smurf", send source-spoofed ICMP ping to broadcast addr. 32 | More likely today: source-spoofed UDP DNS queries. 33 | Why don't adversaries use TCP services for reflection? 34 | Higher-level amplification: compromise machines via malware, form botnet. 35 | Most prevalent today, can send well-formed TCP connections. 36 | Why are TCP connections more interesting for adversaries? 37 | Reflected ICMP, UDP packets much easier to filter out. 38 | 39 | CPU time attacks. 40 | Complex applications perform large amounts of computation for requests. 41 | SSL handshake, PDF generation, Google search, airline ticket searches. 42 | High-end DoS attackers do this routinely to incur maximum cost per request. 43 | 44 | Disk bandwidth attacks. 45 | Disk is often the slowest part of the system (100 seeks per second?) 46 | Systems optimized to avoid disk whenever possible: use caches. 47 | Caches work due to statistical distributions. 48 | Adversary can construct an unlikely distribution, ask for unpopular data. 49 | Caches no longer effective, many queries hit disk, system grinds to a halt. 50 | Hard to control, predict, or even detect. 51 | 52 | Space exhaustion attacks (disk space, memory). 53 | Once a user is authenticated, relatively easy to enforce quotas. 
54 | Many protocols require servers to store state on behalf of unknown clients. 55 | 56 | How to defend against DoS attacks in general? 57 | Accountability: track down the attacker. 58 | Becoming harder to do, at a conceptual level, with botnets, Tor, .. 59 | Require authentication to access services. 60 | Lowest level (IP) does not provide authentication by default. 61 | Require clients to prove they've spent some resources. 62 | Might be plausible if adversary's goal is to exhaust server resources. 63 | Captchas. 64 | Cryptographic puzzles. 65 | Given challenge (C,n) find R so that low n bits of SHA1(C||R) are 0. 66 | Easy to synthesize challenge and verify answer. 67 | Easy to scale up the challenge, if under attack. 68 | Deliver/verify challenge over some protocol not susceptible to DoS. 69 | One slight problem: CPU speeds vary a lot. 70 | More memory-intensive puzzles also exist, might be more fair. 71 | Micropayments. 72 | Some "e-stamp" proposals tried, but micropayments are hard. 73 | Bandwidth (Speak-up by Mike Walfish). 74 | Big problem: adversary can get more resources through botnets. 75 | 76 | Specific problem: IP address spoofing. 77 | What's the precise problem? 78 | Adversary can put any IP address as source when sending packet. 79 | Not all networks perform sanity-checks on source IP addresses. 80 | Hard for victim to track down who is responsible for the traffic. 81 | What resources can adversary exhaust in this manner? 82 | Can send arbitrary packets, exhausting bandwidth. 83 | Can issue any queries to UDP services (e.g., DNS), exhausting CPU time. 84 | Cannot establish fully-open TCP connections (must guess sequence#). 85 | Can create half-open TCP conns, exhausting server memory (SYN flood). 86 | SYN flood problem: three-way TCP handshake (SYN, SYN-ACK, ACK). 87 | Server must keep state about the received SYN and sent SYN-ACK. 88 | Needed to figure out what connection the third ACK packet is for. 89 | One solution: use cryptography to off-load state onto the client. 90 | SYN cookies: encode server-side state into sequence number. 91 | seq = MAC(client & server IPs, ports, timestamp) || timestamp 92 | Server computes seq as above when sending SYN-ACK response. 93 | Server can verify state is intact by verifying hash (MAC) on ACK's seq. 94 | Not quite ideal: need to think about replay attacks within timestamp. 95 | Another problem: if third packet lost, noone retransmits. 96 | Maybe not a big deal in case of a DoS attack. 97 | Only a problem for protocols where server speaks first. 98 | 99 | What's the best we can hope for in an IP traceback scheme? 100 | No way to authenticate messages from any given router. 101 | Goal: suffix of the real attack path. 102 | Adversary is free to make up his or her own routers. 103 | Infact, this is realistic, since adversary may be an actual ISP. 104 | Rely on fact that adversary's packets must repeatedly traverse suffix. 105 | 106 | Typical constraints for deploying IP traceback, in order of increasing hardness: 107 | Routers are hard to change. 108 | Routers cannot do a lot of processing per packet. 109 | End-hosts are hard to change. 110 | Packets formats are nearly impossible to change. 111 | 112 | Manual tracing through the network. 113 | 1. Find a pattern for the attack packets (e.g., destination address). 114 | 2. Call up your ISP, ask them to tcpdump and say where packets come from. 115 | 3. Repeat calling up the next ISP and asking them to do the same. 
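(Aside, going back to the cryptographic-puzzle defense a few lines up: a minimal sketch of the "low n bits of SHA1(C||R) must be zero" puzzle. The 16-byte challenge, the 8-byte encoding of R, and n=16 are arbitrary illustration choices.)

    import hashlib, os

    def low_bits_zero(digest, n):
        return int.from_bytes(digest, "big") & ((1 << n) - 1) == 0

    def solve(challenge, n):
        r = 0
        while True:                               # client grinds ~2^n hashes on average
            if low_bits_zero(hashlib.sha1(challenge + r.to_bytes(8, "big")).digest(), n):
                return r
            r += 1

    def verify(challenge, n, r):
        return low_bits_zero(hashlib.sha1(challenge + r.to_bytes(8, "big")).digest(), n)

    C = os.urandom(16)                            # server picks a fresh challenge
    R = solve(C, 16)                              # ~65,000 hashes of work for the client
    assert verify(C, 16, R)                       # one hash for the server to check

The server can scale n up when under attack, since verification stays a single hash regardless of the puzzle difficulty.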
116 | Slow, tedious, non-scalable, hard to get cooperation from far-away ISPs. 117 | 118 | Controlled flooding. 119 | Clever idea: flood individual links you suspect might be used by attack. 120 | See how the flood affects the incoming DoS packets. 121 | Potentially works for a single source of attack, but causes DoS by itself. 122 | 123 | Ideal packet marking: record every link traversed by a packet. 124 | Problem: requires a lot of space in each packet. 125 | 126 | Trade-off: record individual links with some probability ("edge sampling"). 127 | Each packet gets marked with two link endpoints and a distance counter. 128 | How do we reconstruct the path from the individual links? 129 | How do we decide when to mark a packet? Small probability. 130 | What if the packet is already marked? Why overwrite? 131 | Why do we need a distance counter? 132 | Why do we need the two endpoints to each mark the packet with their own IP? 133 | Could have one router write down its own IP and the next hop's IP. 134 | However, routers have many interfaces, with a separate IP for each. 135 | Makes it difficult for end-node machine to piece together route. 136 | Don't know when two IPs belong to the same router. 137 | 138 | Making edge sampling work in IP packets. 139 | Challenge: encoding edge information into IP packet. 140 | Ideally, want to store 2 IPs (2 x 32 bits) and distance (8 bits). 141 | Authors only found space for 16 bits in the rarely-used fragment ID. 142 | Trick 1: Edge IDs. 143 | XOR the IPs of neighboring nodes into a single 32-bit edge ID. 144 | How much does this save us? 145 | How can we reconstruct the path? 146 | Start with first hop, keep XORing with increasingly larger distances. 147 | Trick 2: Integrity checking scheme to know when we've XORed the right IDs. 148 | Potential problem: attack may come from many sources. 149 | As a result, XORing with edge-ID of some distance may not be right. 150 | Approach: make IPs easy to verify, by bit-interleaving hash of IP. 151 | Can validate candidate IP addresses by checking their hash. 152 | Doesn't save us space (yet), only increases edge IDs to 64 bits. 153 | Trick 3: Break up edge IDs into fragments (e.g., 8 bit chunks of 64 bits). 154 | Encoding in the IP header: 155 | 3-bit offset (which 8-bit chunk out of 64-bit edge ID). 156 | 5-bit distance (up to traceback-enabled 32 hops away). 157 | 8-bit data (i.e., a particular fragment of the 64-bit edge ID). 158 | How to reconstruct? 159 | Know the right offset for each chunk, and the right distance. 160 | Try all combinations of offsets for given distance to match hash. 161 | Once we know IP address for one hop, move on to the next distance. 162 | Trick 4: What happens if the fragment-ID field is in use? 163 | Drop fragmented packet with some prob., replace with entire edge info. 164 | Probability needed for fragmented packets is less: no matching needed. 165 | 166 | How practical is the proposed IP traceback scheme? 167 | What happens if not all routers implement this scheme? 168 | How do we know when the traceback information stops being a legal suffix? 169 | How expensive is it to reconstruct edges from fragments? 170 | 171 | -------------------------------------------------------------------------------- /previous-years/l17-vanish.txt: -------------------------------------------------------------------------------- 1 | Vanish 2 | ====== 3 | 4 | Problem: sensitive data can be difficult to get rid of. 5 | Emails, shared documents, even files on a desktop computer. 
6 | Adversary may get old data after they break in, gain access. 7 | Difficult to prevent certain kinds of "break ins": legal subpoenas, etc. 8 | Would like to have data become inaccessible after some period of time. 9 | 10 | How serious of a problem is this? 11 | Seems like there are some interesting use cases that the paper discusses. 12 | Especially useful for ensuring email messages cannot be recovered later on. 13 | 14 | Strawman 1: why not attach metadata with expiration date (e.g., email header)? 15 | Copies of data may be stored on servers: backups, logs (e.g., email). 16 | Even with no copies, data may be stored on broken machine: hard to erase. 17 | Adversary may be able to obtain sensitive data from those copies. 18 | Goal: do not require any explicit data deletion. 19 | 20 | Strawman 2: why not encrypt email messages with recipient's public key? 21 | Adversary may steal the user's private key. 22 | Adversary may use a court order or subpoena to obtain private key. 23 | Goal: ensure data is inaccessible even if recipient's key compromised. 24 | 25 | Strawman 3: why not use an online service specifically for this purpose? 26 | Simple service, in principle: 27 | Encrypt messages with a specified expiration time. 28 | Decrypt only ciphertexts whose expiration time is in the future. 29 | Service is trusted (if compromised, can recover old "expired" data). 30 | Security services were targeted by law enforcement in the past. 31 | E.g., Hushmail incident. 32 | Hard to deploy service specifically for an unknown new application. 33 | Difficult to justify resources for services that's not used yet. 34 | Goal: no new services. 35 | 36 | Strawman 4: why not use specialized hardware? 37 | Need a reliable source of time; TPM hardware does not provide one. 38 | In principle, smartcard could serve as distributed encrypt/decrypt service. 39 | If we can't use a standard TPM chip, difficult to deploy new hardware. 40 | Goal: no new hardware. 41 | 42 | Vanish design, step 1: reduce problem to limiting lifetime of random keys. 43 | To create a vanishing data object (VDO), create fresh data encryption key K. 44 | Encrypt the real data with this key: C = E_K(D). 45 | Strawman VDO is now (C, K). 46 | Next, we will make sure key K vanishes at the right time.. 47 | Why is this step useful? 48 | 1. Need to worry about vanishing of a small, fixed-size object (key K). 49 | 2. The key K itself doesn't leak any information about data. 50 | 51 | Vanish design, step 2: store the secret key in a DHT. 52 | Quick aside on how DHTs work.. 53 | Logical view: 54 | Many machines (e.g., ~1M for Vuze DHT) talk to each other. 55 | Store key-value pairs, where keys are 160-bit things called "indexes". 56 | Storage is distributed across the nodes in the DHT. 57 | (Thus, the name: distributed hash table.) 58 | API: 59 | lookup(index) -> set of nodes 60 | store(node, index, value) -> node stores the (index, value) entry 61 | get(node, index) -> value, if stored at that node 62 | The tricky function is lookup (others are just talking to one node). 63 | Vuze DHT works by constructing a single 160-bit address/name space. 64 | 160 bits works well, because it's large and fits a SHA-1 hash. 65 | Nodes get 160-bit identifiers (SHA-1 hash of, e.g., node's public key). 66 | Nodes are responsible for indexes near their own 160-bit ID. 67 | That is, lookup(index) returns nodes with IDs near index. 68 | Nodes talk to other nodes with nearby ID values, to replicate data. 
69 | (Also need to talk to a few nodes far away, for lookup to work). 70 | Intermediate step (not quite Vanish): 71 | Choose random "access key" L. 72 | Store data key K at index L in the DHT. 73 | Strawman VDO is now (C, L). 74 | How to recover the VDO before it expires? 75 | Straightforward: fetch key K from index L in the DHT. 76 | What causes data to vanish? 77 | In the Vuze DHT, values expire after 8 hours (fixed timeout). 78 | More generally, DHTs experience churn (nodes join and leave the DHT). 79 | Once a node leaves the DHT, it will re-join with a different ID. 80 | Difficult to track down nodes that used to store some index in the past. 81 | Why does Vanish choose an "access key" L instead of using, say, H(C)? 82 | Ensures that Vanish does not reduce security. 83 | The only things revealed to the DHT are random values (e.g., L and K). 84 | Not dependent on actual sensitive data (plaintext D or ciphertext C). 85 | 86 | Vanish design, step 3: split up the key into multiple pieces, store the pieces. 87 | Why does Vanish do this? 88 | 1. Individual nodes may go away prematurely, want reliability until timeout. 89 | 2. Individual nodes can be malicious, can be subpoenaed, can be buggy.. 90 | Problem shown in Figure 4 (with N=1). 91 | Less than 100% availability before 8 hours. Why? 92 | More than 0% availability after 8 hours. Why? 93 | 94 | Secret sharing (by Adi Shamir). 95 | Given secret K, want to split it up into shares K_1, .., K_N. 96 | Given some threshold M of shares (<= N), should be able to reconstruct K. 97 | Construction: random polynomial of degree M-1, whose constant coeff is K. 98 | Assume we can operate mod some large prime p (e.g., a prime above 2^128, to cover AES keys). 99 | Polynomial is f(x) = z_{M-1} x^{M-1} + .. + z_1 x^1 + K (mod p). 100 | To generate N secret shares, compute f(1), f(2), .., f(N). 101 | To reconstruct secret given M shares, solve for the polynomial and compute f(0). 102 | With fewer than M shares, any candidate value of f(0) is consistent with the shares. 103 | This means the adversary learns nothing about f(0)=K from fewer than M shares.
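A minimal sketch of the Shamir construction just described, working over a large prime field (the Mersenne prime 2^521 - 1, so a 128-bit AES key always fits). The prime, the function names, and the toy usage at the bottom are illustrative assumptions; the real Vanish system pushes the resulting shares out to DHT indices rather than keeping them locally.

```python
# Shamir secret sharing sketch: degree-(M-1) random polynomial with constant
# term K; shares are the points f(1)..f(N); any M points recover f(0) = K.
import secrets

P = 2**521 - 1          # a Mersenne prime, comfortably larger than a 128-bit key

def split(K, N, M):
    coeffs = [K % P] + [secrets.randbelow(P) for _ in range(M - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, N + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 (requires Python 3.8+ for pow(d, -1, P))."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

K = secrets.randbits(128)                # the Vanish data key
shares = split(K, N=10, M=7)
assert reconstruct(shares[:7]) == K      # any 7 of the 10 shares suffice
```

Tying the steps together: pick a fresh K, compute C = E_K(D), split K as above, store the shares at DHT indices derived from the access key L (as in step 2), and keep only (C, L, threshold). Once churn and the 8-hour timeout destroy more than N - M shares, K, and hence D, becomes unrecoverable.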
-------------------------------------------------------------------------------- /previous-years/l11-spins.html: --------------------------------------------------------------------------------
Wireless Sensor Networks (notes by Marten van Dijk)

6 | 7 | Read: A. Perrig, R. Szewczyk, J.D. Tygar, V. Wen, and D.E. Culler, "SPINS: Security Protocols for Sensor Networks", Wireless Networks 8, 521-534, 2002. 8 | 9 |

10 | Model (assumptions, security requirements, possible threats): 11 | 12 |

13 | What is a sensor network? Thousands to millions of small sensors form self-organizing wireless networks. Sensors have limited processing power, storage, bandwidth, and energy (this gives low production costs). For example, use TinyOS, a small, event-driven OS, see Table 1. Serious security and privacy questions arise if third parties can read or tamper with sensor data. 14 | 15 |

16 | Examples: emergency response information, energy management, medical monitoring, logistics and inventory management, battlefield management. 17 | 18 |

19 | What are the differences between wireless sensor networks (WSN) and mobile ad hoc networks (MANET)? The number of sensor nodes in a WSN can be several orders of magnitude larger than the nodes in a MANET. Sensor nodes are densely deployed. Sensor nodes are prone to failures. The topology of a WSN changes very frequently. Sensor nodes mainly use a broadcast communication paradigm, whereas most MANETs are based on point-to-point communication. Sensor nodes are limited in processing power, storage, bandwidth, and energy. 20 | 21 |

22 | What are the components of a sensor node? Sensing unit with a sensor and analog-to-digital converter (ADC). Processor with storage. Transceiver. Power unit. 23 | 24 |

25 | What are the capabilities of a base station? More battery power, sufficient memory, means for communicating with outside networks. 26 | 27 |

28 | What are the trust assumptions? Individual sensors are untrusted. There is a known upper bound on the fraction of all sensors that are compromised. Communication infrastructure is untrusted (except that messages are delivered to the destination with non-negligible probability). Sensor nodes trust their base station. Each node trusts itself. 29 | 30 |

31 | What is the protocol stack? Physical layer: simple but robust modulation, transmission, and receiving techniques; responsible for frequency selection, carrier frequency generation, signal detection, modulation. Data link layer: medium access control (MAC) protocol must be power-aware and able to minimize collision with neighbors' broadcasts, MAC protocol in a wireless multi-hop self-organizing network creates the network infrastructure (topology changes due to node mobility and failure, periodic transmission of beacons allows nodes to create a routing topology) and efficiently shares communication resources between sensor nodes (both fixed allocation and random access versions have been proposed), data link layer also implements error control and data encryption + security. Network layer: routing the data supplied by the transport layer, provide internetworking with external networks, design principles are power efficiency, data aggregation useful only when it does not hinder the collaborative effort of the sensor nodes, attribute-based addressing and location awareness. Transport layer: helps to maintain the flow of data if the application requires it, especially needed when the system is planned to be accessed through the Internet or other external networks. Application layer: largely unexplored. 32 | 33 |

34 | What are performance metrics? Fault tolerance or reliability: is the ability to sustain sensor network functionalities without interruption due to sensor node failures (non-adversarial such as lack of power, physical damage, environmental interference), it is modeled as a Poisson distribution e^{-lambda*t} to capture the probability of not having a failure within the time interval (0,t). Scalability: ability to support larger networks, flexible against increase in the size of the network even after deployment, ability to utilize more dense networks (density gives the number of nodes within the transmission radius of each node; it equals N*pi*R^2/A, where N is the number of scattered sensor nodes in region A, and R is the radio transmission range). Efficiency: storage complexity (amount of memory required to store certificates, credentials, keys), processing complexity (amount of processor cycles required by security primitives and protocols), communication complexity (overhead in number and size of messages exchanged in order to provide security). Network connectivity: probability that two neighboring sensors are able to share a key (enough key connectivity is required in order to provide intended functionality). Network resilience: resistance against node capture; for each c and s, what is the probability that c compromised sensors can break s links (by reconstructing the corresponding shared secret keys)? 35 | 36 |
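As a concrete reading of the two formulas above, here is a toy computation; the failure rate, deployment area, and radio range are made-up numbers for illustration.

```python
import math

def reliability(lam, t):
    """Poisson model: probability of no failure in (0, t) is e^(-lambda*t)."""
    return math.exp(-lam * t)

def density(N, R, A):
    """Expected nodes within radio range R: mu = N * pi * R^2 / A."""
    return N * math.pi * R**2 / A

print(reliability(lam=0.001, t=100))            # ~0.90
print(density(N=1000, R=30.0, A=1_000_000.0))   # ~2.8 neighbors per node
```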

37 | What are the security requirements? Availability: ensure that service offered by the whole WSN, by any part of it, or by a single node must be available whenever required. Degradation of security services: ability to change security level as resource availability changes. Survivability: ability to provide a minimum level of service in the presence of power loss, failures, or attacks (need to thwart denial of service attacks). 38 | 39 |

40 | Authentication: authenticate other nodes, cluster heads, and base stations before granting a limited resource, or revealing information. Integrity: ensure that the message or entity under consideration is not altered (data integrity is achieved by data authentication). Freshness: ensure that each message is fresh, most recent (detect replay attacks). 41 | 42 |

43 | Confidentiality: providing privacy of the wireless communication channels (prevent information leakage by eavesdropping or covert channels), need semantic security, which ensures that an eavesdropper has no information about the plaintext, even if it sees multiple encryptions of the same plaintext (e.g., concatenate plaintext with a random bit string, this however requires sending more data and costs more energy). Non-repudiation: preventing malicious nodes from hiding their activities (e.g., they cannot refute the validity of a statement they signed). 44 | 45 |

46 | Solutions (SNEP, micro TESLA, Key Distribution): 47 | 48 |

49 | What are the limitations in designing security? Security needs to limit the consumption of processing power. Limited power supply limits the lifetime of keys. Working memory cannot hold the variables for asymmetric cryptographic algorithms such as RSA. High overhead to create and verify signatures. Need to limit communication. 50 | 51 |

52 | SNEP: A and B share a master key, which they use to derive encryption keys K_AB and K_BA and MAC keys K'_AB and K'_BA. A and B synchronize counter values C_A=C_B. Communication from A to B: {Data}_[K_AB,C_A] = Data XOR E_{K_AB}(C_A), together with MAC_{K'_AB}({Data}_[K_AB,C_A]||C_A); see Formula (1). The MAC computation is pictured in Figure 3 and uses CBC mode. This gives semantic security, data authentication, weak freshness (a message that verifies correctly must have been sent after the previous message the receiver accepted), and low communication overhead (the counter value itself is not sent). 53 | 54 |
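A rough sketch of Formula (1) and, via the optional nonce, of the strong-freshness variant in Formula (2). It is not the paper's construction byte-for-byte: SNEP builds the counter-mode keystream and the CBC-MAC out of RC5, whereas this sketch substitutes HMAC-SHA256 for both, and the key and field names are assumptions.

```python
# SNEP-style message: ciphertext = Data XOR E_K(counter); MAC over
# ciphertext || counter (|| nonce); the counter itself is never transmitted.
import hmac, hashlib

def _keystream(key, counter, length):
    """Stand-in for the RC5 counter-mode keystream E_K(counter)."""
    stream, block = b"", 0
    while len(stream) < length:
        stream += hmac.new(key, counter.to_bytes(8, "big") + block.to_bytes(4, "big"),
                           hashlib.sha256).digest()
        block += 1
    return stream[:length]

def snep_send(data, k_enc, k_mac, counter, nonce=b""):
    ct = bytes(d ^ s for d, s in zip(data, _keystream(k_enc, counter, len(data))))
    tag = hmac.new(k_mac, ct + counter.to_bytes(8, "big") + nonce, hashlib.sha256).digest()
    return ct, tag

def snep_recv(ct, tag, k_enc, k_mac, counter, nonce=b""):
    expect = hmac.new(k_mac, ct + counter.to_bytes(8, "big") + nonce, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("MAC failed: forged, replayed, or counter out of sync")
    return bytes(c ^ s for c, s in zip(ct, _keystream(k_enc, counter, len(ct))))
```

Both sides increment the shared counter after each message that verifies, which is what gives weak freshness without ever sending the counter; including the requester's nonce in the MAC gives strong freshness as in Formula (2).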

55 | Strong freshness: see Formula (2). If B requests a message from A, B transmits a nonce to A, and A includes this nonce in the MAC of its response to B. If the MAC verifies correctly, B knows that A generated the response after B sent the request. 56 | 57 |

58 | Synchronize counter values: see Section 5.2 for a simple bootstrapping protocol, at any time the above protocol with strong freshness can be used to request the current counter value. To prevent denial of service attacks, allow transmitting the counter with each encrypted message in the above protocols, or attach another short MAC to the message that does not depend on the counter. 59 | 60 |

61 | micro TESLA: authenticated broadcast requires an asymmetric mechanism; otherwise any compromised receiver could forge messages from the sender. How can this be done without asymmetric crypto? Introduce asymmetry through delayed disclosure of symmetric keys. Idea: the base station uses MAC_K with a key K unknown to the sensor nodes; K belongs to a key chain (K_i = F(K_{i+1}), where F is a one-way function) to which the base station has committed (in a key chain, keys are self-authenticating); the chain is revealed by the base station through delayed disclosure. The key disclosure delay is on the order of a few time intervals and greater than any reasonable round-trip time. Each receiver node knows the key disclosure schedule and holds one authentic key of the one-way key chain as a commitment to the entire chain. The sender base station and the receiver nodes are loosely time synchronized. A simple bootstrapping protocol using shared secret MAC keys is given in Section 5.5. 62 | 63 |
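A small sketch of the one-way key chain that gives micro TESLA its delayed asymmetry. F is instantiated with SHA-256 here (an assumption; the paper uses its own PRF), and the time-interval bookkeeping and packet buffering are omitted.

```python
# One-way key chain: K_n is random, K_i = F(K_{i+1}); K_0 is the public
# commitment handed to receivers during bootstrapping.
import hashlib

def F(key):
    return hashlib.sha256(key).digest()

def make_chain(seed, n):
    """Return [K_0, ..., K_n] with K_n = seed and K_i = F(K_{i+1})."""
    chain = [seed]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()
    return chain

def authentic(disclosed_key, commitment, max_intervals):
    """Receiver check: apply F repeatedly; a genuine K_i eventually hits K_0."""
    k = disclosed_key
    for _ in range(max_intervals):
        if k == commitment:
            return True
        k = F(k)
    return k == commitment
```

A receiver buffers packets MACed under K_i and verifies them only after K_i is disclosed and authenticated against its commitment, so a forger would need a still-undisclosed key.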

64 | Nodes cannot store the keys of a key chain themselves: a node may broadcast data through the base station, or use the base station to outsource key-chain management. 65 | 66 |

67 | Key setup: a master key is shared by the base station and each node. How do we do key distribution? There has been a lot of research providing solutions with good resilience, connectivity, and scalability. Controversial solution: key infection; bootstrapping does not need to be secure, since the scheme is about security maintenance in a stationary network. Idea: transmit symmetric keys in the clear and then use secrecy amplification (and other mechanisms). In secrecy amplification, two nodes A and B use a third neighboring node C to set up a channel between A and B, protected by keys K_{A,C} and K_{C,B}, and use it to exchange a nonce N. A and B replace their key K_{A,B} by H(K_{A,B}||N) and verify that they can use this new key. If K_{A,B} is known to an adversary, but keys K_{A,C} and K_{C,B} are not, then the adversary cannot compute the new K_{A,B}! This solution has been proposed for the battlefield management application. 68 | 69 |
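A sketch of just the key-update step of secrecy amplification; the relay of the nonce through C over the K_{A,C}/K_{C,B}-protected channel is abstracted away, and the hash choice and names are illustrative.

```python
# Secrecy amplification key update: new K_AB = H(old K_AB || N), where the
# nonce N travels A -> C -> B under keys the adversary (hopefully) lacks.
import hashlib, secrets

def amplify(k_ab):
    nonce = secrets.token_bytes(16)                    # exchanged via neighbor C
    new_key = hashlib.sha256(k_ab + nonce).digest()    # H(K_AB || N)
    return new_key, nonce

# An adversary who sniffed the old K_AB during bootstrapping, but never
# compromised K_AC or K_CB, cannot read N and so cannot compute the new key.
```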

70 | Related topics: RFID tags, social networks, TinyDB. 71 | 72 | 73 | 74 |

-------------------------------------------------------------------------------- /previous-years/l08-browser-security.txt: -------------------------------------------------------------------------------- 1 | Browser Security (guest lecture by Ramesh Chandra) 2 | ================================================== 3 | 4 | web app security 5 | server and client -- we'll mostly focus on client 6 | 7 | web apps: past vs. present 8 | past: mainly static content, simpler security model 9 | user interactions resulted in round-trips to server 10 | present: highly dynamic content with client-side code 11 | advantages: responsiveness, better functionality 12 | more complex security model 13 | 14 | threat model / assumptions 15 | attacker controls his/her own web site, attacker.com (sounds reasonable) 16 | attacker's web site is loaded in your browser (why is this reasonable?) 17 | attacker cannot intercept/inject packets into the network 18 | browser/server doesn't have buffer overflows 19 | 20 | security policy / goals 21 | 1: isolation of code from different sites 22 | javascript code runs in your browser, has access to lots of things 23 | need to have some way of isolating code from different sites 24 | attacker should not be able to get your bank balance, xfer money, .. 25 | 26 | 2: UI security -- user needs to know what site they're talking to 27 | phishing attacks are usually the biggest problem in this space 28 | without isolation of code from diff. sites, UI security is hopeless 29 | how do you know you're interacting with your bank vs. an attacker? 30 | (if security can avoid depending on this question, all the better!) 31 | 32 | we'll largely focus on the first (isolation of code) for now 33 | 34 | how does javascript fit into the web model? 35 | HTML elements 36 | script tags; inline and src= 37 | built-in objects like window, document, etc 38 | DOM 39 | HTML elements can invoke JS code: onClick, onLoad, .. 40 | single-threaded execution; event-driven programming style for network IO 41 | frames for composing/structuring 42 | 43 | browser security model 44 | principal: domain of the web content's URL 45 | http://a.com/b.html and http://a.com/c.html are the same principal 46 | 47 | protected resource: frame 48 | principal is the domain of frame's location URL 49 | all code in the frame runs as that principal 50 | doesn't matter where the code came from (e.g. script src=...) 51 | analogous to a process in Unix 52 | 53 | protection mechanisms: 54 | javascript references as capabilities 55 | may not be able to get references to other windows/frames 56 | but there are many objects with global names 57 | access control: same origin policy 58 | privileged functions implement their own protection 59 | e.g. postMessage, XMLHttpRequest, window.open() 60 | 61 | same-origin policy (SOP) 62 | intuition/goal: only code from origin X can manipulate resources from X 63 | frame A can poke at frame B's content only if they have the same principal 64 | why does the browser allow any cross-frame access at all? 65 | frames used for layout in addition to protection 66 | unfortunately, quite vague, and overly restrictive; shows in practice 67 | exceptions to get around restrictions: 68 | script, image, css src tags: why are these needed? 69 | frame navigation 70 | 71 | 72 | frame navigation 73 | problem: navigating a frame is a special operation not governed by SOP 74 | subject to other access control rules, which this paper talks about 75 | why does the browser allow this in the first place? 
76 | might have navigation links in one frame, other sites in another 77 | 78 | what goes wrong if attacker.com can navigate another frame? 79 | can substitute a phishing page for the login frame of another site (eg. bank) 80 | why doesn't the SSL icon "go away"? rule: all pages came via SSL 81 | reasoning: original site included the other origin explicitly? 82 | how does the attacker get a handle on that sub-frame? 83 | global name space of frame/window names 84 | more difficult in current browser -- firefox has per-frame name space 85 | of frame names 86 | 87 | what's their proposed fix? 88 | window policy: can only navigate frames in the same window 89 | can still mount the attack on another site if you open it within a window 90 | why is this still OK? no correct answer; mostly because of the URL bar 91 | 92 | mash-ups 93 | idea: combine data from multiple sites/origins 94 | eg: iGoogle combines frames from many developers in the same page 95 | terminology: the whole site is a "mashup" 96 | iGoogle is an "integrator" 97 | all the little boxes that are included in the page are "gadgets" 98 | what are the problems that we run into? 99 | one site's code in one frame can navigate another site's frame 100 | window policy is of no help 101 | why does it matter? UI for login, again 102 | 103 | better policy: descendant/child policy 104 | why do they argue the descendant policy is just as good as child? 105 | in theory, parent can cover up any descendant with a floating object 106 | when is child a better choice? 107 | later examples where site wants to know it's talking to the right child 108 | i.e. cases when the worry isn't the UI issues 109 | origin propagation: 110 | what's the reasoning for this? 111 | would this occur in real sites? frames used for side-by-side structure 112 | 113 | cross-origin frame communication 114 | when would you need it? mashups where origins interact 115 | why do origins need to interact on the client? can we push interactions to 116 | server-side? 117 | cleaner design => easier to implement 118 | avoid extra round trips => more responsive app 119 | better integration => better user experience 120 | nice example: yelp wants to use google maps 121 | mutually distrustful (in theory, at least) 122 | alternative 1: map in another frame (open it to some location), no feedback 123 | alternative 2: map in the same frame (script src=), no protection 124 | yelp does this today 125 | alternative 3: map in one frame, yelp in another frame, communication btwn 126 | 127 | threat model: in addition to threat model described above, we assume: 128 | attacker's gadget can load honest gadget in a subframe 129 | attacker's gadget can communicate with integrator and honest gadget 130 | 131 | goal: secure, reliable communication between origins 132 | 133 | how does frame communication work? 134 | plan 1: exploiting a covert channel! (fragment channel) 135 | problem: no authentication (where did a message come from?) 136 | workaround: treat as a network, run authentication protocol 137 | all 3 impls these guys looked at had the same bug 138 | 139 | protocol: nonces, include sender's ID (rcpt doesn't know sender) 140 | idea: each side generates a nonce, gives it to the other side 141 | if someone gives you a message w/ nonce, it came from other side 142 | 143 | what's the possible attack? 144 | attacker can impersonate integrator when talking to gadget 145 | 146 | why does it matter? gadget might have policies for diff. 
sites 147 | OK to add your contacts list gadget into facebook, access it 148 | not OK to access your contact list gadget by other sites 149 | 150 | how does the attack work? 151 | relay initial message to the gadget 152 | gadget replies back to the integrator 153 | integrator sends gadget's nonce to attacker, 154 | to prove it's the integrator sending the msg 155 | now the attacker has both nonces, can impersonate in both dir'n 156 | might not be able to intercept msgs from gadget, though 157 | they're sent directly to integrator's URI 158 | fix is well-known: include URI (name) in second response too 159 | 160 | plan 2: browser developers designed a special mechanism for it 161 | frame.postMessage("hello") 162 | paper claims this provides authentication but not privacy; how come? 163 | frame can re-navigate without sender's knowledge 164 | how can this happen? 165 | sender was itself in a sub-frame of attacker's site 166 | descendant policy allows attacker to access all sub*-frames 167 | why didn't the fragment channel have this problem? 168 | tight binding between message and recipient (url#msg) 169 | solution: make the binding explicit 170 | 171 | protected resource: cookie 172 | how does HTTP authentication work? 173 | browser keeps track of a session "cookie" -- arbitrary blob from server 174 | sends the cookie along with every request to that server 175 | cookie often includes username and authentication proof 176 | inside browser, same-origin policy protects cookies like frames 177 | cookie stored in document.cookie 178 | can only access cookies for your own origin 179 | 180 | possible attack: generate requests to xfer money from attacker.com 181 | 182 | 183 | solution: spaghetti-rules 184 | hard to prevent GET requests, so allow those (e.g. img tags) 185 | protect from malicious ops: include some non-cookie token in the request 186 | protect bank account balance: only see responses from the same origin 187 | except that's not quite true either 188 | script src= tags run code 189 | style src= tags load CSS style-sheets, also visible 190 | so, to protect sensitive data, make sure it doesn't parse as JS or CSS? 191 | 192 | another mechanism to secure mashups: safe subset of javascript 193 | eg: FBJS, ADSafe, Caja 194 | Facebook javascript (FBJS): compiles gadget down to a safe subset of JS 195 | per gadget name space 196 | accesses to global name space through secure wrappers 197 | intercepts all events and proxies AJAX requests thru FB 198 | gadget is embedded into FB and needs to trust FB 199 | 200 | takeaways 201 | web security lacks unifying set of principles 202 | policies such as SOP have many exceptions 203 | different browsers / runtimes (e.g. Flash) implement different policies 204 | confusing to web developers 205 | supporting existing web sites makes deploying fundamental fixes difficult 206 | lesson: think about security early on in the design 207 | 208 | -------------------------------------------------------------------------------- /l08-my-web-security.md: -------------------------------------------------------------------------------- 1 | Web security 2 | ============ 3 | 4 | Web security for a long time meant looking at what the server was doing, since the client-side was very simple. On the server, CGI scripts were executed and they interfaced with DBs, etc. 
5 | 6 | These days, browsers are very complicated: 7 | 8 | * JavaScript: pages execute client-side code 9 | * The Document Object Model (DOM) 10 | * XMLHttpRequests: a way for JavaScript client-side code to fetch content from the web server asynchronously 11 | - a.k.a. AJAX 12 | * Web Sockets 13 | * Multimedia support (the `