├── .gitignore ├── BOM.md ├── LICENSE ├── README.md ├── collect-traces ├── README.md ├── client │ ├── cleanup_containers.sh │ ├── docker-debian │ │ └── Dockerfile │ ├── docker-ubuntu │ │ └── Dockerfile │ ├── exp │ │ └── collect.py │ ├── run.sh │ └── set_tb_permissions.sh ├── extract │ ├── all.sh │ ├── circpadsim.py │ ├── extract.py │ └── sim-all.sh ├── lists │ ├── monitored │ │ ├── README.md │ │ └── top-50-selected-multi.list │ └── unmonitored │ │ ├── README.md │ │ ├── reddit-front-all.list │ │ ├── reddit-front-year.list │ │ └── reddit.py └── server │ ├── circpad-server.py │ └── run-collect-server.sh ├── dataset └── README.md ├── evaluation ├── once.py ├── overhead.py ├── shared.py ├── tweak.md ├── tweak.py ├── tweak.sh └── visualize.py ├── evolve ├── README.md ├── circpadsim.py ├── evolve.py ├── loop.py ├── machine.py └── shared.py ├── machines ├── hello-world.md ├── phase2 │ ├── README.md │ ├── april-mc │ ├── april-mr │ ├── april-nopadding.png │ ├── april.png │ ├── february-mc │ ├── february-mr │ ├── february-nopadding.png │ ├── february.png │ ├── june-mc │ ├── june-mr │ ├── june-nopadding.png │ ├── june.png │ ├── march-mc │ ├── march-mr │ ├── march-nopadding.png │ ├── march.png │ ├── may-mc │ ├── may-mr │ ├── may-nopadding.png │ ├── may.png │ ├── strawman-mc │ ├── strawman-mr │ ├── strawman-nopadding.png │ └── strawman.png └── phase3 │ ├── README.md │ ├── interspace-mc.c │ ├── interspace-mr.c │ ├── interspace-nopadding.png │ ├── interspace.png │ ├── spring-mc.c │ ├── spring-mr.c │ ├── spring-nopadding.png │ └── spring.png └── notes ├── circuit-padding-framework.md └── machine-from-scratch.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Prerequisites 2 | *.d 3 | 4 | # Object files 5 | *.o 6 | *.ko 7 | *.obj 8 | *.elf 9 | 10 | # Linker output 11 | *.ilk 12 | *.map 13 | *.exp 14 | 15 | # Precompiled Headers 16 | *.gch 17 | *.pch 18 | 19 | # Libraries 20 | *.lib 21 | *.a 22 | *.la 23 | *.lo 24 | 25 | # Shared 
objects (inc. Windows DLLs) 26 | *.dll 27 | *.so 28 | *.so.* 29 | *.dylib 30 | 31 | # Executables 32 | *.exe 33 | *.out 34 | *.app 35 | *.i*86 36 | *.x86_64 37 | *.hex 38 | 39 | # Debug files 40 | *.dSYM/ 41 | *.su 42 | *.idb 43 | *.pdb 44 | 45 | # Kernel Module Compile Results 46 | *.mod* 47 | *.cmd 48 | .tmp_versions/ 49 | modules.order 50 | Module.symvers 51 | Mkfile.old 52 | dkms.conf 53 | 54 | collect-traces/exp/tor-browser_en-US 55 | *_pycache__* 56 | -------------------------------------------------------------------------------- /BOM.md: -------------------------------------------------------------------------------- 1 | # Software Bill of Materials 2 | This is the 3 | [software bill of materials](https://en.wikipedia.org/wiki/Software_bill_of_materials) 4 | of the padding machines created in this project. We consider software used 5 | during the development (like editor and OS) and experimentation out of scope. 6 | 7 | The machines only depend on Tor's onion router software little-t tor 8 | available at https://gitweb.torproject.org/tor.git/. 9 | 10 | A design goal for the padding machines is to not introduce further dependencies 11 | into tor for the machines to work. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2019, pulls 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. 
Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Padding Machines for Tor 2 | This is the repository for the [NGI Zero PET](https://nlnet.nl/PET/) project 3 | "Padding Machines by Tor". The goal of the project was to create one or more 4 | padding machines for Tor's new [circuit padding 5 | framework](https://blog.torproject.org/new-release-tor-0405). The padding 6 | machines should defend against [Website Fingerprinting (WF) 7 | attacks](https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks). 8 | 9 | ## Project Results 10 | This project made several contributions with the help of additional funding from 11 | [the Swedish Internet Foundation](https://internetstiftelsen.se/en/) for a 12 | related project. 
13 | 14 | Notable results: 15 | - [Developer notes on the circuit padding framework](notes/circuit-padding-framework.md). 16 | - [Building a padding machine from scratch](notes/machine-from-scratch.md). 17 | - [Implemented an APE-like padding machine](https://github.com/pylls/tor/tree/circuit-padding-ape-machine). 18 | - Tor trac tickets: [#31098](https://trac.torproject.org/projects/tor/ticket/31098), 19 | [#31111](https://trac.torproject.org/projects/tor/ticket/31111), 20 | [#31112](https://trac.torproject.org/projects/tor/ticket/31112), 21 | [#31113](https://trac.torproject.org/projects/tor/ticket/31113). 22 | 23 | - [A minimal simulator](https://github.com/pylls/circpad-sim) for padding 24 | machines in Tor's circuit padding framework, see 25 | [#31788](https://trac.torproject.org/projects/tor/ticket/31788). 26 | - [Simple collection tools](collect-traces/) for collecting traces for the circpad simulator. 27 | - [The goodenough dataset](dataset/) tailored to the circpad simulator and for 28 | creating "good enough" machines. 29 | - [An evaluation tool](evaluation/once.py) for running the Deep Fingerprinting 30 | (DF) attack against a dataset, producing a number of relevant metrics. Based 31 | on a port of DF to PyTorch. 32 | - [An example machine](machines/hello-world.md) designed, implemented, 33 | evaluated, and documented. 34 | - [Evolved machines using genetic programming](machines/phase2). The best 35 | machine is a more effective defense against DF than WTF-PAD. 36 | - [The final padding machines for Tor](machines/phase3) consisting of a 37 | cleaned-up version of the best evolved machine and a tailored machine that is 38 | an even better defense. 39 | - [Tools for evolving machines](evolve/) using genetic programming. 40 | - [Highlights of the 41 | project](https://lists.torproject.org/pipermail/tor-project/2020-November/003018.html) 42 | were shared as part of the November 2020 Tor DEMO Day. 
43 | 44 | The work in the project is documented in a pre-print paper on arXiv. Results 45 | from the pre-print will be incorporated into a later submission to an academic 46 | conference together with new unpublished results (other project). 47 | 48 | ## Acknowledgements 49 | This project is made possible thanks to a generous grant from the [NGI Zero 50 | PET](https://nlnet.nl/PET/) project, which in turn is made possible with 51 | financial support from the [European Commission's](https://ec.europa.eu/) [Next 52 | Generation Internet](https://www.ngi.eu/) programme, under the aegis of [DG 53 | Communications Networks, Content and 54 | Technology](https://ec.europa.eu/info/departments/communications-networks-content-and-technology_en). 55 | Co-financing (for administrative costs and equipment) is provided by [Computer 56 | Science](https://www.kau.se/en/cs) at [Karlstad 57 | University](https://www.kau.se/en). [The Swedish Internet 58 | Foundation](https://internetstiftelsen.se/en/) also funded part of the work by 59 | enabling me to spend extra time on the simulator (synergies with another 60 | project) and tweaking the Interspace machine. -------------------------------------------------------------------------------- /collect-traces/README.md: -------------------------------------------------------------------------------- 1 | # How to collect a lot of traces 2 | Here we describe how we collect traces from Tor Browser at large scale with 3 | relative ease. We make basic use of Python, shell scripts, and containers. The 4 | idea is to run many headless Tor Browser clients in containers that repeatedly 5 | get work from a collection server. The work consists of a URL to visit. While 6 | visiting a URL the client records its tor log and uploads it to the server. 7 | 8 | Note that everything in this folder is of *research quality*; we share it with 9 | the hope of making it easier for other researchers.
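The client side of this design boils down to a small work loop: fetch a URL from the server, visit it, upload the log. A minimal sketch with in-memory stand-ins for the server and the browser (the function names here are illustrative; `client/exp/collect.py` in this folder is the real client):

```python
def work_loop(get_work, visit, upload, max_jobs):
    """Repeatedly fetch a URL from the collection server, visit it with
    (headless) Tor Browser, and upload the resulting tor log."""
    done = 0
    while done < max_jobs:
        site = get_work()   # ask the server for a URL to visit
        if not site:        # no work available; the real client sleeps and retries
            break
        log = visit(site)   # blocking browser visit, returns filtered log lines
        if log:             # only upload logs with enough trace events
            upload(site, log)
            done += 1
    return done

# toy stand-ins for the server and the browser, just to show the flow
queue = ["kau.se/en", "example.com"]
uploaded = []
jobs = work_loop(
    get_work=lambda: queue.pop(0) if queue else None,
    visit=lambda site: [f"circpad_trace_event site={site}"],
    upload=lambda site, log: uploaded.append((site, log)),
    max_jobs=10,
)
```

The real client additionally retries failed visits and makes a fresh copy of Tor Browser per visit, but the control flow is the same.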
10 | 11 | ## Modify Tor Browser 12 | First download this folder and a fresh Linux Tor Browser install from 13 | torproject.org. Edit `Browser/start-tor-browser`, changing line 12 to: 14 | 15 | ```bash 16 | if [ "x$DISPLAY" = "x" ] && [[ "$*" != *--headless* ]]; then 17 | ``` 18 | 19 | This makes it possible to run Tor Browser in headless mode without a full X 20 | install (no more `xvfb`, yay). 21 | 22 | Tor Browser in headless mode will not display the Tor launcher start-up prompt and will hang indefinitely. To skip the prompt, create a preference file in the Firefox profile that ships with TB: `Browser/TorBrowser/Data/Browser/profile.default/user.js`, and add this single line: 23 | 24 | ``` 25 | user_pref("extensions.torlauncher.prompt_at_startup", false); 26 | ``` 27 | 28 | Edit `Browser/TorBrowser/Data/Tor/torrc` and set any necessary restrictions, 29 | e.g., `EntryNodes`, `MiddleNodes`, or `UseEntryGuards`, depending on the experiment to 30 | run. If you're using the [circuitpadding 31 | simulator](https://github.com/pylls/circpad-sim), build the custom `tor` binary, 32 | add it to TB at `Browser/TorBrowser/Tor/`, and add `Log [circ]info notice 33 | stdout` to `torrc`. 34 | 35 | When we collected our traces for the goodenough dataset we used the following torrc: 36 | 37 | ``` 38 | Log [circ]info notice stdout 39 | UseEntryGuards 0 40 | ``` 41 | 42 | ## Build the Docker container 43 | 1. On the machine(s) you want to use for collection, install Docker. 44 | 2. Build the Dockerfile in either `docker-debian` or `docker-ubuntu`, depending 45 | on what fits the environment where you built the custom `tor` binary. You 46 | build the container by running: `docker build -t wf-collect .` (note the 47 | dot). 48 | 49 | ## Starting containers 50 | On each machine: 51 | 1. Copy the `tor-browser_en-US` folder that you modified earlier into `exp`. 52 | 2. Run `./set_tb_permissions.sh`. 53 | 54 | Edit `run.sh` and then run it.
55 | 56 | For our experiments we created three zip files of Tor Browser with different 57 | security levels/settings and put them all in the `exp` folder. We then used 58 | the following command to rotate between them on each machine: 59 | 60 | ``` 61 | rm -rf collect/exp/tor-browser_en-US && cd collect/exp/ && unzip tor-browser_en-US-safest.zip && cd ../ && ./set_tb_permissions.sh && ./run.sh 62 | ``` 63 | 64 | ## Set up a collection server 65 | Run `circpad-server.py` on a server that can be reached from the docker 66 | containers. The parameters to the script are largely self-explanatory: 67 | 68 | ``` 69 | usage: circpad-server.py [-h] -l L -n N -d D [-m M] [-s S] 70 | 71 | optional arguments: 72 | -h, --help show this help message and exit 73 | -l L file with list of sites to visit, one site per line 74 | -n N number of samples 75 | -d D data folder for storing results 76 | -m M minimum number of lines in torlog to accept 77 | -s S stop collecting at this many logs collected, regardless of 78 | remaining sites or samples (useful for unmonitored sites) 79 | ``` 80 | 81 | All clients attempt to get work from the server and, on failure, sleep for 82 | the specified timeout (default: 60s) before trying again. For our experiments we 83 | used 7 machines with 20 containers each talking to a server with a single modest 84 | core without much trouble. All machines, including the server, were located in 85 | the same cluster. Since the server is single-threaded, running it on machines 86 | further away from the clients may turn it into a bottleneck during 87 | collection. 88 | 89 | ## Extract traces from the dataset 90 | Once you've collected your raw dataset, the next step is to extract the useful 91 | logs and get some circpad traces. The `extract` folder contains all you need: 92 | the `extract.py` script verifies that the logs contain traces from visiting 93 | the intended websites and structures the dataset as in our goodenough dataset.
94 | 95 | ``` 96 | usage: extract.py [-h] -i I -o O -t T -l L [--monitored] [--unmonitored] 97 | [-c C] [-s S] [-m M] 98 | 99 | optional arguments: 100 | -h, --help show this help message and exit 101 | -i I input folder of logs 102 | -o O output folder for logs 103 | -t T output folder for traces 104 | -l L file with list of sites to visit, one site per line 105 | --monitored extract monitored 106 | --unmonitored extract unmonitored 107 | -c C the number of monitored classes 108 | -s S the number of samples 109 | -m M minimum number of lines in a trace 110 | ``` 111 | 112 | See the helper `all.sh` script for examples on how to use `extract.py`. 113 | -------------------------------------------------------------------------------- /collect-traces/client/cleanup_containers.sh: -------------------------------------------------------------------------------- 1 | docker stop $(docker ps -a -q) 2 | docker rm $(docker ps -a -q) -------------------------------------------------------------------------------- /collect-traces/client/docker-debian/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM debian:testing 2 | MAINTAINER Tobias Pulls 3 | 4 | # install and clean since we'll be running many copies 5 | RUN apt-get update && apt-get install -y \ 6 | dumb-init \ 7 | python3 \ 8 | python3-requests \ 9 | firefox-esr 10 | RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 11 | 12 | # setup a user to not run as root 13 | ENV HOME /home/user 14 | ENV LANG C.UTF-8 15 | RUN useradd --create-home --home-dir $HOME user 16 | WORKDIR $HOME 17 | USER user 18 | 19 | ENTRYPOINT ["dumb-init", "--"] 20 | -------------------------------------------------------------------------------- /collect-traces/client/docker-ubuntu/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:rolling 2 | MAINTAINER Tobias Pulls 3 | 4 | # install and clean since we'll be running many copies 
5 | RUN apt-get update && apt-get install -y \ 6 | dumb-init \ 7 | python3 \ 8 | python3-requests \ 9 | firefox 10 | RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 11 | 12 | # set up a user so we do not run as root 13 | ENV HOME /home/user 14 | ENV LANG C.UTF-8 15 | RUN useradd --create-home --home-dir $HOME user 16 | WORKDIR $HOME 17 | USER user 18 | 19 | ENTRYPOINT ["dumb-init", "--"] 20 | -------------------------------------------------------------------------------- /collect-traces/client/exp/collect.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """Collect Network Traces from Tor Browser 3 | 4 | We assume that this script will execute from multiple containers that share the 5 | exact same script arguments. The script gets its work and uploads the traces it 6 | collects to the server at the specified URL. 7 | """ 8 | import argparse 9 | import os 10 | import sys 11 | import random 12 | import shutil 13 | import string 14 | import tempfile 15 | import time 16 | import datetime 17 | import subprocess 18 | import signal 19 | 20 | import requests 21 | from requests.exceptions import Timeout, ConnectionError, ConnectTimeout 22 | 23 | ap = argparse.ArgumentParser() 24 | ap.add_argument("-b", required=True, 25 | help="folder with Tor Browser") 26 | ap.add_argument("-u", required=True, 27 | help="the complete URL to the server") 28 | 29 | ap.add_argument("-a", required=False, default=5, type=int, 30 | help="number of attempts to collect each trace") 31 | ap.add_argument("-t", required=False, default=60, type=int, 32 | help="timeout (s) for each TB visit") 33 | ap.add_argument("-m", required=False, default=100, type=int, 34 | help="minimum number of lines in torlog to accept") 35 | args = vars(ap.parse_args()) 36 | 37 | TBFILE = "start-tor-browser.desktop" 38 | CIRCPAD_EVENT = "circpad_trace_event" 39 | 40 | tmpdir = tempfile.mkdtemp() 41 | 42 | def main(): 43 | if not
os.path.exists(args["b"]): 44 | sys.exit(f"Tor Browser directory {args['b']} does not exist") 45 | if not os.path.isfile(os.path.join(args["b"], TBFILE)): 46 | sys.exit(f"Tor Browser directory {args['b']} missing {TBFILE}") 47 | 48 | # on SIGINT remove the temporary folder 49 | def signal_handler(sig, frame): 50 | shutil.rmtree(tmpdir) 51 | sys.exit(0) 52 | signal.signal(signal.SIGINT, signal_handler) 53 | 54 | tb = make_tb_copy(args["b"]) 55 | print("two warmup visits for fresh consensus and whatnot update checks") 56 | print(f"\t got {len(visit('kau.se/en', tb, args['t']))} log-lines") 57 | print(f"\t got {len(visit('kau.se/en/cs', tb, args['t']))} log-lines") 58 | 59 | work = "" 60 | last_site = "" 61 | while True: 62 | # either work will be empty or contain the log from collecting below 63 | if work == "": 64 | # get new work 65 | work = get_work() 66 | else: 67 | # upload work and get new work 68 | work = upload_work(work, last_site) 69 | 70 | # do any work if we got any, or sleep a bit 71 | if work != "": 72 | last_site = work 73 | work = collect(last_site, tb) 74 | else: 75 | time.sleep(args["t"]) 76 | 77 | # cleanup, if we ever get here somehow 78 | shutil.rmtree(tmpdir) 79 | 80 | def get_work(): 81 | try: 82 | response = requests.get(args["u"], timeout=args["t"]) 83 | if response: 84 | return response.content.decode('UTF-8') 85 | except (Timeout, ConnectionError, ConnectTimeout): 86 | return "" 87 | return "" 88 | 89 | def upload_work(log, site): 90 | print(f"\t {now()} uploading log of len {len(log)}...") 91 | 92 | try: 93 | response = requests.post( 94 | args["u"], 95 | timeout=args["t"], 96 | data=[('log', '\n'.join(log)), ('site', site)] 97 | ) 98 | if response: 99 | return response.content.decode('UTF-8') 100 | except (Timeout, ConnectionError, ConnectTimeout): 101 | return "" 102 | return "" 103 | 104 | def collect(site, tb_orig): 105 | print(f"attempting to collect site {site}") 106 | for _ in range(args["a"]): 107 | # create fresh TB copy for this 
visit 108 | tb = make_tb_copy(tb_orig) 109 | 110 | # visit with TB, blocking, and get stdout (the log) 111 | log = visit(site, tb, args["t"]) 112 | print(f"\t {now()} got {len(log)} circpad events in log") 113 | 114 | # clean up our TB copy 115 | shutil.rmtree(tb) 116 | 117 | # done if the trace is long enough 118 | if len(log) >= args["m"]: 119 | return log 120 | 121 | return "" 122 | 123 | def make_tb_copy(src): 124 | dst = os.path.join(tmpdir, 125 | ''.join(random.choices(string.ascii_uppercase + string.digits, k=24))) 126 | 127 | # ibus breaks on multiple copies that move location, need to ignore it 128 | shutil.copytree(src, dst, ignore=shutil.ignore_patterns('ibus')) 129 | return dst 130 | 131 | def visit(url, tb, timeout): 132 | tb = os.path.join(tb, "Browser", "start-tor-browser") 133 | url = url.replace("'", "\\'") 134 | url = url.replace(";", "\\;") 135 | cmd = f"timeout -k 5 {str(timeout)} {tb} --verbose --headless {url}" 136 | print(f"\t {now()} {cmd}") 137 | 138 | result = subprocess.run( 139 | cmd, 140 | capture_output=True, 141 | text=True, 142 | shell=True 143 | ) 144 | 145 | return filter_circpad_lines(result.stdout) 146 | 147 | def filter_circpad_lines(stdout): 148 | ''' Filters the log for trace events from the circuitpadding 149 | framework, saving space.
150 | ''' 151 | out = [] 152 | lines = stdout.split("\n") 153 | for l in lines: 154 | if CIRCPAD_EVENT in l: 155 | out.append(l) 156 | 157 | return out 158 | 159 | def now(): 160 | return datetime.datetime.now() 161 | 162 | if __name__ == "__main__": 163 | main() 164 | -------------------------------------------------------------------------------- /collect-traces/client/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # the number of docker instances to create 4 | WORKERS=20 5 | 6 | # the absolute path to the experiment folder where you put collect.py and TB 7 | FOLDER=/home/user/collect/exp/ 8 | 9 | # the number of seconds to collect data for per instance visit 10 | TIMEOUT=60 11 | 12 | # the minimum number of circpad trace events to accept 13 | MIN=100 14 | 15 | # the URL of the server 16 | SERVER=http://server:5000 17 | 18 | for ((n=0;n<$WORKERS;n++)) do 19 | docker run -d \ 20 | -v $FOLDER:/home/user/exp wf-collect \ 21 | python3 -u \ 22 | /home/user/exp/collect.py \ 23 | -b /home/user/exp/tor-browser_en-US/ \ 24 | -u $SERVER \ 25 | -m $MIN \ 26 | -t $TIMEOUT 27 | done 28 | -------------------------------------------------------------------------------- /collect-traces/client/set_tb_permissions.sh: -------------------------------------------------------------------------------- 1 | chmod a+r -R exp/tor-browser_en-US/ 2 | find exp/tor-browser_en-US/ -type d -print0 | xargs -0 chmod 755 -------------------------------------------------------------------------------- /collect-traces/extract/all.sh: -------------------------------------------------------------------------------- 1 | # we assume that raw contains folders as collected by `circpad-server.py` with the same lists 2 | 3 | # extract monitored 4 | ./extract.py -i raw/safer-mon/ -o safer/client-logs/monitored/ -t safer/client-traces/monitored/ -l top-50-selected-multi.list --monitored 5 | ./extract.py -i raw/safest-mon/ -o 
safest/client-logs/monitored/ -t safest/client-traces/monitored/ -l top-50-selected-multi.list --monitored 6 | ./extract.py -i raw/standard-mon/ -o standard/client-logs/monitored/ -t standard/client-traces/monitored/ -l top-50-selected-multi.list --monitored 7 | # extract unmonitored 8 | ./extract.py -i raw/safer4-unmon/ -o safer/client-logs/unmonitored/ -t safer/client-traces/unmonitored/ -l reddit-front-year.list --unmonitored 9 | ./extract.py -i raw/safest4-unmon/ -o safest/client-logs/unmonitored/ -t safest/client-traces/unmonitored/ -l reddit-front-year.list --unmonitored 10 | ./extract.py -i raw/standard-unmon/ -o standard/client-logs/unmonitored/ -t standard/client-traces/unmonitored/ -l reddit-front-year.list --unmonitored 11 | 12 | # simulate fake relay traces 13 | ./simrelaytrace.py -i safer/client-traces/monitored/ -o safer/fakerelay-traces/monitored/ 14 | ./simrelaytrace.py -i safer/client-traces/unmonitored/ -o safer/fakerelay-traces/unmonitored/ 15 | ./simrelaytrace.py -i safest/client-traces/monitored/ -o safest/fakerelay-traces/monitored/ 16 | ./simrelaytrace.py -i safest/client-traces/unmonitored/ -o safest/fakerelay-traces/unmonitored/ 17 | ./simrelaytrace.py -i standard/client-traces/monitored/ -o standard/fakerelay-traces/monitored/ 18 | ./simrelaytrace.py -i standard/client-traces/unmonitored/ -o standard/fakerelay-traces/unmonitored/ 19 | -------------------------------------------------------------------------------- /collect-traces/extract/circpadsim.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import sys 3 | import socket 4 | 5 | CIRCPAD_ERROR_WRONG_FORMAT = "invalid trace format" 6 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 7 | 8 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 9 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 10 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 11 | 
CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 12 | 13 | CIRCPAD_LOG = "circpad_trace_event" 14 | CIRCPAD_LOG_TIMESTAMP = "timestamp=" 15 | CIRCPAD_LOG_CIRC_ID = "client_circ_id=" 16 | CIRCPAD_LOG_EVENT = "event=" 17 | 18 | CIRCPAD_BLACKLISTED_ADDRESSES = ["aus1.torproject.org"] 19 | CIRCPAD_BLACKLISTED_EVENTS = [ 20 | "circpad_negotiate_logging" 21 | ] 22 | 23 | def circpad_get_all_addresses(trace): 24 | addresses = [] 25 | for l in trace: 26 | if len(l) < 2: 27 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 28 | if CIRCPAD_ADDRESS_EVENT in l[1]: 29 | if len(l[1]) < 2: 30 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 31 | addresses.append(l[1].split()[1]) 32 | return addresses 33 | 34 | def circpad_get_nonpadding_times(trace): 35 | sent_nonpadding, recv_nonpadding = [], [] 36 | 37 | for l in trace: 38 | split = l.split() 39 | if CIRCPAD_EVENT_NONPADDING_SENT in split[1]: 40 | sent_nonpadding.append(split[0]) 41 | elif CIRCPAD_EVENT_NONPADDING_RECV in split[1]: 42 | recv_nonpadding.append(split[0]) 43 | 44 | return sent_nonpadding, recv_nonpadding 45 | 46 | def circpad_get_padding_times(trace): 47 | sent_padding, recv_padding = [], [] 48 | 49 | for l in trace: 50 | split = l.split() 51 | if CIRCPAD_EVENT_PADDING_SENT in split[1]: 52 | sent_padding.append(split[0]) 53 | elif CIRCPAD_EVENT_PADDING_RECV in split[1]: 54 | recv_padding.append(split[0]) 55 | 56 | return sent_padding, recv_padding 57 | 58 | def circpad_parse_line(line): 59 | split = line.split() 60 | assert(len(split) >= 2) 61 | event = split[1] 62 | timestamp = int(split[0]) 63 | 64 | return event, timestamp 65 | 66 | def circpad_lines_to_trace(lines): 67 | trace = [] 68 | for l in lines: 69 | event, timestamp = circpad_parse_line(l) 70 | trace.append((timestamp, event)) 71 | return trace 72 | 73 | def circpad_extract_log_traces( 74 | log_lines, 75 | source_client=True, 76 | source_relay=True, 77 | allow_ips=False, 78 | filter_client_negotiate=False, 79 | filter_relay_negotiate=False, 80 | 
max_length=999999999 81 | ): 82 | # helper function 83 | def blacklist_hit(d): 84 | for a in circpad_get_all_addresses(d): 85 | if a in CIRCPAD_BLACKLISTED_ADDRESSES: 86 | return True 87 | return False 88 | 89 | # helper to extract one line 90 | def extract_from_line(line): 91 | n = line.index(CIRCPAD_LOG_TIMESTAMP)+len(CIRCPAD_LOG_TIMESTAMP) 92 | timestamp = line[n:].split(" ", maxsplit=1)[0] 93 | n = line.index(CIRCPAD_LOG_CIRC_ID)+len(CIRCPAD_LOG_CIRC_ID) 94 | cid = line[n:].split(" ", maxsplit=1)[0] 95 | 96 | # an event is the last part, no need to split on space like we did earlier 97 | n = line.index(CIRCPAD_LOG_EVENT)+len(CIRCPAD_LOG_EVENT) 98 | event = line[n:] 99 | 100 | return int(cid), int(timestamp), event 101 | 102 | circuits = {} 103 | base = -1 104 | for line in log_lines: 105 | if CIRCPAD_LOG in line: 106 | # skip client/relay if they shouldn't be part of the trace 107 | if not source_client and "source=client" in line: 108 | continue 109 | if not source_relay and "source=relay" in line: 110 | continue 111 | 112 | # extract trace and make timestamps relative 113 | cid, timestamp, event = extract_from_line(line) 114 | if base == -1: 115 | base = timestamp 116 | timestamp = timestamp - base 117 | 118 | # store trace 119 | if cid in circuits.keys(): 120 | if len(circuits[cid]) < max_length: 121 | circuits[cid] = circuits.get(cid) + [(timestamp, event)] 122 | else: 123 | circuits[cid] = [(timestamp, event)] 124 | 125 | # filter out circuits with blacklisted addresses 126 | for cid in list(circuits.keys()): 127 | if blacklist_hit(circuits[cid]): 128 | del circuits[cid] 129 | # filter out circuits with only IPs (unless arg says otherwise) 130 | for cid in list(circuits.keys()): 131 | if not allow_ips and circpad_only_ips_in_trace(circuits[cid]): 132 | del circuits[cid] 133 | 134 | # remove blacklisted events (and associated events) 135 | for cid in list(circuits.keys()): 136 | circuits[cid] = circpad_remove_blacklisted_events(circuits[cid], 137 | 
filter_client_negotiate, filter_relay_negotiate) 138 | 139 | return circuits 140 | 141 | 142 | def circpad_remove_blacklisted_events( 143 | trace, 144 | filter_client_negotiate, 145 | filter_relay_negotiate 146 | ): 147 | 148 | result = [] 149 | ignore_next_send_cell = False 150 | 151 | for line in trace: 152 | strline = str(line) # stringify the (timestamp, event) tuple for substring search 153 | 154 | # If we hit a blacklisted event, this means we should ignore the next 155 | # sent nonpadding cell. Since the blacklisted event should only be 156 | # triggered client-side, there shouldn't be any impact on relay traces. 157 | if any(b in strline for b in CIRCPAD_BLACKLISTED_EVENTS): 158 | ignore_next_send_cell = True 159 | else: 160 | if ignore_next_send_cell and CIRCPAD_EVENT_NONPADDING_SENT in strline: 161 | ignore_next_send_cell = False 162 | else: 163 | result.append(line) 164 | 165 | return result 166 | 167 | def circpad_only_ips_in_trace(trace): 168 | def is_ipv4(addr): 169 | try: 170 | socket.inet_aton(addr) 171 | except (socket.error, TypeError): 172 | return False 173 | return True 174 | def is_ipv6(addr): 175 | try: 176 | socket.inet_pton(socket.AF_INET6, addr) 177 | except (socket.error, TypeError): 178 | return False 179 | return True 180 | 181 | for a in circpad_get_all_addresses(trace): 182 | if not is_ipv4(a) and not is_ipv6(a): 183 | return False 184 | return True 185 | 186 | 187 | def circpad_to_wf(trace, cells=False, timecells=False, dirtime=False, strip=False): 188 | ''' Get a WF representation of the trace in the specified format. 189 | 190 | We support three formats: 191 | - cells, each line only contains 1 or -1 for outgoing or incoming cells. 192 | - timecells, relative timestamp (ms) added before each cell. 193 | - dirtime, each line has the relative time multiplied with the cell value. 194 | 195 | If the strip flag is set, events prior to a first domain resolution are 196 | stripped from the trace (if present).
Circuits are typically created in the 197 | background by Tor Browser to speed up browsing for users. Removing this is 198 | beneficial for WF attackers, because it is often assumed (more or less 199 | realistically) that an attacker can detect this (mainly by a 200 | significant period of "silence" on the wire, followed by what is assumed to be 201 | a website load). 202 | 203 | FIXME: For timecells and dirtime, the current magnitude is nanoseconds; it might be 204 | more efficient to round to seconds with lower resolution, especially for deep 205 | learning attacks. 206 | ''' 207 | result = [] 208 | 209 | # only strip if we find the event for an address being resolved 210 | if strip: 211 | for i, l in enumerate(trace): 212 | if CIRCPAD_ADDRESS_EVENT in l[1]: 213 | trace = trace[i:] 214 | break 215 | 216 | for l in trace: 217 | # outgoing is positive 218 | if CIRCPAD_EVENT_NONPADDING_SENT in l[1] or \ 219 | CIRCPAD_EVENT_PADDING_SENT in l[1]: 220 | if cells: 221 | result.append("1") 222 | if timecells: 223 | result.append(f"{l[0]} 1") 224 | if dirtime: 225 | result.append(f"{l[0]}") 226 | 227 | # incoming is negative 228 | elif CIRCPAD_EVENT_NONPADDING_RECV in l[1] or \ 229 | CIRCPAD_EVENT_PADDING_RECV in l[1]: 230 | if cells: 231 | result.append("-1") 232 | if timecells: 233 | result.append(f"{l[0]} -1") 234 | if dirtime: 235 | result.append(f"{l[0]*-1}") 236 | return result 237 | -------------------------------------------------------------------------------- /collect-traces/extract/extract.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import os 4 | import sys 5 | import shutil 6 | import circpadsim 7 | 8 | ''' Given input and output folders, extract results.
9 | 10 | Monitored: dimension, use backups, give error, check for already done 11 | 12 | Unmonitored: num, check if done, pick random 13 | ''' 14 | ap = argparse.ArgumentParser() 15 | ap.add_argument("-i", required=True, 16 | help="input folder of logs") 17 | ap.add_argument("-o", required=True, 18 | help="output folder for logs") 19 | ap.add_argument("-t", required=True, 20 | help="output folder for traces") 21 | ap.add_argument("-l", required=True, 22 | help="file with list of sites to visit, one site per line") 23 | 24 | ap.add_argument("--monitored", required=False, default=False, 25 | action="store_true", help="extract monitored") 26 | ap.add_argument("--unmonitored", required=False, default=False, 27 | action="store_true", help="extract unmonitored") 28 | 29 | ap.add_argument("-c", required=False, type=int, default=500, 30 | help="the number of monitored classes") 31 | ap.add_argument("-s", required=False, type=int, default=20, 32 | help="the number of samples") 33 | ap.add_argument("-m", required=False, default=100, type=int, 34 | help="minimum number of lines in a trace") 35 | args = vars(ap.parse_args()) 36 | 37 | def main(): 38 | if ( 39 | (args["monitored"] and args["unmonitored"]) or 40 | (not args["monitored"] and not args["unmonitored"]) 41 | ): 42 | sys.exit("needs exactly one of --monitored or --unmonitored") 43 | 44 | if not os.path.isdir(args["i"]): 45 | sys.exit(f"{args['i']} is not a directory") 46 | if not os.path.isdir(args["o"]): 47 | sys.exit(f"{args['o']} is not a directory") 48 | if not os.path.isdir(args["t"]): 49 | sys.exit(f"{args['t']} is not a directory") 50 | 51 | inlist = os.listdir(args["i"]) 52 | if len(inlist) < args["c"]*args["s"]: 53 | sys.exit( 54 | f'tasked to extract {args["c"]*args["s"]} samples, ' 55 | f'but {args["i"]} contains at most ' 56 | f'{len(inlist)} samples' 57 | ) 58 | 59 | outlist = os.listdir(args["o"]) 60 | if len(outlist) > 0: 61 | sys.exit(f'{args["o"]} is not empty') 62 | 63 | tracelist = 
os.listdir(args["t"]) 64 | if len(tracelist) > 0: 65 | sys.exit(f'{args["t"]} is not empty') 66 | 67 | print(f"reading sites list {args['l']}") 68 | sites = get_sites_list() 69 | print(f"ok, list has {len(sites)} starting sites") 70 | 71 | if args["monitored"]: 72 | print("monitored") 73 | for c in range(args["c"]): 74 | print(f"{c}-{0}") 75 | # every class has backup traces, that is, extract logs we collected, 76 | # starting from the intended sample counter until there is no more 77 | # such file 78 | backup = args["s"] 79 | site = sites[c] 80 | for i in range(args["s"]): 81 | infname = f"{c}-{i}.log" 82 | trace, readfname, backup = find_good_trace(c, i, backup, site) 83 | write_trace(trace, results_trace_file(infname)) 84 | write_log(readfname, infname) 85 | else: 86 | print("unmonitored") 87 | n = 0 88 | for index, site in enumerate(sites): 89 | if n >= args["c"]*args["s"]: 90 | break 91 | 92 | infname = f"{index}-0.log" 93 | if not os.path.exists(os.path.join(args["i"], infname)): 94 | continue 95 | 96 | trace, good = extract_trace(infname, site) 97 | if not good: 98 | print(f"not good {infname}") 99 | continue 100 | 101 | write_trace(trace, results_trace_file(infname)) 102 | write_log(infname, infname) 103 | 104 | n += 1 105 | if n % 100 == 0: 106 | print(n) 107 | 108 | def write_log(src, dst): 109 | shutil.copy( 110 | os.path.join(args["i"], src), 111 | os.path.join(args["o"], dst) 112 | ) 113 | 114 | def write_trace(output, fname): 115 | # make time relative before writing 116 | base = -1 117 | with open(fname, "w") as f: 118 | for l in output: 119 | t = int(l[0]) 120 | if base == -1: 121 | base = t 122 | t = t - base 123 | f.write(f"{t:016d} {l[1].strip()}\n") 124 | 125 | def find_good_trace(c, i, backup, site): 126 | inst = i 127 | while True: 128 | infname = f"{c}-{inst}.log" 129 | if not os.path.exists(os.path.join(args["i"], infname)): 130 | sys.exit(f"not enough logs for class {c}, instance {i}") 131 | 132 | trace, good = extract_trace(infname, site) 
133 | if good: 134 | return trace, infname, backup 135 | # no good, try a backup 136 | print(f"need backup for {c}-{i}") 137 | inst = backup 138 | backup += 1 139 | 140 | def extract_trace(infname, site): 141 | circuits = {} 142 | with open(os.path.join(args["i"], infname), 'r') as f: 143 | circuits = circpadsim.circpad_extract_log_traces(f.readlines(), 144 | True, True, False, False, False, 10*1000) 145 | 146 | if len(circuits) == 0: 147 | return "", False 148 | 149 | # try to find the first circuit with our site that is of acceptable length 150 | for cid in circuits: 151 | for l in circuits[cid]: 152 | s = l[1].split(" ") 153 | if s[len(s)-1].rstrip() in site: 154 | if len(circuits[cid]) >= args["m"]: 155 | return circuits[cid], True 156 | 157 | return "", False 158 | 159 | def results_trace_file(fname): 160 | if os.path.splitext(fname)[1] == ".log": 161 | return os.path.join(args["t"], os.path.splitext(fname)[0]+'.trace') 162 | return os.path.join(args["t"], fname+'.trace') 163 | 164 | def get_sites_list(): 165 | l = [] 166 | with open(args["l"]) as f: 167 | for line in f: 168 | site = line.rstrip() 169 | if site in l: 170 | print(f"warning, list of sites has duplicate: {site}") 171 | l.append(site) 172 | return l 173 | 174 | if __name__ == "__main__": 175 | main() 176 | -------------------------------------------------------------------------------- /collect-traces/extract/sim-all.sh: -------------------------------------------------------------------------------- 1 | ./simrelaytrace.py -i safer/client-traces/monitored/ -o safer/fakerelay-traces/monitored/ 2 | ./simrelaytrace.py -i safer/client-traces/unmonitored/ -o safer/fakerelay-traces/unmonitored/ 3 | ./simrelaytrace.py -i safest/client-traces/monitored/ -o safest/fakerelay-traces/monitored/ 4 | ./simrelaytrace.py -i safest/client-traces/unmonitored/ -o safest/fakerelay-traces/unmonitored/ 5 | #./simrelaytrace.py -i standard/client-traces/monitored/ -o standard/fakerelay-traces/monitored/ 6 | 
#./simrelaytrace.py -i standard/client-traces/unmonitored/ -o standard/fakerelay-traces/unmonitored/ 7 | -------------------------------------------------------------------------------- /collect-traces/lists/monitored/README.md: -------------------------------------------------------------------------------- 1 | # top-50-selected-multi list 2 | This document describes how the top-50-selected-multi.list file was created. We 3 | had to go down to Alexa rank 212 to fill the list. 4 | 5 | ## February update 6 | - replaced five headlines from headlines.yahoo.co.jp 7 | - replaced one link on ebay.com 8 | - replaced one etsy.com link, shop taking a break 9 | 10 | ## January update 11 | - replaced two headlines that 404'd from headlines.yahoo.co.jp 12 | - replaced four items that were replaced on ebay.com 13 | - replaced four articles that 404'd from suara.com 14 | 15 | ## How created 16 | The approach for the list is simple but boring. Starting from the top of the Alexa 17 | top-list, we asked two questions to decide whether to include a site: 18 | - Is the site reliable to visit over Tor? That is, not behind some cloudwall or 19 | blacklist? Also, does it load reliably? 20 | - Does the site contain several _similar_ webpages beyond the frontpage that can 21 | be accessed?
22 | 23 | ## General pruning 24 | - remove all tracking stuff in URL that still gives the page on a clean TB 25 | - avoid porn (for sake of work) 26 | - try not to mix significant content types on pages, like mixing videos and text 27 | articles on a news site (that's two different classes) 28 | 29 | ## Per-site notes 30 | - wikipedia.org: took 10 links from today's featured article 31 | - amazon.com: first 10 of Today's deals under $25 32 | - reddit.com: the first 10 subreddits on the frontpage 33 | - okezone.com: the first 10 articles 34 | - yahoo.co.jp: the first 10 articles 35 | - tor.stackexchange.com: 10 latest questions 36 | - ebay.com: first 10 offers 37 | - aliexpress.com: top 10 selection items 38 | - msn.com: first 10 articles 39 | - tribunnews.com: first 10 articles 40 | - twitch.tv: top 10 games listing 41 | - yandex.ru: top 10 clips in the first category shown 42 | - imdb.com: top 10 of movies opening this week 43 | - aws.amazon.com: the first product page from the first 10 products (reading order) 44 | - booking.com: search in first 10 places listed 45 | - medium.com: top 10 featured stories 46 | - detik.com: top 10 in news feed 47 | - bbc.com: top 10 articles 48 | - indeed.com: top 10 popular job searches 49 | - w3schools.com: top 10 links left column 50 | - nytimes.com: first 10 articles 51 | - cnn.com: first 10 articles, mixed content types 52 | - imgur.com: top 10 viral 53 | - fandom.com: top 10 articles with big underlineable links 54 | - stackexchange.com: top 10 hot links 55 | - soundcloud.com: top 10 trending 56 | - github.com: top 10 trending today 57 | - nih.gov: 10 latest news releases 58 | - theguardian.com: 10 frontpage linked articles 59 | - slideshare.net: top 10 featured slides 60 | - sindonews.com: top 10 "TERPOPULER" 61 | - freepik.com: top 10 Freepik's choice 62 | - uol.com.br: 10 articles on frontpage 63 | - walmart.com: top 10 shop categories 64 | - etsy.com: top 10 personalized jewellery from frontpage 65 | - wikihow.com: top 10
linked wikis 66 | - craigslist.org: 10 different cities in different US states 67 | - ladbible.com: top 10 trending 68 | - archive.org: top 10 collections at the archive 69 | - nicovideo.jp: top 10 ranked videos 70 | - setn.com: 10 non-video articles on frontpage 71 | - forbes.com: 10 popular articles 72 | - thepiratebay.org: 10 top categories listings 73 | - pixabay.com: 10 popular image categories 74 | - gfycat.com: top 10 trending gifs 75 | - healthline.com: 10 first articles 76 | - dictionary.com: 10 random articles 77 | - suara.com: 10 news articles 78 | - sciencedirect.com: 10 article listing 79 | - foxnews.com: 10 frontpage articles -------------------------------------------------------------------------------- /collect-traces/lists/unmonitored/README.md: -------------------------------------------------------------------------------- 1 | # Unmonitored lists from reddit 2 | Using the praw library to access the reddit API, on the 4th of December 2019: 3 | 4 | - `reddit-front-year.list` consists of 11716 URLs filtered from 51070 5 | submissions to r/frontpage limited to submissions this year 6 | - `reddit-front-all.list` consists of 14167 URLs filtered from 54512 submissions 7 | to r/frontpage with no time filter (all) 8 | 9 | The filtering was done with `reddit.py`, using its built-in blacklist, as well 10 | as the monitored file `top-50-selected-multi.list`.
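
The gist of the filtering can be sketched as follows. This is a hypothetical minimal sketch, not the exact code in `reddit.py`, and the `blacklist` and `monitored` lists are short excerpts of the real ones:

```python
from urllib.parse import urlsplit

# Excerpts for illustration; see reddit.py for the full lists.
blacklist = [".jpg", "imgur.com", "youtube.com"]
monitored = ["reddit.com", "wikipedia.org"]  # base URLs of monitored sites

def keep(url):
    # drop URLs matching the blacklist, or whose base URL is monitored
    base = urlsplit(url).netloc
    if any(b in url for b in blacklist):
        return False
    if any(m in base for m in monitored):
        return False
    return True
```

With these excerpts, a URL like `https://i.imgur.com/cat.jpg` is dropped, while `https://example.com/article` is kept.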
11 | -------------------------------------------------------------------------------- /collect-traces/lists/unmonitored/reddit.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import sys 4 | import os 5 | import praw 6 | from urllib.parse import urlsplit 7 | 8 | ap = argparse.ArgumentParser() 9 | ap.add_argument("-m", required=True, default="", 10 | help="location of monitored list to load") 11 | ap.add_argument("-u", required=True, default="", 12 | help="location of unmonitored list to save") 13 | ap.add_argument("-n", required=True, type=int, 14 | help="the total number of unique sites to get") 15 | args = vars(ap.parse_args()) 16 | 17 | blacklist = [ 18 | # remove direct image links 19 | ".gif", 20 | ".jpg", 21 | ".jpeg", 22 | ".png", 23 | # remove image hosting sites 24 | "redd.it", 25 | "reddit.com", 26 | "reddituploads.com", 27 | "imgur.com", 28 | "gfycat.com", 29 | # youtube and twitter both treat Tor badly 30 | "youtube.com", 31 | "youtu.be", 32 | "twitter.com", 33 | ] 34 | 35 | def main(): 36 | if not os.path.exists(args["m"]): 37 | sys.exit(f"{args['m']}, no such file (argument -m)") 38 | if os.path.exists(args["u"]): 39 | sys.exit(f"{args['u']} already exists") 40 | 41 | # load monitored list, clean the urls, filter on base 42 | monitored = get_sites_list() 43 | 44 | # loop over submissions until done 45 | reddit = praw.Reddit(client_id='REPLACE', 46 | client_secret='REPLACE', 47 | password='REPLACE', 48 | user_agent='a research python script by /u/REPLACE, collecting URLs for website fingerprinting attacks', 49 | username='REPLACE') 50 | unmonitored = [] 51 | count = 0 52 | with open(args["u"], 'w') as f: 53 | for submission in reddit.subreddit("all").top(time_filter="year", limit=args["n"]): 54 | count += 1 55 | # https://praw.readthedocs.io/en/latest/code_overview/models/submission.html 56 | base = base_url(submission.url) 57 | if not any(b in submission.url for b in
blacklist): 58 | if not any(m in base for m in monitored): 59 | print(f"base {base}\t full {submission.url}") 60 | if not submission.url in unmonitored: 61 | unmonitored.append(submission.url) 62 | f.write(f"{submission.url}\n") 63 | 64 | print(f"\ngot {len(unmonitored)} sites, {count} submissions") 65 | 66 | def get_sites_list(): 67 | l = [] 68 | with open(args["m"]) as f: 69 | for line in f: 70 | site = base_url(line.rstrip()) 71 | # only add unique base URLs, faster lookup 72 | if not site in l: 73 | l.append(site) 74 | return l 75 | 76 | def base_url(u): 77 | return urlsplit(u).netloc 78 | 79 | if __name__ == "__main__": 80 | main() -------------------------------------------------------------------------------- /collect-traces/server/circpad-server.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import os 4 | import random 5 | import socket 6 | import sys 7 | from flask import Flask, request 8 | 9 | app = Flask(__name__) 10 | 11 | ap = argparse.ArgumentParser() 12 | ap.add_argument("-l", required=True, 13 | help="file with list of sites to visit, one site per line") 14 | ap.add_argument("-n", required=True, type=int, 15 | help="number of samples") 16 | ap.add_argument("-d", required=True, 17 | help="data folder for storing results") 18 | 19 | ap.add_argument("-m", required=False, default=500, type=int, 20 | help="minimum number of lines in torlog to accept") 21 | ap.add_argument("-s", required=False, default=-1, type=int, 22 | help="stop collecting at this many logs collected, regardless of remaining sites or samples (useful for unmonitored sites)") 23 | args = vars(ap.parse_args()) 24 | 25 | RESULTSFMT = "{}-{}.log" 26 | 27 | sites = [] 28 | remaining_sites = [] 29 | collected_samples = {} 30 | total_collected = 0 31 | 32 | def main(): 33 | global total_collected 34 | 35 | if not os.path.exists(args["d"]): 36 | sys.exit(f"data directory {args['d']} does not exist") 37 | 38 
| print(f"reading sites list {args['l']}") 39 | starting_sites = get_sites_list() 40 | print(f"ok, list has {len(starting_sites)} starting sites") 41 | 42 | for site in starting_sites: 43 | sites.append(site) 44 | remaining_sites.append(site) 45 | collected_samples[site] = 0 46 | 47 | for _ in range(args["n"]): 48 | if os.path.isfile(results_file(site)): 49 | total_collected = total_collected + 1 50 | # record the collected sample 51 | collected_samples[site] = collected_samples[site] + 1 52 | # if we got enough samples, all done 53 | if collected_samples[site] >= args["n"]: 54 | remaining_sites.remove(site) 55 | 56 | if args["s"] > 0 and total_collected >= args["s"]: 57 | sys.exit(f"already done, collected {total_collected} logs") 58 | 59 | if args["s"] > 0: 60 | remaining = args['s'] - total_collected 61 | print(f"set to collect {args['s']} logs, need {remaining} more") 62 | 63 | print(f"list has {len(remaining_sites)} remaining sites") 64 | 65 | app.run(host="0.0.0.0", threaded=False) 66 | 67 | @app.route('/', methods=['GET', 'POST']) 68 | def handler(): 69 | if request.method == 'POST': 70 | add_log(request.form['log'], request.form['site']) 71 | next = get_next_item() 72 | print(f"\tnext item is {next}") 73 | return next 74 | 75 | def add_log(log, site): 76 | global total_collected 77 | 78 | # already done? 
79 | if site not in remaining_sites: 80 | print(f"\tsite {site} is already done") 81 | return 82 | 83 | log = log.split("\n") 84 | 85 | if not is_complete_circpad_log(log): 86 | print(f"\tgot incomplete log for {site}") 87 | return 88 | 89 | print(f"\tgot log of {len(log)} events for site {site}") 90 | 91 | # store the log 92 | with open(results_file(site), 'w') as f: 93 | for l in log: 94 | f.write(f"{l}\n") 95 | 96 | # update count of samples 97 | collected_samples[site] = collected_samples[site] + 1 98 | if collected_samples[site] >= args["n"]: 99 | remaining_sites.remove(site) 100 | total_collected += 1 101 | 102 | def is_complete_circpad_log(log): 103 | circuits = circpad_extract_log_traces(log) 104 | n = 0 105 | for cid in circuits: 106 | if len(circuits[cid]) >= args["m"]: 107 | n += 1 108 | 109 | # A "complete" circpad log has exactly one sizeable trace, but at times we 110 | # get extra traces, e.g., due to TB extensions phoning home. I found that 111 | # the best approach was to collect potentially some less useful logs and 112 | # then discard at the end (see extraction tools). 113 | # return n == 1 114 | 115 | # at least one circuit trace looks good 116 | return n >= 1 117 | 118 | def get_next_item(): 119 | global total_collected 120 | 121 | # already done? 122 | if args["s"] > 0 and total_collected >= args["s"]: 123 | return "" 124 | 125 | # got more work?
126 | if len(remaining_sites) > 0: 127 | random.shuffle(remaining_sites) 128 | return remaining_sites[0] 129 | 130 | return "" 131 | 132 | def get_sites_list(): 133 | l = [] 134 | with open(args["l"]) as f: 135 | for line in f: 136 | site = line.rstrip() 137 | if site in l: 138 | print(f"warning, list of sites has duplicate: {site}") 139 | l.append(site) 140 | return l 141 | 142 | def results_file(site): 143 | index = sites.index(site) 144 | sample = collected_samples[site] 145 | return os.path.join(args["d"], RESULTSFMT.format(index, sample)) 146 | 147 | 148 | CIRCPAD_ERROR_WRONG_FORMAT = "invalid trace format" 149 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 150 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 151 | 152 | CIRCPAD_LOG = "circpad_trace_event" 153 | CIRCPAD_LOG_TIMESTAMP = "timestamp=" 154 | CIRCPAD_LOG_CIRC_ID = "client_circ_id=" 155 | CIRCPAD_LOG_EVENT = "event=" 156 | 157 | CIRCPAD_BLACKLISTED_ADDRESSES = ["aus1.torproject.org"] 158 | CIRCPAD_BLACKLISTED_EVENTS = [ 159 | "circpad_negotiate_logging" 160 | ] 161 | 162 | def circpad_get_all_addresses(trace): 163 | addresses = [] 164 | for l in trace: 165 | if len(l) < 2: 166 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 167 | if CIRCPAD_ADDRESS_EVENT in l[1]: 168 | if len(l[1].split()) < 2: # need both event name and address 169 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 170 | addresses.append(l[1].split()[1]) 171 | return addresses 172 | 173 | def circpad_parse_line(line): 174 | split = line.split() 175 | assert(len(split) >= 2) 176 | event = split[1] 177 | timestamp = int(split[0]) 178 | 179 | return event, timestamp 180 | 181 | def circpad_lines_to_trace(lines): 182 | trace = [] 183 | for l in lines: 184 | event, timestamp = circpad_parse_line(l) 185 | trace.append((timestamp, event)) 186 | return trace 187 | 188 | def circpad_extract_log_traces( 189 | log_lines, 190 | source_client=True, 191 | source_relay=True, 192 | allow_ips=False, 193 | filter_client_negotiate=False, 194 | filter_relay_negotiate=False 195
| ): 196 | # helper function 197 | def blacklist_hit(d): 198 | for a in circpad_get_all_addresses(d): 199 | if a in CIRCPAD_BLACKLISTED_ADDRESSES: 200 | return True 201 | return False 202 | 203 | # helper to extract one line 204 | def extract_from_line(line): 205 | n = line.index(CIRCPAD_LOG_TIMESTAMP)+len(CIRCPAD_LOG_TIMESTAMP) 206 | timestamp = line[n:].split(" ", maxsplit=1)[0] 207 | n = line.index(CIRCPAD_LOG_CIRC_ID)+len(CIRCPAD_LOG_CIRC_ID) 208 | cid = line[n:].split(" ", maxsplit=1)[0] 209 | 210 | # an event is the last part, no need to split on space like we did earlier 211 | n = line.index(CIRCPAD_LOG_EVENT)+len(CIRCPAD_LOG_EVENT) 212 | event = line[n:] 213 | 214 | return int(cid), int(timestamp), event 215 | 216 | circuits = {} 217 | base = -1 218 | for line in log_lines: 219 | if CIRCPAD_LOG in line: 220 | # skip client/relay if they shouldn't be part of the trace 221 | if not source_client and "source=client" in line: 222 | continue 223 | if not source_relay and "source=relay" in line: 224 | continue 225 | 226 | # extract trace and make timestamps relative 227 | cid, timestamp, event = extract_from_line(line) 228 | if base == -1: 229 | base = timestamp 230 | timestamp = timestamp - base 231 | 232 | # store trace 233 | if cid in circuits.keys(): 234 | circuits[cid] = circuits.get(cid) + [(timestamp, event)] 235 | else: 236 | circuits[cid] = [(timestamp, event)] 237 | 238 | # filter out circuits with blacklisted addresses 239 | for cid in list(circuits.keys()): 240 | if blacklist_hit(circuits[cid]): 241 | del circuits[cid] 242 | # filter out circuits with only IPs (unless arg says otherwise) 243 | for cid in list(circuits.keys()): 244 | if not allow_ips and circpad_only_ips_in_trace(circuits[cid]): 245 | del circuits[cid] 246 | 247 | # remove blacklisted events (and associated events) 248 | for cid in list(circuits.keys()): 249 | circuits[cid] = circpad_remove_blacklisted_events(circuits[cid], 250 | filter_client_negotiate, filter_relay_negotiate) 251 | 
252 | return circuits 253 | 254 | 255 | def circpad_remove_blacklisted_events( 256 | trace, 257 | filter_client_negotiate, 258 | filter_relay_negotiate 259 | ): 260 | 261 | result = [] 262 | ignore_next_send_cell = False 263 | 264 | for line in trace: 265 | strline = str(line) # trace entries are (timestamp, event) tuples, match on the string form 266 | # If we hit a blacklisted event, this means we should ignore the next 267 | # sent nonpadding cell. Since the blacklisted event should only be 268 | # triggered client-side, there shouldn't be any impact on relay traces. 269 | if any(b in strline for b in CIRCPAD_BLACKLISTED_EVENTS): 270 | ignore_next_send_cell = True 271 | else: 272 | if ignore_next_send_cell and CIRCPAD_EVENT_NONPADDING_SENT in strline: 273 | ignore_next_send_cell = False 274 | else: 275 | result.append(line) 276 | 277 | return result 278 | 279 | def circpad_only_ips_in_trace(trace): 280 | def is_ipv4(addr): 281 | try: 282 | socket.inet_aton(addr) 283 | except (socket.error, TypeError): 284 | return False 285 | return True 286 | def is_ipv6(addr): 287 | try: 288 | socket.inet_pton(socket.AF_INET6, addr) 289 | except (socket.error, TypeError): 290 | return False 291 | return True 292 | 293 | for a in circpad_get_all_addresses(trace): 294 | if not is_ipv4(a) and not is_ipv6(a): 295 | return False 296 | return True 297 | 298 | if __name__ == '__main__': 299 | main() 300 | -------------------------------------------------------------------------------- /collect-traces/server/run-collect-server.sh: -------------------------------------------------------------------------------- 1 | # Simple helper script for the two runs used for the goodenough dataset, with their 2 | # respective lists (see zips). I manually (un)commented the lines below per 3 | # part; the server won't stop on its own when done.
4 | 5 | #python3.6 circpad-server.py -d safer-mon/ -l top-50-selected-multi.list -n 30 -m 100 6 | python3.6 circpad-server.py -d safer-unmon/ -l reddit-front-year.list -n 1 -s 11000 -m 100 7 | -------------------------------------------------------------------------------- /dataset/README.md: -------------------------------------------------------------------------------- 1 | # The Goodenough dataset 2 | We set out to create a dataset that better reflects the challenges of an 3 | attacker than the typical datasets used in the evaluation of Website 4 | Fingerprinting attacks. The dataset consists of 10,000 monitored samples and 5 | 10,000 unmonitored samples. The monitored samples represent 50 classes of popular 6 | websites taken from the Alexa toplist (all within Alexa top-300 at the time of 7 | collection). For each website/class, we selected 10 webpages to represent that 8 | class, with the intent of evaluating _webpage-to-website_ fingerprinting. For 9 | example, for the website reddit.com, we selected 10 URLs to popular subreddits 10 | such as https://www.reddit.com/r/wholesomememes/. Similarly, for wikipedia.org, 11 | we selected articles such as https://en.wikipedia.org/wiki/Dinosaur, etc. The 12 | full list of websites and webpages is available as part of the dataset. We 13 | collected 20 samples per webpage, resulting in 50x10x20=10,000 monitored 14 | samples. 15 | 16 | As a complement, we collected 10,000 unmonitored webpages from reddit.com/r/all 17 | (top last year). We made sure to exclude webpages of monitored websites, which 18 | include self-hosted images at Reddit. We also excluded direct image links, since 19 | they are too distinct from the monitored webpages, and links to YouTube and 20 | Twitter, which tend not to treat traffic from Tor nicely (i.e., 21 | sporadically blocking access).
22 | 23 | The dataset consists of: 24 | - complete lists of visited monitored and unmonitored websites 25 | - logs from Tor Browser 26 | - traces extracted for the [circuit padding simulator](https://github.com/pylls/circpad-sim) 27 | - fakerelay traces that are [simulated](https://github.com/pylls/circpad-sim/blob/master/simrelaytrace.py) from the client traces 28 | 29 | The final traces have all been verified to work fine with the circuit padding 30 | simulator. There are complete sets of traces for the [three security 31 | levels/settings of Tor 32 | Browser](https://tb-manual.torproject.org/security-settings/). 33 | 34 | So far we have collected the dataset twice, in the beginning of January and February, 35 | to allow for comparisons over time. We made minimal changes to the webpages 36 | visited due to, e.g., removed content. See the list README for details of our 37 | changes. (There's also a dataset from December 2019, but only for some security 38 | levels; reach out in case you're interested.) 39 | 40 | Download links (may change in the future, please reference this repository): 41 | - https://dart.cse.kau.se/goodenough/goodenough-jan-2020.zip 5.4 GiB compressed, 165 GiB extracted 42 | - https://dart.cse.kau.se/goodenough/goodenough-feb-2020.zip 6.1 GiB compressed, 176 GiB extracted 43 | 44 | ``` 45 | $ sha256sum goodenough-* 46 | 37ab85288ebd8c9059b93716e2b21235a06063d252242f01c4274d0605e28131 goodenough-feb-2020.zip 47 | 82123a774275b9b6830a9208591f4e9c7bf759d12ed690db8694362fbca9bcac goodenough-jan-2020.zip 48 | ``` 49 | -------------------------------------------------------------------------------- /evaluation/once.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """once.py 4 | 5 | This script can be used to run Deep Fingerprinting (DF) once on the goodenough 6 | dataset and produce some metrics. Right now it only works on extracted cells, but 7 | it is made to be straightforward to extend.
8 | 9 | There are many parameters to tweak; see the arguments and help below. Works 10 | well on my machine with an RTX 2070 (pick the batch size based on available GPU memory). 11 | 12 | Supports saving and loading datasets as well as models. 13 | 14 | The DF implementation is ported to PyTorch with inspiration from 15 | https://github.com/lin-zju/deep-fp . 16 | """ 17 | 18 | import argparse 19 | import os 20 | import sys 21 | import numpy as np 22 | import torch 23 | import torch.nn as nn 24 | import torch.nn.functional as F 25 | from torch.utils import data 26 | import datetime 27 | import pickle 28 | import csv 29 | import shared 30 | 31 | ap = argparse.ArgumentParser() 32 | # load and save dataset/model 33 | ap.add_argument("--ld", required=False, default="", 34 | help="load dataset from pickle, provide path to pickled file") 35 | ap.add_argument("--sd", required=False, default="", 36 | help="save dataset, provide path to dump pickled file") 37 | ap.add_argument("--lm", required=False, default="", 38 | help="load model from pickle, provide path to pickled file") 39 | ap.add_argument("--sm", required=False, default="", 40 | help="save model, provide path to dump pickled file") 41 | 42 | ## extra output 43 | ap.add_argument("--csv", required=False, default=None, 44 | help="save resulting metrics in provided path in csv format") 45 | ap.add_argument("--extra", required=False, default="", 46 | help="value of extra column in csv output") 47 | 48 | # extract/train new dataset/model 49 | ap.add_argument("--ed", required=False, default="", 50 | help="extract dataset, path with {monitored,unmonitored} subfolders") 51 | ap.add_argument("--train", required=False, default=False, 52 | action="store_true", help="train model") 53 | 54 | # experiment parameters 55 | ap.add_argument("--epochs", required=False, type=int, default=30, 56 | help="the number of epochs for training") 57 | ap.add_argument("--batchsize", required=False, type=int, default=750, 58 | help="batch size") 59 |
ap.add_argument("-f", required=False, type=int, default=0, 60 | help="the fold number (partition offset)") 61 | ap.add_argument("-l", required=False, type=int, default=5000, 62 | help="max input length used in DF") 63 | ap.add_argument("-z", required=False, default="", 64 | help="zero out sample[a:b] for each sample, e.g., 0:10 zeroes the first 10 cells") 65 | 66 | # dataset dimensions 67 | ap.add_argument("-c", required=False, type=int, default=50, 68 | help="the number of monitored classes") 69 | ap.add_argument("-p", required=False, type=int, default=10, 70 | help="the number of partitions") 71 | ap.add_argument("-s", required=False, type=int, default=20, 72 | help="the number of samples") 73 | args = vars(ap.parse_args()) 74 | 75 | def now(): 76 | return datetime.datetime.now().strftime("%H:%M:%S") 77 | 78 | def main(): 79 | if ( 80 | (args["ld"] == "" and args["ed"] == "") or 81 | (args["ld"] != "" and args["ed"] != "") 82 | ): 83 | sys.exit("needs exactly one of --ld and --ed") 84 | 85 | 86 | dataset, labels = {}, {} 87 | if args["ld"] != "": 88 | print(f"attempting to load dataset from pickle file {args['ld']}") 89 | dataset, labels = pickle.load(open(args["ld"], "rb")) 90 | # clamp extra-detail values (generated by tweak.py) to [-1.0, 1.0] 91 | for k in dataset: 92 | dataset[k][0][dataset[k][0] > 1.0] = 1.0 93 | dataset[k][0][dataset[k][0] < -1.0] = -1.0 94 | 95 | else: 96 | if not os.path.isdir(args["ed"]): 97 | sys.exit(f"{args['ed']} is not a directory") 98 | 99 | mon_dir = os.path.join(args["ed"], "monitored") 100 | if not os.path.isdir(mon_dir): 101 | sys.exit(f"{mon_dir} is not a directory") 102 | 103 | unm_dir = os.path.join(args["ed"], "unmonitored") 104 | if not os.path.isdir(unm_dir): 105 | sys.exit(f"{unm_dir} is not a directory") 106 | 107 | print(f"{now()} starting to load dataset from folder...") 108 | dataset, labels = shared.load_dataset( 109 | mon_dir, 110 | unm_dir, 111 | args["c"], 112 | args["p"], 113 | args["s"], 114 | args["l"], 115 |
shared.trace2cells 116 | ) 117 | if args["sd"] != "": 118 | pickle.dump((dataset, labels), open(args["sd"], "wb")) 119 | print(f"saved dataset to {args['sd']}") 120 | 121 | print(f"{now()} loaded {len(dataset)} items in dataset with {len(labels)} labels") 122 | 123 | split = shared.split_dataset(args["c"], args["p"], args["s"], args["f"], labels) 124 | print( 125 | f"{now()} split {len(split['train'])} training, " 126 | f"{len(split['validation'])} validation, and " 127 | f"{len(split['test'])} testing" 128 | ) 129 | 130 | if args["z"] != "": 131 | dataset = shared.zero_dataset(dataset, args["z"]) 132 | print(f"{now()} zeroed each item in dataset as data[{args['z']}]") 133 | 134 | model = DFNet(args["c"]+1) # one class for unmonitored 135 | if args["lm"] != "": 136 | model = torch.load(args["lm"]) 137 | print(f"loaded model from {args['lm']}") 138 | 139 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 140 | if torch.cuda.is_available(): 141 | print(f"{now()} using {torch.cuda.get_device_name(0)}") 142 | model.cuda() 143 | 144 | if args["train"]: 145 | # Note below that shuffle=True is *essential*, 146 | # see https://stackoverflow.com/questions/54354465/ 147 | train_gen = data.DataLoader( 148 | shared.Dataset(split["train"], dataset, labels), 149 | batch_size=args["batchsize"], shuffle=True, 150 | ) 151 | validation_gen = data.DataLoader( 152 | shared.Dataset(split["validation"], dataset, labels), 153 | batch_size=args["batchsize"], shuffle=True, 154 | ) 155 | 156 | optimizer = torch.optim.Adamax(params=model.parameters()) 157 | criterion = torch.nn.CrossEntropyLoss() 158 | 159 | for epoch in range(args["epochs"]): 160 | print(f"{now()} epoch {epoch}") 161 | 162 | # training 163 | model.train() 164 | torch.set_grad_enabled(True) 165 | running_loss = 0.0 166 | n = 0 167 | for x, Y in train_gen: 168 | x, Y = x.to(device), Y.to(device) 169 | optimizer.zero_grad() 170 | outputs = model(x) 171 | loss = criterion(outputs, Y) 172 | loss.backward() 
173 | optimizer.step() 174 | running_loss += loss.item() 175 | n+=1 176 | print(f"\ttraining loss {running_loss/n}") 177 | 178 | # validation 179 | model.eval() 180 | torch.set_grad_enabled(False) 181 | running_corrects = 0 182 | n = 0 183 | for x, Y in validation_gen: 184 | x, Y = x.to(device), Y.to(device) 185 | 186 | outputs = model(x) 187 | _, preds = torch.max(outputs, 1) 188 | running_corrects += torch.sum(preds == Y) 189 | n += len(Y) 190 | print(f"\tvalidation accuracy {float(running_corrects)/float(n)}") 191 | 192 | if args["sm"] != "": 193 | torch.save(model, args["sm"]) 194 | print(f"saved model to {args['sm']}") 195 | 196 | # testing 197 | testing_gen = data.DataLoader( 198 | shared.Dataset(split["test"], dataset, labels), 199 | batch_size=args["batchsize"] 200 | ) 201 | model.eval() 202 | torch.set_grad_enabled(False) 203 | predictions = [] 204 | p_labels = [] 205 | for x, Y in testing_gen: 206 | x = x.to(device) 207 | outputs = model(x) 208 | index = F.softmax(outputs, dim=1).data.cpu().numpy() 209 | predictions.extend(index.tolist()) 210 | p_labels.extend(Y.data.numpy().tolist()) 211 | 212 | print(f"{now()} made {len(predictions)} predictions with {len(p_labels)} labels") 213 | csvline = [] 214 | threshold = np.append([0], 1.0 - 1 / np.logspace(0.05, 2, num=15, endpoint=True)) 215 | threshold = np.around(threshold, decimals=4) 216 | for th in threshold: 217 | tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 = shared.metrics(th, 218 | predictions, p_labels, args["c"]) 219 | print( 220 | f"\tthreshold {th:4.2}, " 221 | f"recall {recall:4.2}, " 222 | f"precision {precision:4.2}, " 223 | f"F1 {f1:4.2}, " 224 | f"accuracy {accuracy:4.2} " 225 | f"[tp {tp:>5}, fpp {fpp:>5}, fnp {fnp:>5}, tn {tn:>5}, fn {fn:>5}]" 226 | ) 227 | csvline.append([ 228 | th, recall, precision, f1, tp, fpp, fnp, tn, fn, args["extra"] 229 | ]) 230 | 231 | if args["csv"]: 232 | with open(args["csv"], "w", newline="") as csvfile: 233 | w = csv.writer(csvfile, delimiter=",") 234 
| w.writerow(["th", "recall", "precision", "f1", "tp", "fpp", "fnp", "tn", "fn", "extra"]) 235 | w.writerows(csvline) 236 | print(f"saved testing results to {args['csv']}") 237 | 238 | class DFNet(nn.Module): 239 | def __init__(self, classes, fc_in_features = 512*10): 240 | super(DFNet, self).__init__() 241 | # sources used when writing this, struggled with the change in output 242 | # size due to the convolutions and stumbled upon below: 243 | # - https://github.com/lin-zju/deep-fp/blob/master/lib/modeling/backbone/dfnet.py 244 | # - https://ezyang.github.io/convolution-visualizer/index.html 245 | self.kernel_size = 7 246 | self.padding_size = 3 247 | self.pool_stride_size = 4 248 | self.pool_size = 7 249 | 250 | self.block1 = self.__block(1, 32, nn.ELU()) 251 | self.block2 = self.__block(32, 64, nn.ReLU()) 252 | self.block3 = self.__block(64, 128, nn.ReLU()) 253 | self.block4 = self.__block(128, 256, nn.ReLU()) 254 | 255 | self.fc = nn.Sequential( 256 | nn.Linear(fc_in_features, 512), 257 | nn.BatchNorm1d(512), 258 | nn.ReLU(), 259 | nn.Dropout(0.7), 260 | nn.Linear(512, 512), 261 | nn.BatchNorm1d(512), 262 | nn.ReLU(), 263 | nn.Dropout(0.5) 264 | ) 265 | 266 | self.prediction = nn.Sequential( 267 | nn.Linear(512, classes), 268 | # when using CrossEntropyLoss, already computed internally 269 | #nn.Softmax(dim=1) # dim = 1, don't softmax batch 270 | ) 271 | 272 | def __block(self, channels_in, channels, activation): 273 | return nn.Sequential( 274 | nn.Conv1d(channels_in, channels, self.kernel_size, padding=self.padding_size), 275 | nn.BatchNorm1d(channels), 276 | activation, 277 | nn.Conv1d(channels, channels, self.kernel_size, padding=self.padding_size), 278 | nn.BatchNorm1d(channels), 279 | activation, 280 | nn.MaxPool1d(self.pool_size, stride=self.pool_stride_size, padding=self.padding_size), 281 | nn.Dropout(p=0.1) 282 | ) 283 | 284 | def forward(self, x): 285 | x = self.block1(x) 286 | x = self.block2(x) 287 | x = self.block3(x) 288 | x = self.block4(x) 289 
| x = x.flatten(start_dim=1) # dim = 1, don't flatten batch 290 | x = self.fc(x) 291 | x = self.prediction(x) 292 | 293 | return x 294 | 295 | if __name__ == "__main__": 296 | main() -------------------------------------------------------------------------------- /evaluation/overhead.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import shared 4 | import pickle 5 | import sys 6 | import os 7 | import numpy as np 8 | 9 | ap = argparse.ArgumentParser() 10 | ap.add_argument("--ld", required=True, 11 | help="load dataset from pickle, provide path to pickled file") 12 | args = vars(ap.parse_args()) 13 | 14 | def main(): 15 | '''Bandwidth overhead is based on the number of padding and non-padding cells in 16 | all traces. 17 | ''' 18 | print(f"attempting to load dataset from pickle file {args['ld']}") 19 | dataset, labels = pickle.load(open(args["ld"], "rb")) 20 | 21 | t_sent_padding = [] 22 | t_sent_nonpadding = [] 23 | t_sent_overhead = [] 24 | t_recv_padding = [] 25 | t_recv_nonpadding = [] 26 | t_recv_overhead = [] 27 | 28 | for trace in dataset: 29 | unique, counts = np.unique(dataset[trace][0], return_counts=True) 30 | d = dict(zip(unique, counts)) 31 | sent_nonpadding = d.get(1, 0) # get() with default: a broken trace may lack the key 32 | recv_nonpadding = d.get(-1, 0) 33 | 34 | sent_padding = 0 35 | if 2 in d: 36 | sent_padding = d[2] 37 | 38 | recv_padding = 0 39 | if -2 in d: 40 | recv_padding = d[-2] 41 | 42 | if sent_nonpadding == 0: 43 | sys.exit(f"sent 0 nonpadding cells, broken trace?") 44 | if recv_nonpadding == 0: 45 | sys.exit(f"recv 0 nonpadding cells, broken trace?") 46 | 47 | t_sent_padding.append(sent_padding) 48 | t_sent_nonpadding.append(sent_nonpadding) 49 | t_sent_overhead.append(float(sent_padding+sent_nonpadding) / float(sent_nonpadding)) 50 | 51 | t_recv_padding.append(recv_padding) 52 | t_recv_nonpadding.append(recv_nonpadding) 53 | t_recv_overhead.append(float(recv_padding+recv_nonpadding) / float(recv_nonpadding)) 54 | 55
| sent_padding = sum(t_sent_padding) 56 | sent_nonpadding = sum(t_sent_nonpadding) 57 | sent_cells = sent_padding + sent_nonpadding 58 | 59 | recv_padding = sum(t_recv_padding) 60 | recv_nonpadding = sum(t_recv_nonpadding) 61 | recv_cells = recv_padding + recv_nonpadding 62 | 63 | total_cells = sent_cells + recv_cells 64 | 65 | avg_sent = float(sent_cells)/float(sent_nonpadding) 66 | avg_recv = float(recv_cells)/float(recv_nonpadding) 67 | avg_total = float(total_cells)/float(recv_nonpadding+sent_nonpadding) 68 | 69 | print(f"in total for {len(t_sent_padding)} traces:") 70 | print(f"\t- {total_cells} cells") 71 | print(f"\t- {avg_total:.0%} average total bandwidth") 72 | 73 | print(f"\t- {sent_cells} sent cells ({float(sent_cells)/float(total_cells):.0%})") 74 | print(f"\t\t- {sent_nonpadding} nonpadding") 75 | print(f"\t\t- {sent_padding} padding") 76 | print(f"\t\t- {avg_sent:.0%} average sent bandwidth") 77 | 78 | print(f"\t- {recv_cells} recv cells ({float(recv_cells)/float(total_cells):.0%})") 79 | print(f"\t\t- {recv_nonpadding} nonpadding") 80 | print(f"\t\t- {recv_padding} padding") 81 | print(f"\t\t- {avg_recv:.0%} average recv bandwidth") 82 | 83 | if __name__ == "__main__": 84 | main() -------------------------------------------------------------------------------- /evaluation/shared.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import numpy as np 3 | import os 4 | import sys 5 | from torch.utils import data 6 | 7 | def metrics(threshold, predictions, labels, label_unmon): 8 | ''' Computes a range of metrics. 
9 | 10 | For details on the metrics, see, e.g., https://www.cs.kau.se/pulls/hot/baserate/ 11 | ''' 12 | tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 = 0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0 13 | 14 | for i in range(len(predictions)): 15 | label_pred = np.argmax(predictions[i]) 16 | prob_pred = max(predictions[i]) 17 | label_correct = labels[i] 18 | 19 | # we split on monitored or unmonitored correct label 20 | if label_correct != label_unmon: 21 | # either confident and correct, 22 | if prob_pred >= threshold and label_pred == label_correct: 23 | tp = tp + 1 24 | # confident and wrong monitored label, or 25 | elif prob_pred >= threshold and label_pred != label_unmon: 26 | fpp = fpp + 1 27 | # wrong because not confident or predicted unmonitored for monitored 28 | else: 29 | fn = fn + 1 30 | else: 31 | if prob_pred < threshold or label_pred == label_unmon: # correct prediction? 32 | tn = tn + 1 33 | elif label_pred < label_unmon: # predicted monitored for unmonitored 34 | fnp = fnp + 1 35 | else: # this should never happen 36 | sys.exit(f"this should never happen, wrongly labelled data for {label_pred}") 37 | 38 | if tp + fn + fpp > 0: 39 | recall = round(float(tp) / float(tp + fpp + fn), 4) 40 | if tp + fpp + fnp > 0: 41 | precision = round(float(tp) / float(tp + fpp + fnp), 4) 42 | 43 | if precision > 0 and recall > 0: 44 | f1 = round(2*((precision*recall)/(precision+recall)), 4) 45 | 46 | accuracy = round(float(tp + tn) / float(tp + fpp + fnp + fn + tn), 4) 47 | 48 | return tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 49 | 50 | 51 | class Dataset(data.Dataset): 52 | def __init__(self, ids, dataset, labels): 53 | self.ids = ids 54 | self.dataset = dataset 55 | self.labels = labels 56 | 57 | def __len__(self): 58 | return len(self.ids) 59 | 60 | def __getitem__(self, index): 61 | ID = self.ids[index] 62 | return self.dataset[ID], self.labels[ID] 63 | 64 | def load_dataset( 65 | mon_dir, unm_dir, 66 | classes, partitions, samples, 67 | length, extract_func 68
| ): 69 | ''' Loads the dataset from disk into two dictionaries for data and labels. 70 | 71 | The dictionaries are indexed by sample ID. The ID encodes if it's a monitored 72 | or unmonitored sample to make it easier to debug, as well as some info about 73 | the corresponding data file on disk. 74 | 75 | This function assumes the structure of the following dataset: 76 | - "top50-partitioned-reddit-levels-cirucitpadding" 77 | ''' 78 | data = {} 79 | labels = {} 80 | 81 | # load monitored data 82 | for c in range(0,classes): 83 | for p in range(0,partitions): 84 | site = c*10 + p # site index on disk, the dataset has 10 partitions per class 85 | for s in range(0,samples): 86 | ID = f"m-{c}-{p}-{s}" 87 | labels[ID] = c 88 | 89 | # file format is {site}-{sample}.trace 90 | fname = f"{site}-{s}.trace" 91 | with open(os.path.join(mon_dir, fname), "r") as f: 92 | data[ID] = extract_func(f.read(), length) 93 | 94 | # load unmonitored data 95 | dirlist = os.listdir(unm_dir) 96 | # make sure we only load a balanced dataset 97 | dirlist = dirlist[:len(data)] 98 | for fname in dirlist: 99 | ID = f"u-{fname}" 100 | labels[ID] = classes # monitored labels are 0..classes-1, unmonitored gets label classes 101 | with open(os.path.join(unm_dir, fname), "r") as f: 102 | data[ID] = extract_func(f.read(), length) 103 | 104 | return data, labels 105 | 106 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 107 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 108 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 109 | CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 110 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 111 | 112 | def trace2cells(log, length, strip=True): 113 | ''' A fast specialised function to generate cells from a trace. 114 | 115 | Based on circpad_to_wf() in circpad-sim/common.py, but only for cells.
116 | ''' 117 | data = np.zeros((1, length), dtype=np.float32) 118 | n = 0 119 | 120 | s = log.split("\n") 121 | if strip: 122 | for i, line in enumerate(s): 123 | if CIRCPAD_ADDRESS_EVENT in line: 124 | s = s[i:] 125 | break 126 | 127 | for line in s: 128 | # outgoing is positive 129 | if CIRCPAD_EVENT_NONPADDING_SENT in line or \ 130 | CIRCPAD_EVENT_PADDING_SENT in line: 131 | data[0][n] = 1.0 132 | n += 1 133 | # incoming is negative 134 | elif CIRCPAD_EVENT_NONPADDING_RECV in line or \ 135 | CIRCPAD_EVENT_PADDING_RECV in line: 136 | data[0][n] = -1.0 137 | n += 1 138 | 139 | if n == length: 140 | break 141 | 142 | return data 143 | 144 | def split_dataset( 145 | classes, partitions, samples, fold, labels, 146 | ): 147 | '''Splits the dataset based on fold. 148 | 149 | The split is only based on IDs, not the actual data. The result is an 8:1:1 150 | split into training, validation, and testing. 151 | ''' 152 | training = [] 153 | validation = [] 154 | testing = [] 155 | 156 | # monitored, split by _partition_ 157 | for c in range(0,classes): 158 | for p in range(0,partitions): 159 | for s in range(0,samples): 160 | ID = f"m-{c}-{p}-{s}" 161 | i = (p+fold) % partitions 162 | 163 | if i < partitions-2: 164 | training.append(ID) 165 | elif i < partitions-1: 166 | validation.append(ID) 167 | else: 168 | testing.append(ID) 169 | 170 | # unmonitored 171 | counter = 0 172 | for k in labels.keys(): 173 | if not k.startswith("u"): 174 | continue 175 | i = (counter+fold) % partitions 176 | if i < partitions-2: 177 | training.append(k) 178 | elif i < partitions-1: 179 | validation.append(k) 180 | else: 181 | testing.append(k) 182 | counter += 1 183 | 184 | split = {} 185 | split["train"] = training 186 | split["validation"] = validation 187 | split["test"] = testing 188 | return split 189 | 190 | def zero_dataset(dataset, z): 191 | index = z.split(":") 192 | start = int(index[0]) 193 | stop = int(index[1]) 194 | data = np.zeros((stop-start), dtype=np.float32) 195 |
for k, v in dataset.items(): 196 | v[:,start:stop] = data 197 | dataset[k] = v 198 | return dataset -------------------------------------------------------------------------------- /evaluation/tweak.md: -------------------------------------------------------------------------------- 1 | # How to use tweak.py 2 | 3 | Below is an example of how to run `tweak.py`: 4 | ``` 5 | ./tweak.py --client dataset-feb/standard/client-traces/ --relay dataset-feb/standard/fakerelay-traces/ -t ../tor --mc client-machine --mr relay-machine --save tmp.pkl 6 | ``` 7 | 8 | The help output explains most flags: 9 | 10 | ``` 11 | usage: tweak.py [-h] --client CLIENT --relay RELAY [-c C] [-p P] [-s S] -t T [-w W] [-l L] --mc MC --mr MR --save SAVE 12 | 13 | optional arguments: 14 | -h, --help show this help message and exit 15 | --client CLIENT input folder of client circpadtrace files 16 | --relay RELAY input folder of relay circpadtrace files 17 | -c C the number of monitored classes 18 | -p P the number of partitions 19 | -s S the number of samples 20 | -t T path to tor folder (bob/tor, not bob/tor/src) 21 | -w W number of workers for simulating machines 22 | -l L max length of extracted cells 23 | --mc MC path to file of client machine (c-code) to tweak 24 | --mr MR path to file of relay machine (c-code) to tweak 25 | --save SAVE file to save results to 26 | ``` 27 | 28 | The expected input format (`--client` and `--relay`) is that of the dataset in 29 | this repository. For the tor folder (`-t`), see 30 | [circpad-sim](https://github.com/pylls/circpad-sim). Machines you tweak (`--mc` 31 | and `--mr`) have to be of the appropriate format. Several examples are available 32 | in 33 | [machines/phase2/](https://github.com/pylls/padding-machines-for-tor/tree/master/machines/phase2/).
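
A circpadtrace file itself is a plain-text log with one `timestamp event` pair per line. As a rough sketch (the trace content below is made-up example data; the event names are the ones defined in `evaluation/shared.py`), converting such a trace into the ±1 cell sequence used for classification looks like this:

```python
# Hypothetical circpadtrace content: "<timestamp> <event>" per line.
TRACE = """1000 circpad_cell_event_nonpadding_sent
2000 circpad_cell_event_nonpadding_received
3000 circpad_cell_event_padding_received"""

def cells(trace):
    """Map sent events to +1 and received events to -1."""
    out = []
    for line in trace.splitlines():
        _, event = line.split(" ", 1)
        if event.endswith("_sent"):
            out.append(1)
        elif event.endswith("_received"):
            out.append(-1)
    return out

print(cells(TRACE))  # [1, -1, -1]
```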
34 | 35 | For how to use tweak.py as part of tweaking a padding machine, see `tweak.sh` in 36 | this folder and the [phase 2 37 | writeup](https://github.com/pylls/padding-machines-for-tor/tree/master/machines/phase2/). 38 | -------------------------------------------------------------------------------- /evaluation/tweak.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' Tweak a pair of machines. 3 | 4 | The goal is to be able to rapidly tweak a single pair of machines 5 | against DF. ''' 6 | import argparse 7 | import sys 8 | import os 9 | import subprocess 10 | import tempfile 11 | import signal 12 | import numpy as np 13 | import pickle 14 | from multiprocessing import Pool 15 | import logging 16 | import shared 17 | 18 | logging.basicConfig(level = logging.INFO, format = "%(asctime)s %(message)s") 19 | 20 | ap = argparse.ArgumentParser() 21 | # dataset and its dimensions, assuming same count unmon as mon 22 | ap.add_argument("--client", required=True, 23 | help="input folder of client circpadtrace files") 24 | ap.add_argument("--relay", required=True, 25 | help="input folder of relay circpadtrace files") 26 | ap.add_argument("-c", required=False, type=int, default=50, 27 | help="the number of monitored classes") 28 | ap.add_argument("-p", required=False, type=int, default=10, 29 | help="the number of partitions") 30 | ap.add_argument("-s", required=False, type=int, default=20, 31 | help="the number of samples") 32 | 33 | # exp 34 | ap.add_argument("-t", required=True, 35 | help="path to tor folder (bob/tor, not bob/tor/src)") 36 | ap.add_argument("-w", required=False, type=int, default=10, 37 | help="number of workers for simulating machines") 38 | ap.add_argument("-l", required=False, type=int, default=5000, 39 | help="max length of extracted cells") 40 | 41 | # machines to tweak 42 | ap.add_argument("--mc", required=True, 43 | help="path to file of client machine
(c-code) to tweak") 44 | ap.add_argument("--mr", required=True, 45 | help="path to file of relay machine (c-code) to tweak") 46 | 47 | # pickle dump results 48 | ap.add_argument("--save", required=True, help="file to save results to") 49 | args = vars(ap.parse_args()) 50 | 51 | TOR_CIRCPADSIM_SRC_LOC = "src/test/test_circuitpadding_sim.c" 52 | CLIENT_MACHINE_TOKEN = "//REPLACE-client-padding-machine-REPLACE" 53 | RELAY_MACHINE_TOKEN = "//REPLACE-relay-padding-machine-REPLACE" 54 | TOR_CIRCPADSIM_CMD = os.path.join(args["t"], "src/test/test circuitpadding_sim/..") 55 | TOR_CIRCPADSIM_CMD_FORMAT = f"{TOR_CIRCPADSIM_CMD} --info --circpadsim {{}} {{}} 1" 56 | 57 | tmpdir = tempfile.mkdtemp() 58 | original_src = "" 59 | src_path = os.path.join(args["t"], TOR_CIRCPADSIM_SRC_LOC) 60 | 61 | def main(): 62 | # properly restore tor source when closed 63 | signal.signal(signal.SIGINT, sigint_handler) 64 | 65 | # list of input traces, sorted assuming the matching client and relay traces 66 | # have the same name in respective folders 67 | c_mon_dir = os.path.join(args["client"], "monitored") 68 | if not os.path.isdir(c_mon_dir): 69 | sys.exit(f"{c_mon_dir} is not a directory") 70 | c_unm_dir = os.path.join(args["client"], "unmonitored") 71 | if not os.path.isdir(c_unm_dir): 72 | sys.exit(f"{c_unm_dir} is not a directory") 73 | r_mon_dir = os.path.join(args["relay"], "monitored") 74 | if not os.path.isdir(r_mon_dir): 75 | sys.exit(f"{r_mon_dir} is not a directory") 76 | r_unm_dir = os.path.join(args["relay"], "unmonitored") 77 | if not os.path.isdir(r_unm_dir): 78 | sys.exit(f"{r_unm_dir} is not a directory") 79 | 80 | logging.info(f"loading original traces") 81 | labels, fnames_client, fnames_relay = load_dataset( 82 | c_mon_dir, c_unm_dir, 83 | r_mon_dir, r_unm_dir, 84 | args["c"], args["p"], args["s"] 85 | ) 86 | logging.info(f"loaded {len(labels)} traces") 87 | 88 | # load machines to tweak 89 | with open(args["mc"], "r") as f: 90 | mc = f.read() 91 | with open(args["mr"], 
"r") as f: 92 | mr = f.read() 93 | 94 | logging.info(f"adding machines") 95 | add_machines(mc, mr) 96 | 97 | logging.info("simulating machines") 98 | client_traces, _ = simulate_machines(labels, fnames_client, fnames_relay, extract_cells_detailed) 99 | 100 | logging.info(f"pickle dump to {args['save']}") 101 | pickle.dump((client_traces, labels), open(args["save"], "wb")) 102 | 103 | logging.info(f"done") 104 | 105 | def add_machines(client, relay): 106 | # read source 107 | global original_src, src_path 108 | if original_src == "": 109 | with open(src_path, "r") as myfile: 110 | original_src = myfile.read() 111 | assert(original_src != "") 112 | assert(CLIENT_MACHINE_TOKEN in original_src) 113 | assert(RELAY_MACHINE_TOKEN in original_src) 114 | 115 | # replace with machines and save the modified source 116 | modified_src = original_src.replace(CLIENT_MACHINE_TOKEN, client) 117 | modified_src = modified_src.replace(RELAY_MACHINE_TOKEN, relay) 118 | with open(src_path, "w") as f: 119 | f.write(modified_src) 120 | 121 | # make new machines, then restore original source 122 | make_tor() 123 | restore_source() 124 | 125 | def restore_source(): 126 | global original_src, src_path 127 | with open(src_path, "w") as f: 128 | f.write(original_src) 129 | 130 | def sigint_handler(foo=1, bar=2): 131 | restore_source() 132 | sys.exit(0) 133 | 134 | def make_tor(): 135 | cmd = f"cd {args['t']} && make" 136 | result = subprocess.run(cmd, stdout=subprocess.DEVNULL, shell=True) 137 | if result.returncode != 0: 138 | logging.info(cmd) 139 | assert(result.returncode == 0) 140 | 141 | def simulate_machines( 142 | labels, fnames_client, fnames_relay, 143 | extract_func, 144 | extract_client=True, 145 | extract_relay=False, 146 | ): 147 | 148 | todo = [] 149 | logging.info(f"\t\tlisting {len(labels)} traces to simulate") 150 | for ID in labels: 151 | todo.append( 152 | (fnames_client[ID], fnames_relay[ID], ID, 153 | extract_func, extract_client, extract_relay) 154 | ) 155 | 156 | 
logging.info(f"\t\trunning with {args['w']} workers") 157 | p = Pool(args["w"]) 158 | results = p.starmap(do_simulate_machines, todo) 159 | 160 | logging.info(f"\t\textracting results") 161 | # ID -> extracted 162 | out_client = {} 163 | out_relay = {} 164 | for result in results: 165 | if extract_client: 166 | out_client[result[0]] = result[1] 167 | if extract_relay: 168 | out_relay[result[0]] = result[2] 169 | 170 | p.close() 171 | 172 | return out_client, out_relay 173 | 174 | def do_simulate_machines( 175 | client, relay, ID, 176 | extract_func, extract_client=True, extract_relay=False 177 | ): 178 | cmd = TOR_CIRCPADSIM_CMD_FORMAT.format(client, relay) 179 | result = subprocess.run(cmd, capture_output=True, text=True, shell=True) 180 | if result.returncode != 0: 181 | logging.error(f"got returncode {result.returncode} for cmd {cmd}") 182 | assert(result.returncode == 0) 183 | 184 | # parse out the simulated logs, get client and relay traces 185 | client_out = [] 186 | relay_out = [] 187 | log = result.stdout.split("\n") 188 | if extract_client: 189 | client_out = extract_func(log, client=True) 190 | if extract_relay: 191 | relay_out = extract_func(log, client=False) 192 | 193 | return (ID, client_out, relay_out) 194 | 195 | def extract_cells_detailed(log, client=True): 196 | i = 0 197 | length = args["l"] 198 | data = np.zeros((1, length), dtype=np.float32) 199 | for line in log: 200 | if i >= length: 201 | break 202 | 203 | if client and not "source=client" in line: 204 | continue 205 | elif not client and not "source=relay" in line: 206 | continue 207 | 208 | if shared.CIRCPAD_EVENT_NONPADDING_SENT in line: 209 | data[0][i] = 1.0 # outgoing is positive 210 | i += 1 211 | elif shared.CIRCPAD_EVENT_PADDING_SENT in line: 212 | data[0][i] = 2.0 213 | i += 1 214 | elif shared.CIRCPAD_EVENT_NONPADDING_RECV in line: 215 | data[0][i] = -1.0 216 | i += 1 217 | elif shared.CIRCPAD_EVENT_PADDING_RECV in line: 218 | data[0][i] = -2.0 219 | i += 1 220 | 221 | return data 
222 | 223 | def load_dataset( 224 | c_mon_dir, c_unm_dir, 225 | r_mon_dir, r_unm_dir, 226 | classes, partitions, samples 227 | ): 228 | 229 | # ID -> class 230 | labels = {} 231 | # ID -> fname 232 | fnames_client = {} 233 | fnames_relay = {} 234 | 235 | # monitored 236 | for c in range(0,classes): 237 | for p in range(0,partitions): 238 | site = c*10 + p 239 | for s in range(0,samples): 240 | ID = f"m-{c}-{p}-{s}" 241 | fname = f"{site}-{s}.trace" 242 | 243 | labels[ID] = c 244 | fnames_client[ID] = os.path.join(c_mon_dir, fname) 245 | fnames_relay[ID] = os.path.join(r_mon_dir, fname) 246 | if not os.path.exists(fnames_client[ID]): 247 | sys.exit(f"{fnames_client[ID]} does not exist") 248 | if not os.path.exists(fnames_relay[ID]): 249 | sys.exit(f"{fnames_relay[ID]} does not exist") 250 | 251 | # unmonitored 252 | dirlist = os.listdir(c_unm_dir)[:len(labels)] 253 | for fname in dirlist: 254 | ID = f"u-{fname}" 255 | 256 | labels[ID] = classes # start from 0 for monitored 257 | fnames_client[ID] = os.path.join(c_unm_dir, fname) 258 | fnames_relay[ID] = os.path.join(r_unm_dir, fname) 259 | if not os.path.exists(fnames_relay[ID]): 260 | sys.exit(f"{fnames_relay[ID]} does not exist") 261 | 262 | # we need to provide: 263 | # - for simulate_machines, the full fname with path for each pair 264 | # - for df, labels as above and a way to get the data from simulate_machines that maps from ID 265 | # have simulate_machines produce ID -> data for client and relay 266 | return labels, fnames_client, fnames_relay 267 | 268 | if __name__ == "__main__": 269 | main() 270 | -------------------------------------------------------------------------------- /evaluation/tweak.sh: -------------------------------------------------------------------------------- 1 | # example of how to tweak machines stored in tmp-mc and tmp-mr, standard february dataset 2 | ./tweak.py --client dataset-feb/standard/client-traces/ --relay dataset-feb/standard/fakerelay-traces/ -t ../tor --mc 
phase2/strawman-mc --mr phase2/strawman-mr --save tmp.pkl -s $1 -w 8 3 | ./once.py --ld tmp.pkl --train -s $1 4 | ./overhead.py --ld tmp.pkl 5 | ./visualize.py --ld tmp.pkl -s tmp 6 | ./visualize.py --ld tmp.pkl -s tmp-nopadding --hide 7 | -------------------------------------------------------------------------------- /evaluation/visualize.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import sys 4 | import os 5 | import random 6 | import numpy as np 7 | import pickle 8 | from PIL import Image 9 | 10 | import circpadsim 11 | 12 | ap = argparse.ArgumentParser() 13 | ap.add_argument("--ld", required=True, 14 | help="load dataset from pickle, provide path to pickled file") 15 | 16 | ap.add_argument("-s", default="test", 17 | help="save filename prefix") 18 | 19 | # dimensions of the image 20 | ap.add_argument("-x", type=int, default=5000, 21 | help="image width (x-axis)") 22 | ap.add_argument("-y", type=int, default=1000, 23 | help="image height (y-axis)") 24 | 25 | ap.add_argument("--hide", required=False, default=False, 26 | action="store_true", help="hide padding cells") 27 | args = vars(ap.parse_args()) 28 | 29 | # TOMATO colors below 30 | COLOR_BACKGROUND = [0, 0, 0, 0] # transparent PNG (alpha 0) 31 | COLOR_NONPADDING_RECV = [0, 0, 0, 255] # black - most data is nonpadding received 32 | COLOR_NONPADDING_SENT = [255, 255, 255, 255] # white - sent nonpadding data 33 | COLOR_PADDING_RECV = [170, 57, 57, 255] # red - most padding is received padding 34 | COLOR_PADDING_SENT = [45, 136, 45, 255] # green - outgoing padding 35 | 36 | def main(): 37 | print(f"attempting to load dataset from pickle file {args['ld']}") 38 | dataset, _ = pickle.load(open(args["ld"], "rb")) 39 | 40 | image = Image.fromarray(get_img_data(dataset, args["y"], args["x"])) 41 | image.save(open(f"{args['s']}.png", "wb")) 42 | 43 | def get_img_data(dataset, n, width): 44 | data = np.full((n, width, 4), 
| COLOR_BACKGROUND, dtype=np.uint8) 45 | 46 | for y, k in enumerate(dataset): 47 | if y >= n: 48 | break 49 | x = 0 50 | for v in dataset[k][0]: 51 | if x >= width: 52 | break 53 | if v == 1: 54 | data[y][x] = COLOR_NONPADDING_SENT 55 | x += 1 56 | elif v == -1: 57 | data[y][x] = COLOR_NONPADDING_RECV 58 | x += 1 59 | elif not args["hide"] and v == 2: 60 | data[y][x] = COLOR_PADDING_SENT 61 | x += 1 62 | elif not args["hide"] and v == -2: 63 | data[y][x] = COLOR_PADDING_RECV 64 | x += 1 65 | 66 | return data 67 | 68 | if __name__ == "__main__": 69 | main() -------------------------------------------------------------------------------- /evolve/README.md: -------------------------------------------------------------------------------- 1 | # Evolving Machines 2 | 3 | This is provided as-is for the sake of helping other researchers pondering 4 | burning CPU and GPU for new machines. All code is highly research grade, certified 5 | mostly working by trial-and-error running on a single box. If you have any 6 | questions, want to discuss approaches here, or just rant about how bad the code 7 | is, feel free to reach out for a chat. 8 | 9 | The main file is `loop.py`. Happy digging!
-------------------------------------------------------------------------------- /evolve/circpadsim.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import sys 3 | import socket 4 | 5 | CIRCPAD_ERROR_WRONG_FORMAT = "invalid trace format" 6 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 7 | 8 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 9 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 10 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 11 | CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 12 | 13 | CIRCPAD_LOG = "circpad_trace_event" 14 | CIRCPAD_LOG_TIMESTAMP = "timestamp=" 15 | CIRCPAD_LOG_CIRC_ID = "client_circ_id=" 16 | CIRCPAD_LOG_EVENT = "event=" 17 | 18 | CIRCPAD_BLACKLISTED_ADDRESSES = ["aus1.torproject.org"] 19 | CIRCPAD_BLACKLISTED_EVENTS = [ 20 | "circpad_negotiate_logging" 21 | ] 22 | 23 | def circpad_get_all_addresses(trace): 24 | addresses = [] 25 | for l in trace: 26 | if len(l) < 2: 27 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 28 | if CIRCPAD_ADDRESS_EVENT in l[1]: 29 | if len(l[1].split()) < 2: 30 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 31 | addresses.append(l[1].split()[1]) 32 | return addresses 33 | 34 | def circpad_get_nonpadding_times(trace): 35 | sent_nonpadding, recv_nonpadding = [], [] 36 | 37 | for l in trace: 38 | split = l.split() 39 | if CIRCPAD_EVENT_NONPADDING_SENT in split[1]: 40 | sent_nonpadding.append(split[0]) 41 | elif CIRCPAD_EVENT_NONPADDING_RECV in split[1]: 42 | recv_nonpadding.append(split[0]) 43 | 44 | return sent_nonpadding, recv_nonpadding 45 | 46 | def circpad_get_padding_times(trace): 47 | sent_padding, recv_padding = [], [] 48 | 49 | for l in trace: 50 | split = l.split() 51 | if CIRCPAD_EVENT_PADDING_SENT in split[1]: 52 | sent_padding.append(split[0]) 53 | elif CIRCPAD_EVENT_PADDING_RECV in split[1]: 54 | recv_padding.append(split[0]) 55 | 56 | return
sent_padding, recv_padding 57 | 58 | def circpad_parse_line(line): 59 | split = line.split() 60 | assert(len(split) >= 2) 61 | event = split[1] 62 | timestamp = int(split[0]) 63 | 64 | return event, timestamp 65 | 66 | def circpad_lines_to_trace(lines): 67 | trace = [] 68 | for l in lines: 69 | event, timestamp = circpad_parse_line(l) 70 | trace.append((timestamp, event)) 71 | return trace 72 | 73 | def circpad_extract_log_traces( 74 | log_lines, 75 | source_client=True, 76 | source_relay=True, 77 | allow_ips=False, 78 | filter_client_negotiate=False, 79 | filter_relay_negotiate=False, 80 | max_length=999999999 81 | ): 82 | # helper function 83 | def blacklist_hit(d): 84 | for a in circpad_get_all_addresses(d): 85 | if a in CIRCPAD_BLACKLISTED_ADDRESSES: 86 | return True 87 | return False 88 | 89 | # helper to extract one line 90 | def extract_from_line(line): 91 | n = line.index(CIRCPAD_LOG_TIMESTAMP)+len(CIRCPAD_LOG_TIMESTAMP) 92 | timestamp = line[n:].split(" ", maxsplit=1)[0] 93 | n = line.index(CIRCPAD_LOG_CIRC_ID)+len(CIRCPAD_LOG_CIRC_ID) 94 | cid = line[n:].split(" ", maxsplit=1)[0] 95 | 96 | # an event is the last part, no need to split on space like we did earlier 97 | n = line.index(CIRCPAD_LOG_EVENT)+len(CIRCPAD_LOG_EVENT) 98 | event = line[n:] 99 | 100 | return int(cid), int(timestamp), event 101 | 102 | circuits = {} 103 | base = -1 104 | for line in log_lines: 105 | if CIRCPAD_LOG in line: 106 | # skip client/relay if they shouldn't be part of the trace 107 | if not source_client and "source=client" in line: 108 | continue 109 | if not source_relay and "source=relay" in line: 110 | continue 111 | 112 | # extract trace and make timestamps relative 113 | cid, timestamp, event = extract_from_line(line) 114 | if base == -1: 115 | base = timestamp 116 | timestamp = timestamp - base 117 | 118 | # store trace 119 | if cid in circuits.keys(): 120 | if len(circuits[cid]) < max_length: 121 | circuits[cid] = circuits.get(cid) + [(timestamp, event)] 122 | else: 
123 | circuits[cid] = [(timestamp, event)] 124 | 125 | # filter out circuits with blacklisted addresses 126 | for cid in list(circuits.keys()): 127 | if blacklist_hit(circuits[cid]): 128 | del circuits[cid] 129 | # filter out circuits with only IPs (unless arg says otherwise) 130 | for cid in list(circuits.keys()): 131 | if not allow_ips and circpad_only_ips_in_trace(circuits[cid]): 132 | del circuits[cid] 133 | 134 | # remove blacklisted events (and associated events) 135 | for cid in list(circuits.keys()): 136 | circuits[cid] = circpad_remove_blacklisted_events(circuits[cid], 137 | filter_client_negotiate, filter_relay_negotiate) 138 | 139 | return circuits 140 | 141 | 142 | def circpad_remove_blacklisted_events( 143 | trace, 144 | filter_client_negotiate, 145 | filter_relay_negotiate 146 | ): 147 | 148 | result = [] 149 | ignore_next_send_cell = False 150 | 151 | for line in trace: 152 | strline = str(line) # stringify the (timestamp, event) tuple for substring matching 153 | 154 | # If we hit a blacklisted event, this means we should ignore the next 155 | # sent nonpadding cell. Since the blacklisted event should only be 156 | # triggered client-side, there shouldn't be any impact on relay traces.
157 | if any(b in strline for b in CIRCPAD_BLACKLISTED_EVENTS): 158 | ignore_next_send_cell = True 159 | else: 160 | if ignore_next_send_cell and CIRCPAD_EVENT_NONPADDING_SENT in strline: 161 | ignore_next_send_cell = False 162 | else: 163 | result.append(line) 164 | 165 | return result 166 | 167 | def circpad_only_ips_in_trace(trace): 168 | def is_ipv4(addr): 169 | try: 170 | socket.inet_aton(addr) 171 | except (socket.error, TypeError): 172 | return False 173 | return True 174 | def is_ipv6(addr): 175 | try: 176 | socket.inet_pton(socket.AF_INET6, addr) 177 | except (socket.error, TypeError): 178 | return False 179 | return True 180 | 181 | for a in circpad_get_all_addresses(trace): 182 | if not is_ipv4(a) and not is_ipv6(a): 183 | return False 184 | return True 185 | 186 | 187 | def circpad_to_wf( 188 | trace, 189 | cells=False, timecells=False, dirtime=False, cellevents=False, 190 | strip=False 191 | ): 192 | ''' Get a WF representation of the trace in the specified format. 193 | 194 | We support four formats: 195 | - cells, each line only contains 1 or -1 for outgoing or incoming cells. 196 | - timecells, relative timestamp added before each cell. 197 | - dirtime, each line has relative time multiplied with cell value. 198 | - cellevents, each line consists of the trace event for (non)padding cells. 199 | 200 | If the strip flag is set, events prior to the first domain resolution are 201 | stripped from the trace (if present). Circuits are typically created in the 202 | background by Tor Browser to speed-up browsing for users. Removing this is 203 | beneficial for WF attackers, because it is often assumed (more or less 204 | realistically) that an attacker can detect this (mainly by a 205 | significant time of "silence" on the wire, followed by what is assumed to be 206 | a website load).
207 | 208 | FIXME: For timecells and dirtime, the current magnitude is nanoseconds; it 209 | might be more efficient to round to a lower resolution, especially for deep 210 | learning attacks. 211 | ''' 212 | result = [] 213 | 214 | # only strip if we find the event for an address being resolved 215 | if strip: 216 | for i, l in enumerate(trace): 217 | if CIRCPAD_ADDRESS_EVENT in l[1]: 218 | trace = trace[i:] 219 | break 220 | 221 | for l in trace: 222 | # outgoing is positive 223 | if CIRCPAD_EVENT_NONPADDING_SENT in l[1] or \ 224 | CIRCPAD_EVENT_PADDING_SENT in l[1]: 225 | if cells: 226 | result.append("1") 227 | if timecells: 228 | result.append(f"{l[0]} 1") 229 | if dirtime: 230 | result.append(f"{l[0]}") 231 | if cellevents: 232 | result.append(l[1]) 233 | 234 | # incoming is negative 235 | elif CIRCPAD_EVENT_NONPADDING_RECV in l[1] or \ 236 | CIRCPAD_EVENT_PADDING_RECV in l[1]: 237 | if cells: 238 | result.append("-1") 239 | if timecells: 240 | result.append(f"{l[0]} -1") 241 | if dirtime: 242 | result.append(f"{l[0]*-1}") 243 | if cellevents: 244 | result.append(l[1]) 245 | return result 246 | -------------------------------------------------------------------------------- /evolve/evolve.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import machine 3 | import random 4 | import math 5 | 6 | import numpy as np 7 | 8 | ''' 9 | This is more genetic programming than genetic algorithms due to how we represent 10 | the machines to evolve. 11 | 12 | TL;DR Johan's suggestions: use either advanced selection, or simple selection and 13 | mutation; the mutation probability can be kept constant; an elite fraction is a 14 | good idea; it was worthwhile to remove all useless individuals instead of keeping 15 | them for diversity (the Internet seems mixed on this, though).
16 | 17 | inspiration https://deap.readthedocs.io/en/master/api/tools.html 18 | ''' 19 | 20 | def mutation(m, probability, exp): 21 | # with some probability, mutate each part of each state of a machine in place 22 | for s in m.states: 23 | if random.random() < probability: 24 | s.randomize_iat_dist(exp, probability) 25 | if random.random() < probability: 26 | s.randomize_length_dist(exp, probability) 27 | if random.random() < probability: 28 | s.randomize_transitions(exp, probability) 29 | 30 | def crossover(m1, m2, probability): 31 | # with some probability, performs single-point crossover in place 32 | if random.random() < probability: 33 | c = random.randint(0, len(m1.states)-1) 34 | tmp = m1.states[:c] 35 | m1.states[:c] = m2.states[:c] 36 | m2.states[:c] = tmp 37 | 38 | def selection(ml, fitness_func): 39 | # given a list of machines, selects the best, using the fitness function 40 | # we order, letting next_generation discard 41 | 42 | fl = [] 43 | for mp in ml: 44 | fl.append(fitness_func(mp)) 45 | 46 | # sort ml and fl together 47 | #fl, ml = (list(t) for t in zip(*sorted(zip(fl, ml)))) 48 | idx = np.argsort(fl) 49 | fl = list(np.array(fl)[idx]) 50 | ml = list(np.array(ml)[idx]) 51 | fl.reverse() 52 | ml.reverse() 53 | 54 | return ml, fl 55 | 56 | def initial_population(mc, mr, exp): 57 | pop = [] 58 | for _ in range(exp["population_size"]): 59 | pop.append([mc.randomize(exp), mr.randomize(exp)]) 60 | return pop 61 | 62 | def next_generation(ml, fl, exp): 63 | """ 64 | Given a sorted list of pairs of machines (better to worse) and their 65 | fitness, creates the next generation of machines. Is elitist, keeping the 66 | best machines as-is, and includes some machines randomly for diversity. The 67 | rest of the population is evolved using crossover and mutation from randomly 68 | selected machines, selected by weight based on their fitness. 
69 | """ 70 | 71 | # elitist, pick a fraction of the best for the next generation 72 | n = math.floor(len(ml)*exp["elitist_frac"]) 73 | ng = ml[:n] 74 | 75 | # diverse, pick a random fraction for the next generation 76 | n = math.floor(len(ml)*exp["diversity_frac"]) 77 | ng.extend(random.choices(ml, k=n)) 78 | 79 | # evolve remaining next generation 80 | while(len(ng) < len(ml)): 81 | # select two random parents, weighted by fitness 82 | parents = random.choices(ml, weights=fl, k=2) 83 | 84 | # make two new machines as clones 85 | m0c = parents[0][0].clone() 86 | m0r = parents[0][1].clone() 87 | m1c = parents[1][0].clone() 88 | m1r = parents[1][1].clone() 89 | 90 | # TODO: crossover between pairs, but never swap roles, with some probability? 91 | 92 | # crossover of states, per role 93 | crossover(m0c, m1c, exp["crossover_prob"]) 94 | crossover(m0r, m1r, exp["crossover_prob"]) 95 | 96 | # mutate each machine 97 | mutation(m0c, exp["mutation_prob"], exp) 98 | mutation(m0r, exp["mutation_prob"], exp) 99 | mutation(m1c, exp["mutation_prob"], exp) 100 | mutation(m1r, exp["mutation_prob"], exp) 101 | 102 | # done, add to population 103 | ng.append([m0c, m0r]) 104 | ng.append([m1c, m1r]) 105 | 106 | # we may end up evolving one machine too many above in case elitist and 107 | # diverse fractions result in an uneven number of machines 108 | return ng[:len(ml)] 109 | 110 | def main(): 111 | # can do head and tail independent 112 | # add probabilistic (consensus parameter) transition from head to tail and done 113 | # start with safest; simpler and more realistic evaluation of effectiveness 114 | # efficiency in absolutes (like the Sith!) 
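The bookkeeping in `next_generation` is easy to get wrong around the truncation step, so here is a toy sketch of just that bookkeeping: plain integers stand in for (client, relay) machine pairs, and cloning, crossover, and mutation are elided. It shows how `elitist_frac` and `diversity_frac` carve up the population and why the final truncation is needed.

```python
import math
import random

def toy_next_generation(ml, fl, exp):
    # elitist: keep the best fraction as-is (ml is sorted better to worse)
    n = math.floor(len(ml) * exp["elitist_frac"])
    ng = ml[:n]
    # diversity: add a randomly chosen fraction
    n = math.floor(len(ml) * exp["diversity_frac"])
    ng.extend(random.choices(ml, k=n))
    # evolve the rest two at a time, parents weighted by fitness
    while len(ng) < len(ml):
        parents = random.choices(ml, weights=fl, k=2)
        ng.extend(parents)  # the real code clones, crosses over, and mutates
    # adding pairs can overshoot by one, so truncate to the population size
    return ng[:len(ml)]

exp = {"elitist_frac": 0.2, "diversity_frac": 0.1}
ml = list(range(10, 0, -1))  # stand-in "machine pairs", best first
fl = list(range(10, 0, -1))  # matching fitness values, best first
ng = toy_next_generation(ml, fl, exp)
assert len(ng) == len(ml)    # population size is preserved
assert ng[:2] == ml[:2]      # the two elites survive unchanged
```

With a population of 10 this gives 2 elites plus 1 diversity pick, then pairs are added until the count reaches 11 and is truncated back to 10, matching the comment in `next_generation` about possibly evolving one machine too many.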
115 | 116 | # example hardcoded state 117 | s = machine.MachineState( 118 | iat_dist=machine.Distribution(machine.DistType.LOG_LOGISTIC, 2, 10), 119 | length_dist=machine.Distribution(machine.DistType.UNIFORM, 1, 5), 120 | length_dist_add=1, 121 | length_dist_max=100, 122 | transitions=[[machine.Event.PADDING_SENT, 0], [machine.Event.NONPADDING_SENT, 0]] 123 | ) 124 | 125 | # can we create a machine? 126 | m = machine.Machine(name="goodenough", states=[s]) 127 | print(f"{m}\n") 128 | print(f"m has ID {m.id()}") 129 | 130 | # can we generate c? 131 | conditions = "hardcoded_conditions;" 132 | print(conditions) 133 | print(m.to_c("generated")) 134 | 135 | # parameters for our experiment in a dict, easier to save 136 | exp = {} 137 | exp["num_states"] = 3 138 | exp["iat_d_low"] = 0.0 139 | exp["iat_d_high"] = 10.0 140 | exp["iat_a_low"] = 0 141 | exp["iat_a_high"] = 10 142 | exp["iat_m_low"] = 100 143 | exp["iat_m_high"] = 100*1000 144 | exp["length_d_low"] = 0 145 | exp["length_d_high"] = 100 146 | exp["length_a_low"] = 10 147 | exp["length_a_high"] = 100 148 | exp["length_m_low"] = 100 149 | exp["length_m_high"] = 1*1000 150 | 151 | # random machine check 152 | r = m.randomize(exp) 153 | print(r) 154 | print(r.to_c("random")) 155 | 156 | # mutation check 157 | r2 = r.clone() 158 | mutation(r2, 0.5, exp) 159 | print(f"\n{r}\n\n{r2}") 160 | 161 | # crossover check 162 | m1 = m.randomize(exp) 163 | m2 = m.randomize(exp) 164 | print(f"\n{m1}\n\n{m2}") 165 | crossover(m1, m2, 1.0) 166 | print(f"\n{m1}\n\n{m2}") 167 | 168 | # initial population, we evolve machines in *pairs*, highly asymmetrical setting 169 | exp["name"] = "evolved" 170 | exp["target_hopnum"] = 1 171 | exp["population_size"] = 10 172 | exp["allowed_padding_count_client"] = 1000 173 | exp["max_padding_percent_client"] = 50 174 | exp["allowed_padding_count_relay"] = 1000 175 | exp["max_padding_percent_relay"] = 50 176 | ## template client and relay machines with our parameters 177 | mc = machine.Machine( 
178 | is_origin_side=True, name=exp["name"], target_hopnum=exp["target_hopnum"], 179 | allowed_padding_count=exp["allowed_padding_count_client"], 180 | max_padding_percent=exp["max_padding_percent_client"], 181 | ) 182 | mr = mc.clone() 183 | mr.is_origin_side = False 184 | mr.allowed_padding_count = exp["allowed_padding_count_relay"] 185 | mr.max_padding_percent = exp["max_padding_percent_relay"] 186 | print("") 187 | 188 | ml = initial_population(mc, mr, exp) 189 | print(f"ml has {len(ml)} pairs of machines") 190 | 191 | def bad_fit_func(mp): 192 | return random.random() 193 | 194 | ml, fl = selection(ml, bad_fit_func) 195 | 196 | for i in range(len(fl)): 197 | print(f"fitness {fl[i]:1.2} for {ml[i]}") 198 | 199 | # next_generation check, can be done without working fitness function 200 | exp["mutation_prob"] = 0.2 201 | exp["crossover_prob"] = 0.7 202 | exp["elitist_frac"] = 0.2 203 | exp["diversity_frac"] = 0.1 204 | 205 | # create dummy fl 206 | fl = [f for f in range(10)] 207 | ng = next_generation(ml, fl, exp) 208 | print(f"ng has {len(ng)} pairs of machines") 209 | 210 | if __name__ == "__main__": 211 | main() -------------------------------------------------------------------------------- /evolve/machine.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | from enum import Enum 3 | import copy 4 | import random 5 | import hashlib 6 | 7 | # the possible discrete distributions 8 | class DistType(Enum): 9 | NONE = "CIRCPAD_DIST_NONE" 10 | UNIFORM = "CIRCPAD_DIST_UNIFORM" 11 | LOGISTIC = "CIRCPAD_DIST_LOGISTIC" 12 | LOG_LOGISTIC = "CIRCPAD_DIST_LOG_LOGISTIC" 13 | GEOMETRIC = "CIRCPAD_DIST_GEOMETRIC" 14 | WEIBULL = "CIRCPAD_DIST_WEIBULL" 15 | PARETO = "CIRCPAD_DIST_PARETO" 16 | 17 | class Distribution: 18 | def __init__(self, dist_type=DistType.NONE, param1=0, param2=0): 19 | self.dist_type = dist_type 20 | self.param1 = param1 21 | self.param2 = param2 22 | 23 | def __str__(self): 24 | return 
f"{self.dist_type} {self.param1:.2f} {self.param2:.2f}" 25 | 26 | def randomize(self, a, b): 27 | self.dist_type = random.choice(list(DistType)) 28 | self.param1 = random.uniform(a,b) 29 | self.param2 = random.uniform(a,b) 30 | return self 31 | 32 | # the events specified by struct circpad_event_t 33 | class Event(Enum): 34 | # a non-padding cell was received 35 | NONPADDING_RECV = "CIRCPAD_EVENT_NONPADDING_RECV" 36 | # a non-padding cell was sent 37 | NONPADDING_SENT = "CIRCPAD_EVENT_NONPADDING_SENT" 38 | # a padding cell (RELAY_COMMAND_DROP) was sent 39 | PADDING_SENT = "CIRCPAD_EVENT_PADDING_SENT" 40 | # a padding cell was received 41 | PADDING_RECV = "CIRCPAD_EVENT_PADDING_RECV" 42 | # we tried to schedule padding but we ended up picking the infinity bin 43 | # which means that padding was delayed infinitely 44 | INFINITY = "CIRCPAD_EVENT_INFINITY" 45 | # all histogram bins are empty (we are out of tokens) 46 | BINS_EMPTY = "CIRCPAD_EVENT_BINS_EMPTY" 47 | # out of allowed cells to send in state 48 | LENGTH_COUNT = "CIRCPAD_EVENT_LENGTH_COUNT" 49 | 50 | class MachineState: 51 | # TODO histogram with iat_histogram and token_removal 52 | def __init__( 53 | self, 54 | # IAT-dist 55 | iat_dist=None, 56 | # dist_added_shift_usec 57 | iat_dist_add=0, 58 | # dist_max_sample_usec 59 | iat_dist_max=None, 60 | # length-dist 61 | length_dist=None, 62 | # start_length 63 | length_dist_add=0, 64 | # max_length 65 | length_dist_max=None, 66 | # should we decrement length when we see a nonpadding packet? 
67 | length_includes_nonpadding=False, 68 | # the transitions from this state, we ignore by default 69 | transitions = [] 70 | ): 71 | self.iat_dist = iat_dist 72 | self.iat_dist_add = iat_dist_add 73 | self.iat_dist_max = iat_dist_max 74 | self.length_dist = length_dist 75 | self.length_dist_add = length_dist_add 76 | self.length_dist_max = length_dist_max 77 | self.length_includes_nonpadding = length_includes_nonpadding 78 | self.transitions = transitions 79 | 80 | def __str__(self): 81 | r = f"\tiat-dist {self.iat_dist}" 82 | if self.iat_dist_add > 0 or self.iat_dist_max is not None: 83 | r += f" clamped to [{self.iat_dist_add}, {self.iat_dist_max}]" 84 | r += ",\n" 85 | r += f"\tlength-dist {self.length_dist}" 86 | if self.length_dist_add > 0 or self.length_dist_max is not None: 87 | r += f" clamped to [{self.length_dist_add}, {self.length_dist_max}]" 88 | r += ",\n" 89 | if self.length_includes_nonpadding: 90 | r += f"\tlength_includes_nonpadding,\n" 91 | 92 | if len(self.transitions) == 0: 93 | r += f"\tno transitions" 94 | else: 95 | r += f"\ttransitions\n\t[" 96 | for t in self.transitions: 97 | r += f"\n\t\t{t[0]} -> {t[1]}," 98 | r += f"\n\t]" 99 | 100 | return r 101 | 102 | def to_c(self, prefix): 103 | ''' 104 | Returns c code with the prefix added for each line. 105 | 106 | The prefix should be generated by the caller to fit the following 107 | format _before_ the first . (excluding the dot): 108 | 109 | machine->states[index].length_dist.type = CIRCPAD_DIST_UNIFORM; 110 | 111 | One possible prefix would be "machine->states[1]" for the "machine" 112 | variable name and the state with index 1.
113 | ''' 114 | 115 | c = "" 116 | prefix = f"\n{prefix}" 117 | 118 | if not self.length_dist is None: 119 | c += f"{prefix}.length_dist.type = {self.length_dist.dist_type.value};" 120 | c += f"{prefix}.length_dist.param1 = {self.length_dist.param1};" 121 | c += f"{prefix}.length_dist.param2 = {self.length_dist.param2};" 122 | 123 | if self.length_dist_add > 0: 124 | c += f"{prefix}.start_length = {self.length_dist_add};" 125 | 126 | if not self.length_dist_max is None: 127 | c += f"{prefix}.max_length = {self.length_dist_max};" 128 | 129 | if not self.iat_dist is None: 130 | c += f"{prefix}.iat_dist.type = {self.iat_dist.dist_type.value};" 131 | c += f"{prefix}.iat_dist.param1 = {self.iat_dist.param1};" 132 | c += f"{prefix}.iat_dist.param2 = {self.iat_dist.param2};" 133 | 134 | if self.iat_dist_add > 0: 135 | c += f"{prefix}.dist_added_shift_usec = {self.iat_dist_add};" 136 | 137 | if not self.iat_dist_max is None: 138 | c += f"{prefix}.dist_max_sample_usec = {self.iat_dist_max};" 139 | else: 140 | # BUG: circuitpadding.c, line 560, should check if set like for length 141 | c += f"{prefix}.dist_max_sample_usec = CIRCPAD_DELAY_INFINITE;" 142 | 143 | if self.length_includes_nonpadding: 144 | c += f"{prefix}.length_includes_nonpadding = 1;" 145 | 146 | for t in self.transitions: 147 | c += f"{prefix}.next_state[{t[0].value}] = {t[1]};" 148 | 149 | return c 150 | 151 | def randomize(self, exp): 152 | self.randomize_iat_dist(exp) 153 | self.randomize_length_dist(exp) 154 | self.randomize_transitions(exp) 155 | return self 156 | 157 | def randomize_iat_dist( 158 | self, 159 | exp, 160 | probability=1.0, 161 | ): 162 | if random.random() < probability: 163 | self.iat_dist = Distribution().randomize( 164 | exp["iat_d_low"], 165 | exp["iat_d_high"] 166 | ) 167 | if random.random() < probability: 168 | self.iat_dist_add = random.randint( 169 | exp["iat_a_low"], 170 | exp["iat_a_high"] 171 | ) 172 | if random.random() < probability: 173 | self.iat_dist_max = random.randint( 
174 | exp["iat_m_low"], 175 | exp["iat_m_high"] 176 | ) 177 | 178 | def randomize_length_dist( 179 | self, 180 | exp, 181 | probability=1.0, 182 | ): 183 | if random.random() < probability: 184 | self.length_dist = Distribution().randomize( 185 | exp["length_d_low"], 186 | exp["length_d_high"] 187 | ) 188 | if random.random() < probability: 189 | self.length_dist_add = random.randint( 190 | exp["length_a_low"], 191 | exp["length_a_high"] 192 | ) 193 | if random.random() < probability: 194 | self.length_dist_max = random.randint( 195 | exp["length_m_low"], 196 | exp["length_m_high"] 197 | ) 198 | 199 | def randomize_transitions( 200 | self, 201 | exp, 202 | probability=1.0, 203 | ): 204 | # by chance, a state may end up with no transitions from or to it and 205 | # appear useless, but like "introns", such states seem to provide useful 206 | # "material" for later mutation 207 | self.transitions = [] 208 | for e in Event: 209 | # TODO: with histograms we can consider these, for now, ignore 210 | if e == Event.INFINITY or e == Event.BINS_EMPTY: 211 | continue 212 | if random.random() < probability: 213 | self.transitions.append([e, random.randint(0, exp["num_states"]-1)]) 214 | 215 | 216 | class Machine: 217 | # TODO: conditions 218 | def __init__( 219 | self, 220 | # just a user-friendly machine name for logs 221 | name = "", 222 | # which machine index slot should this machine go into 223 | machine_index = 0, 224 | # send a padding negotiate to shut down machine at end state? 225 | should_negotiate_end = False, 226 | # origin side or relay side 227 | is_origin_side = False, 228 | # which hop in the circuit should we send padding to/from? 229 | # 1-indexed (ie: hop #1 is guard, #2 middle, #3 exit). 230 | target_hopnum = 0, 231 | # if this flag is enabled, don't close circuits that use this machine 232 | manage_circ_lifetime = False, 233 | # how many padding cells can be sent before we apply overhead limits?
234 | allowed_padding_count = 0, 235 | # padding percent cap: Stop padding if we exceed this percent overhead. 236 | max_padding_percent = 0, 237 | # list of states 238 | states = [], 239 | ): 240 | self.name = name 241 | self.machine_index = machine_index 242 | self.should_negotiate_end = should_negotiate_end 243 | self.is_origin_side = is_origin_side 244 | self.target_hopnum = target_hopnum 245 | self.manage_circ_lifetime = manage_circ_lifetime 246 | self.allowed_padding_count = allowed_padding_count 247 | self.max_padding_percent = max_padding_percent 248 | self.states = states 249 | 250 | def __str__(self): 251 | r = f"{self.name}, index {self.machine_index}" 252 | 253 | if self.should_negotiate_end: 254 | r += f", should_negotiate_end" 255 | 256 | if self.is_origin_side: 257 | r += f", origin side" 258 | else: 259 | r += f", relay side" 260 | 261 | r += f", sending padding to/from " 262 | if self.target_hopnum == 1: 263 | r += f"guard" 264 | elif self.target_hopnum == 2: 265 | r += f"middle" 266 | else: 267 | r += f"exit" 268 | 269 | if self.manage_circ_lifetime: 270 | r += f", manage_circ_lifetime" 271 | if self.allowed_padding_count > 0: 272 | r += f", allowed_padding_count {self.allowed_padding_count}" 273 | if self.max_padding_percent > 0: 274 | r += f", max_padding_percent {self.max_padding_percent}" 275 | 276 | r += f", states:\n[" 277 | for s in self.states: 278 | r += f"\n{s}, " 279 | r += f"\n]" 280 | 281 | return r 282 | 283 | def to_c(self, varname): 284 | # transforms the machine to c code with the specified variable name 285 | 286 | prefix = f"\n{varname}" 287 | c = f"{prefix}->name = \"{self.name}\";" 288 | c += f"{prefix}->machine_index = {self.machine_index};" 289 | c += f"{prefix}->target_hopnum = {self.target_hopnum};" 290 | 291 | if self.should_negotiate_end: 292 | c += f"{prefix}->should_negotiate_end = 1;" 293 | if self.is_origin_side: 294 | c += f"{prefix}->is_origin_side = 1;" 295 | else: 296 | c += f"{prefix}->is_origin_side = 0;" 297 | 
if self.manage_circ_lifetime: 298 | c += f"{prefix}->manage_circ_lifetime = 1;" 299 | 300 | c += f"{prefix}->allowed_padding_count = {self.allowed_padding_count};" 301 | c += f"{prefix}->max_padding_percent = {self.max_padding_percent};" 302 | 303 | c += f"\ncircpad_machine_states_init({varname}, {len(self.states)});" 304 | for k, s in enumerate(self.states): 305 | c += s.to_c(f"{varname}->states[{k}]") 306 | 307 | return c 308 | 309 | def clone(self): 310 | return copy.deepcopy(self) 311 | 312 | def randomize(self, exp): 313 | # create a randomized clone of this machine with num_states random states 314 | r = self.clone() 315 | r.states = [] 316 | for _ in range(exp["num_states"]): 317 | r.states.append(MachineState().randomize(exp)) 318 | 319 | return r 320 | 321 | def id(self): 322 | return hashlib.sha256(self.to_c("").encode("ascii")).hexdigest()[:16] 323 | -------------------------------------------------------------------------------- /evolve/shared.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import numpy as np 3 | import os 4 | import sys 5 | from torch.utils import data 6 | import torch.nn as nn 7 | 8 | def metrics(threshold, predictions, labels, label_unmon): 9 | ''' Computes a range of metrics. 
10 | 11 | For details on the metrics, see, e.g., https://www.cs.kau.se/pulls/hot/baserate/ 12 | ''' 13 | tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 = 0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0 14 | 15 | # extended metric: per-class monitored stats 16 | monitored_right = {} 17 | monitored_total = {} 18 | 19 | for i in range(len(predictions)): 20 | label_pred = np.argmax(predictions[i]) 21 | prob_pred = max(predictions[i]) 22 | label_correct = labels[i] 23 | 24 | # we split on monitored or unmonitored correct label 25 | if label_correct != label_unmon: 26 | monitored_total[label_correct] = monitored_total.get(label_correct, 0) + 1 27 | # either confident and correct, 28 | if prob_pred >= threshold and label_pred == label_correct: 29 | tp = tp + 1 30 | monitored_right[label_pred] = monitored_right.get(label_pred, 0) + 1 31 | # confident and wrong monitored label, or 32 | elif prob_pred >= threshold and label_pred != label_unmon: 33 | fpp = fpp + 1 34 | # wrong because not confident or predicted unmonitored for monitored 35 | else: 36 | fn = fn + 1 37 | else: 38 | if prob_pred < threshold or label_pred == label_unmon: # correct prediction? 
39 | tn = tn + 1 40 | elif label_pred < label_unmon: # predicted monitored for unmonitored 41 | fnp = fnp + 1 42 | else: # this should never happen 43 | sys.exit(f"this should never happen: wrongly labelled data for {label_pred}") 44 | 45 | if tp + fn + fpp > 0: 46 | recall = round(float(tp) / float(tp + fpp + fn), 4) 47 | if tp + fpp + fnp > 0: 48 | precision = round(float(tp) / float(tp + fpp + fnp), 4) 49 | 50 | if precision > 0 and recall > 0: 51 | f1 = round(2*((precision*recall)/(precision+recall)), 4) 52 | 53 | accuracy = round(float(tp + tn) / float(tp + fpp + fnp + fn + tn), 4) 54 | 55 | return tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1, monitored_right, monitored_total 56 | 57 | 58 | class Dataset(data.Dataset): 59 | def __init__(self, ids, dataset, labels): 60 | self.ids = ids 61 | self.dataset = dataset 62 | self.labels = labels 63 | 64 | def __len__(self): 65 | return len(self.ids) 66 | 67 | def __getitem__(self, index): 68 | ID = self.ids[index] 69 | return self.dataset[ID], self.labels[ID] 70 | 71 | def load_dataset( 72 | mon_dir, unm_dir, 73 | classes, partitions, samples, 74 | length, extract_func 75 | ): 76 | ''' Loads the dataset from disk into two dictionaries for data and labels. 77 | 78 | The dictionaries are indexed by sample ID. The ID encodes whether it is a monitored 79 | or unmonitored sample to make it easier to debug, as well as some info about 80 | the corresponding data file on disk.
81 | 82 | This function assumes the directory structure of the following dataset: 83 | - "top50-partitioned-reddit-levels-cirucitpadding" 84 | ''' 85 | data = {} 86 | labels = {} 87 | 88 | # load monitored data 89 | for c in range(0,classes): 90 | for p in range(0,partitions): 91 | site = c*10 + p # the dataset's file naming assumes 10 partitions per class 92 | for s in range(0,samples): 93 | ID = f"m-{c}-{p}-{s}" 94 | labels[ID] = c 95 | 96 | # file format is {site}-{sample}.trace 97 | fname = f"{site}-{s}.trace" 98 | with open(os.path.join(mon_dir, fname), "r") as f: 99 | data[ID] = extract_func(f.read(), length) 100 | 101 | # load unmonitored data 102 | dirlist = os.listdir(unm_dir) 103 | # make sure we only load a balanced dataset 104 | dirlist = dirlist[:len(data)] 105 | for fname in dirlist: 106 | ID = f"u-{fname}" 107 | labels[ID] = classes # monitored labels are 0..classes-1, so unmonitored gets label classes 108 | with open(os.path.join(unm_dir, fname), "r") as f: 109 | data[ID] = extract_func(f.read(), length) 110 | 111 | return data, labels 112 | 113 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 114 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 115 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 116 | CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 117 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 118 | 119 | def trace2cells(log, length, strip=True): 120 | ''' A fast specialised function to generate cells from a trace. 121 | 122 | Based on circpad_to_wf() in circpad-sim/common.py, but only for cells.
123 | ''' 124 | data = np.zeros((1, length), dtype=np.float32) 125 | n = 0 126 | 127 | s = log.split("\n") # split into lines first, so strip matches lines rather than characters 128 | if strip: 129 | for i, line in enumerate(s): 130 | if CIRCPAD_ADDRESS_EVENT in line: 131 | s = s[i:] 132 | break 133 | 134 | for line in s: 135 | # outgoing is positive 136 | if CIRCPAD_EVENT_NONPADDING_SENT in line or \ 137 | CIRCPAD_EVENT_PADDING_SENT in line: 138 | data[0][n] = 1.0 139 | n += 1 140 | # incoming is negative 141 | elif CIRCPAD_EVENT_NONPADDING_RECV in line or \ 142 | CIRCPAD_EVENT_PADDING_RECV in line: 143 | data[0][n] = -1.0 144 | n += 1 145 | 146 | if n == length: 147 | break 148 | 149 | return data 150 | 151 | def split_dataset( 152 | classes, partitions, samples, fold, labels, multiplier=1, 153 | ): 154 | '''Splits the dataset based on fold. 155 | 156 | The split is only based on IDs, not the actual data. The result is an 8:1:1 157 | split into training, validation, and testing. 158 | ''' 159 | training = [] 160 | validation = [] 161 | testing = [] 162 | 163 | # monitored, split by _partition_ 164 | for c in range(0,classes): 165 | for p in range(0,partitions): 166 | for s in range(0,samples): 167 | for x in range(0,multiplier): 168 | ID = f"m-{c}-{p}-{s}" 169 | if multiplier > 1: 170 | ID = f"{ID}-{x}" 171 | 172 | i = (p+fold) % partitions 173 | if i < partitions-2: 174 | training.append(ID) 175 | elif i < partitions-1: 176 | validation.append(ID) 177 | else: 178 | testing.append(ID) 179 | 180 | # unmonitored 181 | counter = 0 182 | for k in labels.keys(): 183 | if not k.startswith("u"): 184 | continue 185 | i = (counter+fold) % partitions 186 | if i < partitions-2: 187 | training.append(k) 188 | elif i < partitions-1: 189 | validation.append(k) 190 | else: 191 | testing.append(k) 192 | counter += 1 193 | 194 | split = {} 195 | split["train"] = training 196 | split["validation"] = validation 197 | split["test"] = testing 198 | return split 199 | 200 | def zero_dataset(dataset, z): 201 | index = z.split(":") 202 | start = 
int(index[0]) 203 | stop = int(index[1]) 204 | data = np.zeros((stop-start), dtype=np.float32) 205 | for k, v in dataset.items(): 206 | v[:,start:stop] = data 207 | dataset[k] = v 208 | return dataset 209 | 210 | class DFNet(nn.Module): 211 | def __init__(self, classes, fc_in_features = 512*10): 212 | super(DFNet, self).__init__() 213 | # https://ezyang.github.io/convolution-visualizer/index.html 214 | # https://github.com/lin-zju/deep-fp/blob/master/lib/modeling/backbone/dfnet.py 215 | self.kernel_size = 7 216 | self.padding_size = 3 217 | self.pool_stride_size = 4 218 | self.pool_size = 7 219 | 220 | self.block1 = self.__block(1, 32, nn.ELU()) 221 | self.block2 = self.__block(32, 64, nn.ReLU()) 222 | self.block3 = self.__block(64, 128, nn.ReLU()) 223 | self.block4 = self.__block(128, 256, nn.ReLU()) 224 | 225 | self.fc = nn.Sequential( 226 | nn.Linear(fc_in_features, 512), 227 | nn.BatchNorm1d(512), 228 | nn.ReLU(), 229 | nn.Dropout(0.7), 230 | nn.Linear(512, 512), 231 | nn.BatchNorm1d(512), 232 | nn.ReLU(), 233 | nn.Dropout(0.5) 234 | ) 235 | 236 | self.prediction = nn.Sequential( 237 | nn.Linear(512, classes), 238 | # when using CrossEntropyLoss, already computed internally 239 | #nn.Softmax(dim=1) # dim = 1, don't softmax batch 240 | ) 241 | 242 | def __block(self, channels_in, channels, activation): 243 | return nn.Sequential( 244 | nn.Conv1d(channels_in, channels, self.kernel_size, padding=self.padding_size), 245 | nn.BatchNorm1d(channels), 246 | activation, 247 | nn.Conv1d(channels, channels, self.kernel_size, padding=self.padding_size), 248 | nn.BatchNorm1d(channels), 249 | activation, 250 | nn.MaxPool1d(self.pool_size, stride=self.pool_stride_size, padding=self.padding_size), 251 | nn.Dropout(p=0.1) 252 | ) 253 | 254 | def forward(self, x): 255 | x = self.block1(x) 256 | x = self.block2(x) 257 | x = self.block3(x) 258 | x = self.block4(x) 259 | x = x.flatten(start_dim=1) # dim = 1, don't flatten batch 260 | x = self.fc(x) 261 | x = self.prediction(x) 262 
| 263 | return x -------------------------------------------------------------------------------- /machines/hello-world.md: -------------------------------------------------------------------------------- 1 | # The Hello World Machine 2 | This shows the steps we plan to take to design, implement, evaluate, and 3 | document machines. It's just meant to be an example. 4 | 5 | The goal of this example machine is simple: to send at least one padding cell in 6 | each direction, to/from client from/to relay. We don't care about sending more 7 | padding, being efficient, or making much sense. It's just an example. 8 | 9 | ## Design 10 | Since we don't care much about anything for this machine, and I'm lazy, I'm just 11 | going to take a randomly generated machine from a tool we're in the process of 12 | tweaking. 13 | 14 | ## Implementation 15 | This gets ugly. Below we look at the client and relay machines: 16 | 17 | ```c 18 | circpad_machine_spec_t *gen_client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 19 | gen_client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 20 | gen_client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 21 | gen_client->conditions.reduced_padding_ok = 1; 22 | gen_client->name = "evolved"; 23 | gen_client->machine_index = 0; 24 | gen_client->target_hopnum = 1; 25 | gen_client->is_origin_side = 1; 26 | gen_client->allowed_padding_count = 200; 27 | gen_client->max_padding_percent = 50;circpad_machine_states_init(gen_client, 6); 28 | gen_client->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 29 | gen_client->states[0].length_dist.param1 = 4.814274646108755; 30 | gen_client->states[0].length_dist.param2 = 4.869971264299856; 31 | gen_client->states[0].start_length = 4; 32 | gen_client->states[0].max_length = 505; 33 | gen_client->states[0].iat_dist.type = CIRCPAD_DIST_NONE; 34 | gen_client->states[0].iat_dist.param1 = 7.526266612653222; 35 | gen_client->states[0].iat_dist.param2 = 7.403589208246087; 36 | gen_client->states[0].start_length 
= 8; 37 | gen_client->states[0].dist_max_sample_usec = 63304; 38 | gen_client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 4; 39 | gen_client->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 40 | gen_client->states[1].length_dist.param1 = 4.403617327117251; 41 | gen_client->states[1].length_dist.param2 = 5.996417832959251; 42 | gen_client->states[1].start_length = 6; 43 | gen_client->states[1].max_length = 483; 44 | gen_client->states[1].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 45 | gen_client->states[1].iat_dist.param1 = 8.361216732993883; 46 | gen_client->states[1].iat_dist.param2 = 0.9264596277951376; 47 | gen_client->states[1].start_length = 8; 48 | gen_client->states[1].dist_max_sample_usec = 7065; 49 | gen_client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 50 | gen_client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 51 | gen_client->states[2].length_dist.type = CIRCPAD_DIST_WEIBULL; 52 | gen_client->states[2].length_dist.param1 = 1.0426652399191527; 53 | gen_client->states[2].length_dist.param2 = 3.091020838174913; 54 | gen_client->states[2].start_length = 10; 55 | gen_client->states[2].max_length = 887; 56 | gen_client->states[2].iat_dist.type = CIRCPAD_DIST_WEIBULL; 57 | gen_client->states[2].iat_dist.param1 = 5.667292387983577; 58 | gen_client->states[2].iat_dist.param2 = 7.958737236028522; 59 | gen_client->states[2].dist_max_sample_usec = 23447; 60 | gen_client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 4; 61 | gen_client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 62 | gen_client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 63 | gen_client->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 64 | gen_client->states[3].length_dist.param1 = 9.929415473412345; 65 | gen_client->states[3].length_dist.param2 = 5.546471576686779; 66 | gen_client->states[3].start_length = 7; 67 | gen_client->states[3].max_length = 936; 68 | gen_client->states[3].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 69 
| gen_client->states[3].iat_dist.param1 = 3.332738685735962; 70 | gen_client->states[3].iat_dist.param2 = 6.678039275209297; 71 | gen_client->states[3].start_length = 3; 72 | gen_client->states[3].dist_max_sample_usec = 38700; 73 | gen_client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 74 | gen_client->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 75 | gen_client->states[4].length_dist.type = CIRCPAD_DIST_UNIFORM; 76 | gen_client->states[4].length_dist.param1 = 2.8857540118794556; 77 | gen_client->states[4].length_dist.param2 = 6.125818574119025; 78 | gen_client->states[4].start_length = 2; 79 | gen_client->states[4].max_length = 820; 80 | gen_client->states[4].iat_dist.type = CIRCPAD_DIST_PARETO; 81 | gen_client->states[4].iat_dist.param1 = 4.519039376257881; 82 | gen_client->states[4].iat_dist.param2 = 7.220421029371751; 83 | gen_client->states[4].start_length = 6; 84 | gen_client->states[4].dist_max_sample_usec = 79621; 85 | gen_client->states[4].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 86 | gen_client->states[4].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 87 | gen_client->states[4].next_state[CIRCPAD_EVENT_PADDING_RECV] = 4; 88 | gen_client->machine_num = smartlist_len(origin_padding_machines); 89 | circpad_register_padding_machine(gen_client, origin_padding_machines); 90 | ``` 91 | Notice the generated states and their transitions (next_state). We see that no 92 | state transitions to state 3 except for state 3 itself. In other words, state 3 93 | is completely useless. Oh well, such is the life of a generated machine. 
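That state 3 is unreachable can also be checked mechanically rather than by eyeballing the code. Here is a minimal sketch (Python; not part of this repo, with the next_state targets transcribed by hand from the client machine above) that does a breadth-first search over the state graph from the start state:

```python
from collections import deque

# next_state targets per state, transcribed from the gen_client machine above
transitions = {
    0: [4],        # NONPADDING_RECV -> 4
    1: [2, 0],     # NONPADDING_RECV -> 2, NONPADDING_SENT -> 0
    2: [4, 0, 1],  # NONPADDING_RECV -> 4, NONPADDING_SENT -> 0, PADDING_SENT -> 1
    3: [3, 1],     # NONPADDING_SENT -> 3, LENGTH_COUNT -> 1
    4: [2, 0, 4],  # NONPADDING_RECV -> 2, NONPADDING_SENT -> 0, PADDING_RECV -> 4
}

def reachable(start=0):
    """Breadth-first search over the state graph from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in transitions[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable()))  # state 3 is missing: nothing ever enters it
```

Running the same check on the relay machine's transitions gives {0, 1}, matching the observation below that states 2-4 are dead.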
94 | 95 | ```c 96 | circpad_machine_spec_t *gen_relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 97 | gen_relay->name = "evolved"; 98 | gen_relay->machine_index = 0; 99 | gen_relay->target_hopnum = 1; 100 | gen_relay->allowed_padding_count = 2000; 101 | gen_relay->max_padding_percent = 50;circpad_machine_states_init(gen_relay, 6); 102 | gen_relay->states[0].length_dist.type = CIRCPAD_DIST_WEIBULL; 103 | gen_relay->states[0].length_dist.param1 = 1.1119908099375175; 104 | gen_relay->states[0].length_dist.param2 = 9.295631276879977; 105 | gen_relay->states[0].start_length = 9; 106 | gen_relay->states[0].max_length = 166; 107 | gen_relay->states[0].iat_dist.type = CIRCPAD_DIST_NONE; 108 | gen_relay->states[0].iat_dist.param1 = 5.140798889226186; 109 | gen_relay->states[0].iat_dist.param2 = 3.7189363424246693; 110 | gen_relay->states[0].start_length = 8; 111 | gen_relay->states[0].dist_max_sample_usec = 19688; 112 | gen_relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 113 | gen_relay->states[1].length_dist.type = CIRCPAD_DIST_UNIFORM; 114 | gen_relay->states[1].length_dist.param1 = 7.4552261639344355; 115 | gen_relay->states[1].length_dist.param2 = 6.5836447477507445; 116 | gen_relay->states[1].start_length = 8; 117 | gen_relay->states[1].max_length = 567; 118 | gen_relay->states[1].iat_dist.type = CIRCPAD_DIST_WEIBULL; 119 | gen_relay->states[1].iat_dist.param1 = 5.028757716771455; 120 | gen_relay->states[1].iat_dist.param2 = 3.6175408250793497; 121 | gen_relay->states[1].start_length = 2; 122 | gen_relay->states[1].dist_max_sample_usec = 63563; 123 | gen_relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 0; 124 | gen_relay->states[2].length_dist.type = CIRCPAD_DIST_WEIBULL; 125 | gen_relay->states[2].length_dist.param1 = 5.622586262962072; 126 | gen_relay->states[2].length_dist.param2 = 0.30230478680857154; 127 | gen_relay->states[2].start_length = 1; 128 | gen_relay->states[2].max_length = 391; 129 | gen_relay->states[2].iat_dist.type 
= CIRCPAD_DIST_GEOMETRIC; 130 | gen_relay->states[2].iat_dist.param1 = 9.494124071150765; 131 | gen_relay->states[2].iat_dist.param2 = 4.857852071000062; 132 | gen_relay->states[2].start_length = 5; 133 | gen_relay->states[2].dist_max_sample_usec = 68729; 134 | gen_relay->states[3].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 135 | gen_relay->states[3].length_dist.param1 = 0.7518053414585135; 136 | gen_relay->states[3].length_dist.param2 = 2.2110771083054215; 137 | gen_relay->states[3].start_length = 1; 138 | gen_relay->states[3].max_length = 141; 139 | gen_relay->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 140 | gen_relay->states[3].iat_dist.param1 = 3.7855567949957916; 141 | gen_relay->states[3].iat_dist.param2 = 5.158070632109185; 142 | gen_relay->states[3].dist_max_sample_usec = 77068; 143 | gen_relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 144 | gen_relay->states[4].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 145 | gen_relay->states[4].length_dist.param1 = 7.327935236568187; 146 | gen_relay->states[4].length_dist.param2 = 4.431830291905961; 147 | gen_relay->states[4].start_length = 5; 148 | gen_relay->states[4].max_length = 105; 149 | gen_relay->states[4].iat_dist.type = CIRCPAD_DIST_UNIFORM; 150 | gen_relay->states[4].iat_dist.param1 = 5.256975990732162; 151 | gen_relay->states[4].iat_dist.param2 = 2.2653274630000197; 152 | gen_relay->states[4].dist_max_sample_usec = 35592; 153 | gen_relay->states[4].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 154 | gen_relay->machine_num = smartlist_len(relay_padding_machines); 155 | circpad_register_padding_machine(gen_relay, relay_padding_machines); 156 | ``` 157 | 158 | The relay machine is even worse: states 0 and 1 only have transitions to each 159 | other, so states 2-4 are useless. 160 | 161 | ## Evaluation 162 | Let's see if the above machine does something. Using the circpad simulator we 163 | simulate the goodenough February dataset for the safest security level. 
We then 164 | pick a random trace and verify that the machine is producing padding: 165 | 166 | ``` 167 | $ cat monitored/1-0.trace | grep circpad_cell_event_padding | wc -l 168 | 241 169 | ``` 170 | 171 | The trace contained 241 padding events, great, it does something! 172 | 173 | Let's also see how effective the machine is as a defense against the Deep 174 | Fingerprinting attack by using `/evaluation/once.py` from this repo. 175 | 176 | Results with the machine: 177 | ``` 178 | threshold 0.0, recall 0.9, precision 0.93, F1 0.91, accuracy 0.92 [tp 898, fpp 17, fnp 53, tn 947, fn 85] 179 | threshold 0.11, recall 0.9, precision 0.93, F1 0.91, accuracy 0.92 [tp 898, fpp 17, fnp 53, tn 947, fn 85] 180 | threshold 0.35, recall 0.9, precision 0.93, F1 0.91, accuracy 0.92 [tp 898, fpp 16, fnp 53, tn 947, fn 86] 181 | threshold 0.53, recall 0.89, precision 0.94, F1 0.91, accuracy 0.92 [tp 887, fpp 12, fnp 45, tn 955, fn 101] 182 | threshold 0.66, recall 0.88, precision 0.96, F1 0.91, accuracy 0.92 [tp 877, fpp 7, fnp 34, tn 966, fn 116] 183 | threshold 0.75, recall 0.86, precision 0.97, F1 0.91, accuracy 0.92 [tp 863, fpp 5, fnp 24, tn 976, fn 132] 184 | threshold 0.82, recall 0.85, precision 0.97, F1 0.91, accuracy 0.92 [tp 854, fpp 4, fnp 20, tn 980, fn 142] 185 | threshold 0.87, recall 0.85, precision 0.98, F1 0.91, accuracy 0.92 [tp 849, fpp 3, fnp 18, tn 982, fn 148] 186 | threshold 0.91, recall 0.84, precision 0.98, F1 0.91, accuracy 0.92 [tp 845, fpp 3, fnp 15, tn 985, fn 152] 187 | threshold 0.93, recall 0.84, precision 0.98, F1 0.9, accuracy 0.91 [tp 838, fpp 2, fnp 14, tn 986, fn 160] 188 | threshold 0.95, recall 0.83, precision 0.98, F1 0.9, accuracy 0.91 [tp 832, fpp 2, fnp 12, tn 988, fn 166] 189 | threshold 0.96, recall 0.83, precision 0.99, F1 0.9, accuracy 0.91 [tp 826, fpp 1, fnp 10, tn 990, fn 173] 190 | threshold 0.97, recall 0.82, precision 0.99, F1 0.9, accuracy 0.9 [tp 818, fpp 1, fnp 9, tn 991, fn 181] 191 | threshold 0.98, recall 0.81, precision 
0.99, F1 0.89, accuracy 0.9 [tp 805, fpp 1, fnp 6, tn 994, fn 194] 192 | threshold 0.99, recall 0.8, precision 0.99, F1 0.88, accuracy 0.9 [tp 796, fpp 0, fnp 6, tn 994, fn 204] 193 | threshold 0.99, recall 0.79, precision 0.99, F1 0.88, accuracy 0.89 [tp 787, fpp 0, fnp 5, tn 995, fn 213] 194 | ``` 195 | 196 | And without the machine: 197 | ``` 198 | threshold 0.0, recall 0.87, precision 0.91, F1 0.89, accuracy 0.91 [tp 868, fpp 34, fnp 51, tn 949, fn 98] 199 | threshold 0.11, recall 0.87, precision 0.91, F1 0.89, accuracy 0.91 [tp 868, fpp 34, fnp 51, tn 949, fn 98] 200 | threshold 0.35, recall 0.87, precision 0.91, F1 0.89, accuracy 0.91 [tp 868, fpp 32, fnp 50, tn 950, fn 100] 201 | threshold 0.53, recall 0.86, precision 0.93, F1 0.9, accuracy 0.91 [tp 863, fpp 22, fnp 43, tn 957, fn 115] 202 | threshold 0.66, recall 0.85, precision 0.94, F1 0.89, accuracy 0.91 [tp 855, fpp 17, fnp 40, tn 960, fn 128] 203 | threshold 0.75, recall 0.85, precision 0.95, F1 0.89, accuracy 0.91 [tp 846, fpp 14, fnp 34, tn 966, fn 140] 204 | threshold 0.82, recall 0.84, precision 0.96, F1 0.9, accuracy 0.91 [tp 841, fpp 10, fnp 28, tn 972, fn 149] 205 | threshold 0.87, recall 0.83, precision 0.97, F1 0.89, accuracy 0.9 [tp 833, fpp 5, fnp 25, tn 975, fn 162] 206 | threshold 0.91, recall 0.82, precision 0.97, F1 0.89, accuracy 0.9 [tp 822, fpp 2, fnp 24, tn 976, fn 176] 207 | threshold 0.93, recall 0.82, precision 0.98, F1 0.89, accuracy 0.9 [tp 816, fpp 2, fnp 18, tn 982, fn 182] 208 | threshold 0.95, recall 0.81, precision 0.98, F1 0.88, accuracy 0.9 [tp 806, fpp 2, fnp 14, tn 986, fn 192] 209 | threshold 0.96, recall 0.8, precision 0.99, F1 0.88, accuracy 0.89 [tp 796, fpp 1, fnp 11, tn 989, fn 203] 210 | threshold 0.97, recall 0.78, precision 0.99, F1 0.87, accuracy 0.89 [tp 781, fpp 0, fnp 9, tn 991, fn 219] 211 | threshold 0.98, recall 0.77, precision 0.99, F1 0.86, accuracy 0.88 [tp 768, fpp 0, fnp 8, tn 992, fn 232] 212 | threshold 0.99, recall 0.76, precision 0.99, F1 0.86, 
accuracy 0.88 [tp 761, fpp 0, fnp 7, tn 993, fn 239] 213 | threshold 0.99, recall 0.75, precision 0.99, F1 0.86, accuracy 0.87 [tp 752, fpp 0, fnp 5, tn 995, fn 248] 214 | ``` 215 | 216 | As we can see, a random machine is not necessarily a defense at all: in this case it 217 | made the DF attack better, not worse (at threshold 0.0, recall went from 0.87 without the machine to 0.9 with it). 218 | 219 | ## Documentation 220 | This is the documentation. The closer we get to an efficient and effective 221 | machine, the more time we plan to spend on documenting it. -------------------------------------------------------------------------------- /machines/phase2/april-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1000; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_NONE; 14 | client->states[0].length_dist.param1 = 8.726140065991641; 15 | client->states[0].length_dist.param2 = 6.239957511844941; 16 | client->states[0].start_length = 9; 17 | client->states[0].max_length = 611; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_NONE; 19 | client->states[0].iat_dist.param1 = 7.887031292919491; 20 | client->states[0].iat_dist.param2 = 1.948416595933855; 21 | client->states[0].dist_max_sample_usec = 39633; 22 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 23 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 24 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 25 | client->states[1].length_dist.type = CIRCPAD_DIST_NONE; 26 | client->states[1].length_dist.param1 = 6.174064346424731; 27 |
client->states[1].length_dist.param2 = 8.169566534772848; 28 | client->states[1].start_length = 8; 29 | client->states[1].max_length = 876; 30 | client->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 31 | client->states[1].iat_dist.param1 = 6.013166313063582; 32 | client->states[1].iat_dist.param2 = 3.909603771161987; 33 | client->states[1].dist_max_sample_usec = 85193; 34 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 35 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 36 | client->states[2].length_dist.type = CIRCPAD_DIST_NONE; 37 | client->states[2].length_dist.param1 = 1.6026451213041193; 38 | client->states[2].length_dist.param2 = 2.932035147480483; 39 | client->states[2].start_length = 1; 40 | client->states[2].max_length = 413; 41 | client->states[2].iat_dist.type = CIRCPAD_DIST_NONE; 42 | client->states[2].iat_dist.param1 = 4.780004981695894; 43 | client->states[2].iat_dist.param2 = 0.5839238235898347; 44 | client->states[2].dist_max_sample_usec = 92193; 45 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 46 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 47 | client->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 48 | client->states[3].length_dist.param1 = 4.776842508009852; 49 | client->states[3].length_dist.param2 = 4.807709366988267; 50 | client->states[3].start_length = 3; 51 | client->states[3].max_length = 494; 52 | client->states[3].iat_dist.type = CIRCPAD_DIST_PARETO; 53 | client->states[3].iat_dist.param1 = 3.3391870088596; 54 | client->states[3].iat_dist.param2 = 7.179045336148708; 55 | client->states[3].dist_max_sample_usec = 9445; 56 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 57 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 58 | 59 | client->machine_num = smartlist_len(origin_padding_machines); 60 | circpad_register_padding_machine(client, origin_padding_machines); 
-------------------------------------------------------------------------------- /machines/phase2/april-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1000; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 11 | relay->states[0].length_dist.param1 = 9.619691383629117; 12 | relay->states[0].length_dist.param2 = 0.9505104524626451; 13 | relay->states[0].start_length = 9; 14 | relay->states[0].max_length = 799; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 16 | relay->states[0].iat_dist.param1 = 5.460653840184872; 17 | relay->states[0].iat_dist.param2 = 7.080387541173288; 18 | relay->states[0].dist_max_sample_usec = 94722; 19 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 20 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 21 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 22 | relay->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 23 | relay->states[1].length_dist.param1 = 6.620754495941119; 24 | relay->states[1].length_dist.param2 = 0.01028407677243659; 25 | relay->states[1].start_length = 4; 26 | relay->states[1].max_length = 326; 27 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 28 | relay->states[1].iat_dist.param1 = 1.2767765551835941; 29 | relay->states[1].iat_dist.param2 = 0.11492671368700358; 30 | relay->states[1].dist_max_sample_usec = 31443; 31 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 32 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 33 | relay->states[2].length_dist.param1 = 4.11964473793041; 34 | relay->states[2].length_dist.param2 = 2.7250362139341764; 35 | 
relay->states[2].start_length = 5; 36 | relay->states[2].max_length = 693; 37 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 38 | relay->states[2].iat_dist.param1 = 5.232180204916029; 39 | relay->states[2].iat_dist.param2 = 5.469677647300559; 40 | relay->states[2].dist_max_sample_usec = 94733; 41 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 42 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 43 | relay->states[2].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 44 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 45 | relay->states[3].length_dist.param1 = 1.6167675237934875; 46 | relay->states[3].length_dist.param2 = 6.128003159320049; 47 | relay->states[3].start_length = 5; 48 | relay->states[3].max_length = 383; 49 | relay->states[3].iat_dist.type = CIRCPAD_DIST_UNIFORM; 50 | relay->states[3].iat_dist.param1 = 4.270468437086448; 51 | relay->states[3].iat_dist.param2 = 7.926284402139126; 52 | relay->states[3].dist_max_sample_usec = 55878; 53 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 54 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 55 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 56 | relay->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 57 | 58 | relay->machine_num = smartlist_len(relay_padding_machines); 59 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/april-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/april-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/april.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/april.png -------------------------------------------------------------------------------- /machines/phase2/february-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 1; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 2000; 11 | client->max_padding_percent = 80; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_PARETO; 14 | client->states[0].length_dist.param1 = 4.936848093904933; 15 | client->states[0].length_dist.param2 = 3.7363302458109127; 16 | client->states[0].start_length = 4; 17 | client->states[0].max_length = 465; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 19 | client->states[0].iat_dist.param1 = 8.455977480863071; 20 | client->states[0].iat_dist.param2 = 8.48589377927243; 21 | client->states[0].dist_max_sample_usec = 2405; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 23 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 24 | client->states[1].length_dist.type = CIRCPAD_DIST_LOGISTIC; 25 | client->states[1].length_dist.param1 = 2.3729679932211143; 26 | client->states[1].length_dist.param2 = 3.443389414939797; 27 | client->states[1].start_length = 1; 28 | client->states[1].max_length = 564; 29 | client->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 30 | client->states[1].iat_dist.param1 = 7.268741849723991; 31 | client->states[1].iat_dist.param2 = 9.17862593564404; 32 | client->states[1].dist_max_sample_usec = 22527; 33 | 
client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 34 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 35 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 36 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 37 | client->states[1].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 38 | client->states[2].length_dist.type = CIRCPAD_DIST_UNIFORM; 39 | client->states[2].length_dist.param1 = 8.211004044413276; 40 | client->states[2].length_dist.param2 = 1.9422749133401196; 41 | client->states[2].start_length = 3; 42 | client->states[2].max_length = 420; 43 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 44 | client->states[2].iat_dist.param1 = 2.572575044424351; 45 | client->states[2].iat_dist.param2 = 1.8615197791400429; 46 | client->states[2].dist_max_sample_usec = 76185; 47 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 48 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 49 | client->states[3].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 50 | client->states[3].length_dist.param1 = 4.712947245478112; 51 | client->states[3].length_dist.param2 = 7.192401484606177; 52 | client->states[3].start_length = 5; 53 | client->states[3].max_length = 383; 54 | client->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 55 | client->states[3].iat_dist.param1 = 7.107152046757118; 56 | client->states[3].iat_dist.param2 = 6.279751058010154; 57 | client->states[3].dist_max_sample_usec = 20836; 58 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 59 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 60 | client->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 61 | 62 | client->machine_num = smartlist_len(origin_padding_machines); 63 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/february-mr: 
-------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 1; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 4000; 8 | relay->max_padding_percent = 80; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 11 | relay->states[0].length_dist.param1 = 9.710256645599623; 12 | relay->states[0].length_dist.param2 = 9.988837974059598; 13 | relay->states[0].start_length = 1; 14 | relay->states[0].max_length = 881; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_UNIFORM; 16 | relay->states[0].iat_dist.param1 = 6.232538902597691; 17 | relay->states[0].iat_dist.param2 = 8.443597518961244; 18 | relay->states[0].dist_max_sample_usec = 55847; 19 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 20 | relay->states[1].length_dist.type = CIRCPAD_DIST_NONE; 21 | relay->states[1].length_dist.param1 = 5.154223094887426; 22 | relay->states[1].length_dist.param2 = 0.7400120457054027; 23 | relay->states[1].start_length = 4; 24 | relay->states[1].max_length = 149; 25 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 26 | relay->states[1].iat_dist.param1 = 5.303006850136577; 27 | relay->states[1].iat_dist.param2 = 0.6077197613396013; 28 | relay->states[1].dist_max_sample_usec = 41337; 29 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 0; 30 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 31 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 32 | relay->states[2].length_dist.param1 = 7.677154861253354; 33 | relay->states[2].length_dist.param2 = 8.28859930213646; 34 | relay->states[2].start_length = 7; 35 | relay->states[2].max_length = 912; 36 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 37 | relay->states[2].iat_dist.param1 = 
1.6456878789999463; 38 | relay->states[2].iat_dist.param2 = 0.6419054414650316; 39 | relay->states[2].dist_max_sample_usec = 89370; 40 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 41 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 42 | relay->states[2].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 43 | relay->states[3].length_dist.type = CIRCPAD_DIST_UNIFORM; 44 | relay->states[3].length_dist.param1 = 8.909655436873711; 45 | relay->states[3].length_dist.param2 = 1.2870903034258951; 46 | relay->states[3].start_length = 7; 47 | relay->states[3].max_length = 720; 48 | relay->states[3].iat_dist.type = CIRCPAD_DIST_NONE; 49 | relay->states[3].iat_dist.param1 = 6.15454437455432; 50 | relay->states[3].iat_dist.param2 = 2.427321350813574; 51 | relay->states[3].dist_max_sample_usec = 94319; 52 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 53 | 54 | relay->machine_num = smartlist_len(relay_padding_machines); 55 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/february-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/february-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/february.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/february.png -------------------------------------------------------------------------------- /machines/phase2/june-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 
client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1000; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_LOGISTIC; 14 | client->states[0].length_dist.param1 = 1.3092918377235663; 15 | client->states[0].length_dist.param2 = 4.348612869294878; 16 | client->states[0].start_length = 4; 17 | client->states[0].max_length = 247; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 19 | client->states[0].iat_dist.param1 = 1.8536172530364303; 20 | client->states[0].iat_dist.param2 = 6.538955098232947; 21 | client->states[0].dist_max_sample_usec = 55368; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 23 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 24 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 25 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 26 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 27 | client->states[1].length_dist.type = CIRCPAD_DIST_WEIBULL; 28 | client->states[1].length_dist.param1 = 2.502127504187194; 29 | client->states[1].length_dist.param2 = 3.264058654171975; 30 | client->states[1].start_length = 5; 31 | client->states[1].max_length = 480; 32 | client->states[1].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 33 | client->states[1].iat_dist.param1 = 3.0612997591093203; 34 | client->states[1].iat_dist.param2 = 1.1631101677415767; 35 | client->states[1].dist_max_sample_usec = 50578; 36 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 37 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 38 | client->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 39 | 
client->states[2].length_dist.param1 = 5.312431698514274; 40 | client->states[2].length_dist.param2 = 8.575598651430298; 41 | client->states[2].start_length = 1; 42 | client->states[2].max_length = 196; 43 | client->states[2].iat_dist.type = CIRCPAD_DIST_UNIFORM; 44 | client->states[2].iat_dist.param1 = 3.743025234693156; 45 | client->states[2].iat_dist.param2 = 4.230923837488635; 46 | client->states[2].dist_max_sample_usec = 24151; 47 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 0; 48 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 49 | client->states[2].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 50 | client->states[3].length_dist.type = CIRCPAD_DIST_NONE; 51 | client->states[3].length_dist.param1 = 9.837404566063828; 52 | client->states[3].length_dist.param2 = 1.8665675598256148; 53 | client->states[3].start_length = 2; 54 | client->states[3].max_length = 922; 55 | client->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 56 | client->states[3].iat_dist.param1 = 3.832715364459043; 57 | client->states[3].iat_dist.param2 = 9.307168953452074; 58 | client->states[3].dist_max_sample_usec = 20206; 59 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 60 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 61 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 62 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 63 | client->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 64 | 65 | client->machine_num = smartlist_len(origin_padding_machines); 66 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/june-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; 4 | relay->machine_index = 0; 5 | 
relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1000; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_UNIFORM; 11 | relay->states[0].length_dist.param1 = 7.541197616744535; 12 | relay->states[0].length_dist.param2 = 4.959358844064398; 13 | relay->states[0].start_length = 8; 14 | relay->states[0].max_length = 321; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 16 | relay->states[0].iat_dist.param1 = 6.355669985302768; 17 | relay->states[0].iat_dist.param2 = 4.718433911978695; 18 | relay->states[0].dist_max_sample_usec = 66344; 19 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 20 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 21 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 22 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 23 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 24 | relay->states[1].length_dist.type = CIRCPAD_DIST_LOGISTIC; 25 | relay->states[1].length_dist.param1 = 3.4705572928922788; 26 | relay->states[1].length_dist.param2 = 9.861608037705961; 27 | relay->states[1].start_length = 9; 28 | relay->states[1].max_length = 315; 29 | relay->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 30 | relay->states[1].iat_dist.param1 = 5.519982726461615; 31 | relay->states[1].iat_dist.param2 = 5.978017042155093; 32 | relay->states[1].dist_max_sample_usec = 85748; 33 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 34 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 35 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 36 | relay->states[2].length_dist.type = CIRCPAD_DIST_PARETO; 37 | relay->states[2].length_dist.param1 = 7.676818011827004; 38 | relay->states[2].length_dist.param2 = 2.235768039871122; 39 | relay->states[2].start_length = 10; 40 | relay->states[2].max_length = 202; 41 | 
relay->states[2].iat_dist.type = CIRCPAD_DIST_UNIFORM; 42 | relay->states[2].iat_dist.param1 = 8.420468407681655; 43 | relay->states[2].iat_dist.param2 = 8.874744130401139; 44 | relay->states[2].dist_max_sample_usec = 49042; 45 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 46 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 47 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 48 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOGISTIC; 49 | relay->states[3].length_dist.param1 = 6.0176848403977425; 50 | relay->states[3].length_dist.param2 = 8.745793617904283; 51 | relay->states[3].start_length = 2; 52 | relay->states[3].max_length = 158; 53 | relay->states[3].iat_dist.type = CIRCPAD_DIST_NONE; 54 | relay->states[3].iat_dist.param1 = 9.914906787243591; 55 | relay->states[3].iat_dist.param2 = 3.0423641201110243; 56 | relay->states[3].dist_max_sample_usec = 26650; 57 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 58 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 59 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 60 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 61 | relay->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 62 | 63 | relay->machine_num = smartlist_len(relay_padding_machines); 64 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/june-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/june-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/june.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/june.png -------------------------------------------------------------------------------- /machines/phase2/march-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 1; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 2000; 11 | client->max_padding_percent = 80; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 14 | client->states[0].length_dist.param1 = 9.29916383223583; 15 | client->states[0].length_dist.param2 = 8.563192209023997; 16 | client->states[0].start_length = 2; 17 | client->states[0].max_length = 844; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_WEIBULL; 19 | client->states[0].iat_dist.param1 = 8.313382688571522; 20 | client->states[0].iat_dist.param2 = 8.742989045383267; 21 | client->states[0].dist_max_sample_usec = 61818; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 23 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 24 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 25 | client->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 26 | client->states[1].length_dist.param1 = 2.8017758499031; 27 | client->states[1].length_dist.param2 = 9.810981459295506; 28 | client->states[1].start_length = 3; 29 | client->states[1].max_length = 470; 30 | client->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 31 | client->states[1].iat_dist.param1 = 2.7147215785606016; 32 | client->states[1].iat_dist.param2 = 3.043038828426642; 33 | 
client->states[1].dist_max_sample_usec = 23669; 34 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 35 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 36 | client->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 37 | client->states[2].length_dist.param1 = 5.002464597348427; 38 | client->states[2].length_dist.param2 = 8.828389483663672; 39 | client->states[2].max_length = 287; 40 | client->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 41 | client->states[2].iat_dist.param1 = 0.6190142731130654; 42 | client->states[2].iat_dist.param2 = 8.4487983787332; 43 | client->states[2].dist_max_sample_usec = 12511; 44 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 45 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 46 | client->states[3].length_dist.type = CIRCPAD_DIST_WEIBULL; 47 | client->states[3].length_dist.param1 = 1.115301047557461; 48 | client->states[3].length_dist.param2 = 2.5638990818318996; 49 | client->states[3].start_length = 8; 50 | client->states[3].max_length = 847; 51 | client->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 52 | client->states[3].iat_dist.param1 = 8.374452627421187; 53 | client->states[3].iat_dist.param2 = 5.928675332086; 54 | client->states[3].dist_max_sample_usec = 16086; 55 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 56 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 57 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 58 | 59 | client->machine_num = smartlist_len(origin_padding_machines); 60 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/march-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved"; 4 | relay->machine_index = 0; 5 | 
relay->target_hopnum = 1; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 4000; 8 | relay->max_padding_percent = 80; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_WEIBULL; 11 | relay->states[0].length_dist.param1 = 7.654991320082258; 12 | relay->states[0].length_dist.param2 = 4.456949700895323; 13 | relay->states[0].start_length = 6; 14 | relay->states[0].max_length = 603; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 16 | relay->states[0].iat_dist.param1 = 8.322240372052494; 17 | relay->states[0].iat_dist.param2 = 0.8362284595566694; 18 | relay->states[0].dist_max_sample_usec = 69469; 19 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 20 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 21 | relay->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 22 | relay->states[1].length_dist.param1 = 9.196497877425955; 23 | relay->states[1].length_dist.param2 = 4.086797911814807; 24 | relay->states[1].start_length = 3; 25 | relay->states[1].max_length = 850; 26 | relay->states[1].iat_dist.type = CIRCPAD_DIST_NONE; 27 | relay->states[1].iat_dist.param1 = 1.5091988102682985; 28 | relay->states[1].iat_dist.param2 = 3.7171557072557784; 29 | relay->states[1].dist_max_sample_usec = 93752; 30 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 31 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 0; 32 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 33 | relay->states[2].length_dist.type = CIRCPAD_DIST_NONE; 34 | relay->states[2].length_dist.param1 = 5.558986316465104; 35 | relay->states[2].length_dist.param2 = 7.198858580257309; 36 | relay->states[2].start_length = 2; 37 | relay->states[2].max_length = 690; 38 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 39 | relay->states[2].iat_dist.param1 = 5.765567627587263; 40 | relay->states[2].iat_dist.param2 = 7.6815241245846755; 41 | relay->states[2].dist_max_sample_usec = 
17978; 42 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 43 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 44 | relay->states[3].length_dist.type = CIRCPAD_DIST_UNIFORM; 45 | relay->states[3].length_dist.param1 = 2.8821373596173516; 46 | relay->states[3].length_dist.param2 = 0.6470170172573608; 47 | relay->states[3].start_length = 1; 48 | relay->states[3].max_length = 146; 49 | relay->states[3].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 50 | relay->states[3].iat_dist.param1 = 1.5273906057439013; 51 | relay->states[3].iat_dist.param2 = 3.326047013501766; 52 | relay->states[3].dist_max_sample_usec = 19309; 53 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 54 | 55 | relay->machine_num = smartlist_len(relay_padding_machines); 56 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/march-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/march-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/march.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/march.png -------------------------------------------------------------------------------- /machines/phase2/may-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | 
client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1000; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_LOGISTIC; 14 | client->states[0].length_dist.param1 = 0.9054657170837088; 15 | client->states[0].length_dist.param2 = 8.395721233310635; 16 | client->states[0].start_length = 5; 17 | client->states[0].max_length = 434; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 19 | client->states[0].iat_dist.param1 = 2.697620362868335; 20 | client->states[0].iat_dist.param2 = 4.536505160992885; 21 | client->states[0].dist_max_sample_usec = 31237; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 23 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 24 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 25 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 26 | client->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 27 | client->states[1].length_dist.param1 = 4.674003575761336; 28 | client->states[1].length_dist.param2 = 5.9049600823910176; 29 | client->states[1].start_length = 5; 30 | client->states[1].max_length = 594; 31 | client->states[1].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 32 | client->states[1].iat_dist.param1 = 5.850427721566305; 33 | client->states[1].iat_dist.param2 = 5.4776798413515335; 34 | client->states[1].dist_max_sample_usec = 93024; 35 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 36 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 37 | client->states[2].length_dist.type = CIRCPAD_DIST_NONE; 38 | client->states[2].length_dist.param1 = 1.6026451213041193; 39 | client->states[2].length_dist.param2 = 2.932035147480483; 40 | client->states[2].start_length = 1; 41 | client->states[2].max_length = 425; 42 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 43 | 
client->states[2].iat_dist.param1 = 0.6272619534018797; 44 | client->states[2].iat_dist.param2 = 0.5031279263654487; 45 | client->states[2].dist_max_sample_usec = 88570; 46 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 47 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 48 | client->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 49 | client->states[3].length_dist.param1 = 4.776842508009852; 50 | client->states[3].length_dist.param2 = 4.807709366988267; 51 | client->states[3].start_length = 3; 52 | client->states[3].max_length = 494; 53 | client->states[3].iat_dist.type = CIRCPAD_DIST_PARETO; 54 | client->states[3].iat_dist.param1 = 3.3391870088596; 55 | client->states[3].iat_dist.param2 = 7.179045336148708; 56 | client->states[3].dist_max_sample_usec = 9445; 57 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 58 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 59 | 60 | client->machine_num = smartlist_len(origin_padding_machines); 61 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/may-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1000; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_NONE; 11 | relay->states[0].length_dist.param1 = 5.378318948177472; 12 | relay->states[0].length_dist.param2 = 2.151729097089823; 13 | relay->states[0].start_length = 10; 14 | relay->states[0].max_length = 583; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 16 | relay->states[0].iat_dist.param1 = 5.460653840184872; 
17 | relay->states[0].iat_dist.param2 = 7.080387541173288; 18 | relay->states[0].dist_max_sample_usec = 94722; 19 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 20 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 21 | relay->states[1].length_dist.type = CIRCPAD_DIST_WEIBULL; 22 | relay->states[1].length_dist.param1 = 4.131784111285114; 23 | relay->states[1].length_dist.param2 = 5.676344743391601; 24 | relay->states[1].start_length = 4; 25 | relay->states[1].max_length = 185; 26 | relay->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 27 | relay->states[1].iat_dist.param1 = 3.0151010507095166; 28 | relay->states[1].iat_dist.param2 = 9.877753111650684; 29 | relay->states[1].dist_max_sample_usec = 15782; 30 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 31 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 32 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 33 | relay->states[2].length_dist.param1 = 4.11964473793041; 34 | relay->states[2].length_dist.param2 = 2.7250362139341764; 35 | relay->states[2].start_length = 5; 36 | relay->states[2].max_length = 375; 37 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 38 | relay->states[2].iat_dist.param1 = 9.596309409594099; 39 | relay->states[2].iat_dist.param2 = 0.4682935207787442; 40 | relay->states[2].dist_max_sample_usec = 94733; 41 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 42 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 43 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 44 | relay->states[3].length_dist.param1 = 1.6167675237934875; 45 | relay->states[3].length_dist.param2 = 6.128003159320049; 46 | relay->states[3].start_length = 5; 47 | relay->states[3].max_length = 383; 48 | relay->states[3].iat_dist.type = CIRCPAD_DIST_UNIFORM; 49 | relay->states[3].iat_dist.param1 = 4.270468437086448; 50 | relay->states[3].iat_dist.param2 = 7.926284402139126; 51 | 
relay->states[3].dist_max_sample_usec = 55878; 52 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 53 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 54 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 55 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 56 | relay->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 57 | 58 | relay->machine_num = smartlist_len(relay_padding_machines); 59 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/may-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/may-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/may.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/may.png -------------------------------------------------------------------------------- /machines/phase2/strawman-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; // because lazy 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 3000; 11 | client->max_padding_percent = 90; 12 | circpad_machine_states_init(client, 3); 13 | 14 | // state 0 just waits until CIRCPAD_EVENT_PADDING_RECV 15 | client->states[0].length_dist.type = 
CIRCPAD_DIST_NONE; // infinite length 16 | client->states[0].iat_dist.type = CIRCPAD_DIST_NONE; // infinite delay 17 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 18 | 19 | // state 1 waits until we receive something 20 | client->states[1].length_dist.type = CIRCPAD_DIST_NONE; // infinite length 21 | client->states[1].iat_dist.type = CIRCPAD_DIST_NONE; // infinite delay 22 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 23 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 24 | 25 | // state 2 sends its sampled number of padding cells quickly, unless we send some non-padding 26 | client->states[2].length_dist.type = CIRCPAD_DIST_UNIFORM; 27 | client->states[2].length_dist.param1 = 5; 28 | client->states[2].length_dist.param2 = 15; 29 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 30 | client->states[2].iat_dist.param1 = 3.3; 31 | client->states[2].iat_dist.param2 = 7.1; 32 | client->states[2].dist_added_shift_usec = 0; 33 | client->states[2].dist_max_sample_usec = 9445; 34 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 35 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 36 | 37 | client->machine_num = smartlist_len(origin_padding_machines); 38 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/strawman-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; // because lazy 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 100; 8 | relay->max_padding_percent = 95; 9 | circpad_machine_states_init(relay, 2); 10 | 11 | relay->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 12 | relay->states[0].length_dist.param1 = 9.6; 13 | 
relay->states[0].length_dist.param2 = 0.9; 14 | relay->states[0].start_length = 10; 15 | relay->states[0].max_length = 1000; 16 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 17 | relay->states[0].iat_dist.param1 = 5.5; 18 | relay->states[0].iat_dist.param2 = 7.1; 19 | relay->states[0].start_length = 10; 20 | relay->states[0].dist_max_sample_usec = 94722; 21 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 22 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 23 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 24 | 25 | relay->states[1].length_dist.type = CIRCPAD_DIST_LOGISTIC; 26 | relay->states[1].length_dist.param1 = 4.1; 27 | relay->states[1].length_dist.param2 = 2.7; 28 | relay->states[1].start_length = 20; 29 | relay->states[1].max_length = 693; 30 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 31 | relay->states[1].iat_dist.param1 = 5.2; 32 | relay->states[1].iat_dist.param2 = 5.5; 33 | relay->states[1].dist_added_shift_usec = 0; 34 | relay->states[1].dist_max_sample_usec = 10000; 35 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 36 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 37 | relay->states[1].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 38 | 39 | relay->machine_num = smartlist_len(relay_padding_machines); 40 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/strawman-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/strawman-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/strawman.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/strawman.png -------------------------------------------------------------------------------- /machines/phase3/README.md: -------------------------------------------------------------------------------- 1 | # Phase 3 2 | Here are the final two machines: Spring and Interspace. Spring is based on 3 | manually cleaning up the April machine from phase 2. Interspace builds upon 4 | Spring and is manually tuned. See the full pre-print (to appear) for many more 5 | details. Below you find the basic evaluation in the same format as for phase 2. 6 | 7 | ## Spring 8 | 9 | Computed overhead (efficiency): 10 | ``` 11 | in total for 20000 traces: 12 | - 95773038 cells 13 | - 210% average total bandwidth 14 | - 18494898 sent cells (19%) 15 | - 4534842 nonpadding 16 | - 13960056 padding 17 | - 408% average sent bandwidth 18 | - 77278140 recv cells (81%) 19 | - 40967808 nonpadding 20 | - 36310332 padding 21 | - 189% average recv bandwidth 22 | ``` 23 | 24 | Evaluated effectiveness: 25 | ``` 26 | made 2000 predictions with 2000 labels 27 | threshold 0.0, recall 0.51, precision 0.46, F1 0.48, accuracy 0.58 [tp 508, fpp 240, fnp 349, tn 651, fn 252] 28 | threshold 0.11, recall 0.51, precision 0.46, F1 0.48, accuracy 0.58 [tp 508, fpp 240, fnp 349, tn 651, fn 252] 29 | threshold 0.35, recall 0.5, precision 0.49, F1 0.49, accuracy 0.59 [tp 500, fpp 205, fnp 316, tn 684, fn 295] 30 | threshold 0.53, recall 0.45, precision 0.59, F1 0.51, accuracy 0.62 [tp 445, fpp 116, fnp 199, tn 801, fn 439] 31 | threshold 0.66, recall 0.39, precision 0.65, F1 0.49, accuracy 0.63 [tp 394, fpp 73, fnp 142, tn 858, fn 533] 32 | threshold 0.75, recall 0.35, precision 0.71, F1 0.47, accuracy 0.63 [tp 351, fpp 45, fnp 98, tn 902, fn 604] 33 | threshold 0.82, recall 0.32, precision 0.76, F1 0.45, accuracy 0.63 [tp 321, fpp 31, fnp 68, tn 932, fn 648] 34 | threshold 0.87, recall 0.29, 
precision 0.82, F1 0.43, accuracy 0.62 [tp 289, fpp 20, fnp 43, tn 957, fn 691] 35 | threshold 0.91, recall 0.26, precision 0.86, F1 0.4, accuracy 0.61 [tp 257, fpp 14, fnp 28, tn 972, fn 729] 36 | threshold 0.93, recall 0.23, precision 0.88, F1 0.36, accuracy 0.6 [tp 228, fpp 12, fnp 19, tn 981, fn 760] 37 | threshold 0.95, recall 0.19, precision 0.92, F1 0.32, accuracy 0.59 [tp 194, fpp 6, fnp 11, tn 989, fn 800] 38 | threshold 0.96, recall 0.17, precision 0.93, F1 0.29, accuracy 0.58 [tp 173, fpp 6, fnp 8, tn 992, fn 821] 39 | threshold 0.97, recall 0.15, precision 0.94, F1 0.26, accuracy 0.57 [tp 151, fpp 2, fnp 7, tn 993, fn 847] 40 | threshold 0.98, recall 0.13, precision 0.95, F1 0.23, accuracy 0.56 [tp 131, fpp 1, fnp 6, tn 994, fn 868] 41 | threshold 0.99, recall 0.11, precision 0.96, F1 0.2, accuracy 0.56 [tp 114, fpp 1, fnp 4, tn 996, fn 885] 42 | threshold 0.99, recall 0.098, precision 0.97, F1 0.18, accuracy 0.55 [tp 98, fpp 0, fnp 3, tn 997, fn 902] 43 | ``` 44 | 45 | Visualized (black/white = received/sent nonpadding, red/green = received/sent padding): 46 | ![spring-nopadding](spring-nopadding.png) 47 | ![spring](spring.png) 48 | 49 | ## Interspace 50 | 51 | Computed overhead (efficiency): 52 | ``` 53 | in total for 20000 traces: 54 | - 96245339 cells 55 | - 229% average total bandwidth 56 | - 25306497 sent cells (26%) 57 | - 4240612 nonpadding 58 | - 21065885 padding 59 | - 597% average sent bandwidth 60 | - 70938842 recv cells (74%) 61 | - 37877812 nonpadding 62 | - 33061030 padding 63 | - 187% average recv bandwidth 64 | 65 | ``` 66 | 67 | Evaluated effectiveness: 68 | ``` 69 | made 2000 predictions with 2000 labels 70 | threshold 0.0, recall 0.35, precision 0.4, F1 0.37, accuracy 0.53 [tp 351, fpp 229, fnp 297, tn 703, fn 420] 71 | threshold 0.11, recall 0.35, precision 0.4, F1 0.37, accuracy 0.53 [tp 351, fpp 229, fnp 297, tn 703, fn 420] 72 | threshold 0.35, recall 0.33, precision 0.44, F1 0.38, accuracy 0.54 [tp 331, fpp 180, fnp 242, tn 758, 
fn 489] 73 | threshold 0.53, recall 0.26, precision 0.58, F1 0.36, accuracy 0.57 [tp 258, fpp 81, fnp 109, tn 891, fn 661] 74 | threshold 0.66, recall 0.21, precision 0.68, F1 0.32, accuracy 0.57 [tp 206, fpp 37, fnp 59, tn 941, fn 757] 75 | threshold 0.75, recall 0.16, precision 0.74, F1 0.26, accuracy 0.56 [tp 159, fpp 19, fnp 36, tn 964, fn 822] 76 | threshold 0.82, recall 0.13, precision 0.82, F1 0.23, accuracy 0.56 [tp 131, fpp 11, fnp 17, tn 983, fn 858] 77 | threshold 0.87, recall 0.1, precision 0.84, F1 0.18, accuracy 0.55 [tp 103, fpp 7, fnp 12, tn 988, fn 890] 78 | threshold 0.91, recall 0.081, precision 0.91, F1 0.15, accuracy 0.54 [tp 81, fpp 3, fnp 5, tn 995, fn 916] 79 | threshold 0.93, recall 0.067, precision 0.97, F1 0.13, accuracy 0.53 [tp 67, fpp 2, fnp 0, tn 1000, fn 931] 80 | threshold 0.95, recall 0.056, precision 0.97, F1 0.11, accuracy 0.53 [tp 56, fpp 2, fnp 0, tn 1000, fn 942] 81 | threshold 0.96, recall 0.046, precision 0.96, F1 0.088, accuracy 0.52 [tp 46, fpp 2, fnp 0, tn 1000, fn 952] 82 | threshold 0.97, recall 0.041, precision 1.0, F1 0.079, accuracy 0.52 [tp 41, fpp 0, fnp 0, tn 1000, fn 959] 83 | threshold 0.98, recall 0.037, precision 1.0, F1 0.071, accuracy 0.52 [tp 37, fpp 0, fnp 0, tn 1000, fn 963] 84 | threshold 0.99, recall 0.032, precision 1.0, F1 0.062, accuracy 0.52 [tp 32, fpp 0, fnp 0, tn 1000, fn 968] 85 | threshold 0.99, recall 0.026, precision 1.0, F1 0.051, accuracy 0.51 [tp 26, fpp 0, fnp 0, tn 1000, fn 974] 86 | 87 | ``` 88 | Visualized (black/white = received/sent nonpadding, red/green = received/sent padding): 89 | ![interspace-nopadding](interspace-nopadding.png) 90 | ![interspace](interspace.png) -------------------------------------------------------------------------------- /machines/phase3/interspace-mc.c: -------------------------------------------------------------------------------- 1 | const struct uniform_t my_uniform = { 2 | .base = UNIFORM(my_uniform), 3 | .a = 0.0, 4 | .b = 1.0, 5 | }; 6 | 7 | 
circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 8 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 9 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 10 | client->conditions.reduced_padding_ok = 1; 11 | 12 | client->name = "interspace_client"; 13 | client->machine_index = 0; 14 | client->target_hopnum = 2; 15 | client->is_origin_side = 1; 16 | client->allowed_padding_count = 1500; 17 | client->max_padding_percent = 50; 18 | circpad_machine_states_init(client, 3); 19 | 20 | // wait until the relay is active 21 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 22 | 23 | // wait for something to either mask the length of or inject a fake request 24 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 25 | if (dist_sample(&my_uniform.base) < 0.5) { 26 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 27 | } 28 | 29 | client->states[2].length_dist.type = CIRCPAD_DIST_PARETO; 30 | client->states[2].length_dist.param1 = 4.7; 31 | client->states[2].length_dist.param2 = 4.8; 32 | client->states[2].start_length = 1; 33 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; // tweak for log-logistic? 
34 | client->states[2].iat_dist.param1 = 3.3; 35 | client->states[2].iat_dist.param2 = 7.2; 36 | client->states[2].dist_max_sample_usec = 9445; 37 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 38 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 39 | if (dist_sample(&my_uniform.base) < 0.5) { 40 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 41 | } 42 | 43 | client->machine_num = smartlist_len(origin_padding_machines); 44 | circpad_register_padding_machine(client, origin_padding_machines); 45 | -------------------------------------------------------------------------------- /machines/phase3/interspace-mr.c: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | // short define for sampling uniformly at random from [0, 1.0] 4 | const struct uniform_t my_uniform = { 5 | .base = UNIFORM(my_uniform), 6 | .a = 0.0, 7 | .b = 1.0, 8 | }; 9 | #define CIRCPAD_UNI_RAND (dist_sample(&my_uniform.base)) 10 | 11 | // select distribution parameters uniformly at random from [0, 10] 12 | #define CIRCPAD_RAND_DIST_PARAM1 (CIRCPAD_UNI_RAND*10) 13 | #define CIRCPAD_RAND_DIST_PARAM2 (CIRCPAD_UNI_RAND*10) 14 | 15 | relay->name = "interspace_relay"; 16 | relay->machine_index = 0; 17 | relay->target_hopnum = 2; 18 | relay->is_origin_side = 0; 19 | relay->allowed_padding_count = 1500; 20 | relay->max_padding_percent = 50; 21 | circpad_machine_states_init(relay, 4); 22 | 23 | if (CIRCPAD_UNI_RAND < 0.5) { 24 | /* 25 | the machine has the following states: 26 | 0. init: don't waste time early 27 | 1. wait: either extend or fake 28 | 2. extend: obfuscate length of existing bursts 29 | 3. 
fake: inject fake bursts 30 | */ 31 | 32 | // wait for client to send something, no point in doing stuff too early 33 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 34 | 35 | if (CIRCPAD_UNI_RAND < 0.5) { 36 | // wait: extend real burst 37 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 38 | } else { 39 | // wait: inject a fake burst after a while (FIXME: too long below) 40 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 41 | relay->states[1].iat_dist.param1 = CIRCPAD_UNI_RAND*1000; // alpha, scale and mean 42 | relay->states[1].iat_dist.param2 = CIRCPAD_UNI_RAND*10000; // shape; when > 1, larger values reduce dispersion 43 | relay->states[1].dist_max_sample_usec = 100000; 44 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 45 | } 46 | 47 | // extend: add fake padding for real bursts 48 | relay->states[2].length_dist.type = CIRCPAD_DIST_PARETO; 49 | relay->states[2].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 50 | relay->states[2].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 51 | relay->states[2].start_length = 1; 52 | relay->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 53 | relay->states[2].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 54 | relay->states[2].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 55 | relay->states[2].dist_max_sample_usec = 10000; 56 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 57 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 58 | 59 | // fake: inject completely fake bursts 60 | relay->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 61 | relay->states[3].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 62 | relay->states[3].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 63 | relay->states[3].start_length = 4; 64 | relay->states[3].iat_dist.type = CIRCPAD_DIST_PARETO; 65 | relay->states[3].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 66 | relay->states[3].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 67 | relay->states[3].dist_max_sample_usec = 
10000; 68 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 69 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 70 | } else { 71 | // spring-mr 72 | relay->states[0].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 73 | relay->states[0].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 74 | relay->states[0].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 75 | relay->states[0].dist_max_sample_usec = 10000; 76 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 77 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 78 | 79 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 80 | relay->states[1].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 81 | relay->states[1].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 82 | relay->states[1].dist_max_sample_usec = 31443; 83 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 84 | 85 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 86 | relay->states[2].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 87 | relay->states[2].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 88 | relay->states[2].start_length = 5; 89 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 90 | relay->states[2].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 91 | relay->states[2].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 92 | relay->states[2].dist_max_sample_usec = 100000; 93 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 94 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 95 | 96 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 97 | relay->states[3].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 98 | relay->states[3].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 99 | relay->states[3].start_length = 5; 100 | relay->states[3].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 101 | relay->states[3].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 102 | relay->states[3].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 103 | 
relay->states[3].dist_max_sample_usec = 55878; 104 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 105 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 106 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 107 | } 108 | 109 | relay->machine_num = smartlist_len(relay_padding_machines); 110 | circpad_register_padding_machine(relay, relay_padding_machines); 111 | -------------------------------------------------------------------------------- /machines/phase3/interspace-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/interspace-nopadding.png -------------------------------------------------------------------------------- /machines/phase3/interspace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/interspace.png -------------------------------------------------------------------------------- /machines/phase3/spring-mc.c: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "spring_client"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1500; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 3); 13 | 14 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 15 | 16 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 17 | 18 | client->states[2].length_dist.type = 
CIRCPAD_DIST_PARETO; 19 | client->states[2].length_dist.param1 = 4.776842508009852; 20 | client->states[2].length_dist.param2 = 4.807709366988267; 21 | client->states[2].start_length = 1; 22 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 23 | client->states[2].iat_dist.param1 = 3.3391870088596; 24 | client->states[2].iat_dist.param2 = 7.179045336148708; 25 | client->states[2].dist_max_sample_usec = 9445; 26 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 27 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 28 | 29 | client->machine_num = smartlist_len(origin_padding_machines); 30 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase3/spring-mr.c: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "spring_relay"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1500; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 11 | relay->states[0].iat_dist.param1 = 5.460653840184872; 12 | relay->states[0].iat_dist.param2 = 7.080387541173288; 13 | relay->states[0].dist_max_sample_usec = 94722; 14 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 15 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 16 | 17 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 18 | relay->states[1].iat_dist.param1 = 1.2767765551835941; 19 | relay->states[1].iat_dist.param2 = 0.11492671368700358; 20 | relay->states[1].dist_max_sample_usec = 31443; 21 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 22 | 23 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 24 | 
relay->states[2].length_dist.param1 = 4.11964473793041; 25 | relay->states[2].length_dist.param2 = 2.7250362139341764; 26 | relay->states[2].start_length = 5; 27 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 28 | relay->states[2].iat_dist.param1 = 5.232180204916029; 29 | relay->states[2].iat_dist.param2 = 5.469677647300559; 30 | relay->states[2].dist_max_sample_usec = 94733; 31 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 32 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 33 | 34 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 35 | relay->states[3].length_dist.param1 = 1.6167675237934875; 36 | relay->states[3].length_dist.param2 = 6.128003159320049; 37 | relay->states[3].start_length = 5; 38 | relay->states[3].iat_dist.type = CIRCPAD_DIST_UNIFORM; 39 | relay->states[3].iat_dist.param1 = 4.270468437086448; 40 | relay->states[3].iat_dist.param2 = 7.926284402139126; 41 | relay->states[3].dist_max_sample_usec = 55878; 42 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 43 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 44 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 45 | 46 | relay->machine_num = smartlist_len(relay_padding_machines); 47 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase3/spring-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/spring-nopadding.png -------------------------------------------------------------------------------- /machines/phase3/spring.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/spring.png
--------------------------------------------------------------------------------
/notes/machine-from-scratch.md:
--------------------------------------------------------------------------------
1 | # A Padding Machine from Scratch
2 | 
3 | This document describes the process of building a "padding machine" in tor's new
4 | circuit padding framework from scratch. Notes were taken as part of porting
5 | [Adaptive Padding Early
6 | (APE)](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) from basket2 to the
7 | circuit padding framework. The goal is just to document the process and provide
8 | useful pointers along the way, not to create a useful machine.
9 | 
10 | The quick and dirty plan is to:
11 | 1. clone and compile tor
12 | 2. use the newly built tor in TB and at a small (non-exit) relay we run
13 | 3. add a bare-bones APE padding machine
14 | 4. run the machine, inspect logs for activity
15 | 5. port APE's state machine without thinking much about parameters
16 | 
17 | ## Clone and compile tor
18 | 
19 | ```bash
20 | git clone https://git.torproject.org/tor.git
21 | cd tor
22 | git checkout tor-0.4.1.5
23 | ```
24 | Above we use the tag for tor-0.4.1.5, where the circuit padding framework was
25 | released; feel free to use something newer (though avoid HEAD, which can have bugs).
26 | 
27 | ```bash
28 | sh autogen.sh
29 | ./configure
30 | make
31 | ```
32 | When you run `./configure` you'll be told of missing dependencies and packages
33 | to install on debian-based distributions. 
Important: if you plan to run `tor` on
34 | a relay as part of the real Tor network and your server runs a distribution that
35 | uses systemd, then I'd recommend that you `apt install dpkg dpkg-dev
36 | libevent-dev libssl-dev asciidoc quilt dh-apparmor libseccomp-dev dh-systemd
37 | libsystemd-dev pkg-config dh-autoreconf libfakeroot zlib1g zlib1g-dev automake
38 | liblzma-dev libzstd-dev` and ensure that tor has systemd support enabled:
39 | `./configure --enable-systemd`. Without this, on a recent Ubuntu, my tor service
40 | was forcefully restarted (SIGINT) by systemd every five minutes.
41 | 
42 | If you want to install on your local system, run `make install`. For our case we
43 | just want the tor binary at `src/app/tor`.
44 | 
45 | ## Use tor in TB and at a relay
46 | Download and install a fresh Tor Browser (TB) from torproject.org. Make sure it
47 | works. From the command line, relative to the folder created when you extracted
48 | TB, run `./Browser/start-tor-browser --verbose` to get some basic log output.
49 | Note the version of tor, in my case `Tor 0.4.0.5 (git-bf071e34aa26e096)` as
50 | part of TB 8.5.4. Shut down TB, copy the `tor` binary that you compiled earlier,
51 | and replace `Browser/TorBrowser/Tor/tor`. Start TB from the command line again;
52 | you should see a different version, in my case `Tor 0.4.1.5
53 | (git-439ca48989ece545)`.
54 | 
55 | The relay we run is also on Linux, and `tor` is located at `/usr/bin/tor`. To
56 | view relevant logs since the last boot, run `sudo journalctl -b /usr/bin/tor`, where we
57 | find `Tor 0.4.0.5 running on Linux`. Copy the locally compiled `tor` to the
58 | relay at a temporary location, and then make sure its ownership and access
59 | rights are identical to `/usr/bin/tor`. Next, shut down the running tor service
60 | with `sudo service tor stop`, wait for it to stop (typically 30s), and copy our
61 | locally compiled tor to replace `/usr/bin/tor`, then start the service again. 
62 | Checking the logs, we see `Tor 0.4.1.5 (git-439ca48989ece545)`.
63 | 
64 | Repeatedly shutting down a relay is detrimental to the network and should be
65 | avoided. Sorry about that.
66 | 
67 | We have one more step left before we move on to the machine: configure TB to always
68 | use our middle relay. Edit `Browser/TorBrowser/Data/Tor/torrc` and set
69 | `MiddleNodes <fingerprint>`, where `<fingerprint>` is the fingerprint of the
70 | relay. Start TB, visit a website, and manually confirm that the middle is used
71 | by looking at the circuit display.
72 | 
73 | ## Add a bare-bones APE padding machine
74 | Now the fun part. We have several resources at our disposal (mind that links
75 | might break in the future; just search for the headings):
76 | - The official [Circuit Padding Developer
77 | Documentation](https://storm.torproject.org/shared/ChieH_sLU93313A2gopZYT3x2waJ41hz5Hn2uG1Uuh7).
78 | - Notes we made on the [implementation of the circuit padding
79 | framework](https://github.com/pylls/padding-machines-for-tor/blob/master/notes/circuit-padding-framework.md).
80 | - The implementation of the current circuit padding machines in tor:
81 | [circuitpadding_machines.c](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.c)
82 | and
83 | [circuitpadding_machines.h](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.h).
84 | 
85 | Please consult the above links for details. Moving forward, the focus is to
86 | describe what was done, not necessarily to explain all the details of why.
87 | 
88 | Since we plan to make changes to tor, create a new branch: `git checkout -b
89 | circuit-padding-ape-machine tor-0.4.1.5`. 
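Before writing any tor code, it may help to see what a machine boils down to: each state holds a table mapping events to a next state, and the framework walks that table as (non)padding cells are sent and received. The following standalone sketch uses my own simplified types and names (it is not tor's actual API) to simulate such a table:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the framework's transition tables --
 * NOT tor's actual API. Each state maps an event to a next-state index. */
enum wf_event {
  EV_NONPADDING_SENT,
  EV_NONPADDING_RECV,
  EV_PADDING_SENT,
  EV_PADDING_RECV,
  EV_COUNT
};

#define STATE_IGNORE 0xFFFFu /* no entry: the event leaves the state unchanged */

struct wf_state {
  unsigned next_state[EV_COUNT];
};

/* Initialize every transition to "ignore", akin to what the framework's
 * state-array initialization does. */
static void wf_states_init(struct wf_state *states, size_t n) {
  for (size_t i = 0; i < n; i++)
    for (int e = 0; e < EV_COUNT; e++)
      states[i].next_state[e] = STATE_IGNORE;
}

/* Feed one event to the machine; returns the new (possibly unchanged) state. */
static unsigned wf_transition(const struct wf_state *states, unsigned cur,
                              enum wf_event e) {
  unsigned next = states[cur].next_state[e];
  return next == STATE_IGNORE ? cur : next;
}
```

Setting `states[0].next_state[EV_NONPADDING_RECV] = 1` in this sketch plays the same role as the `relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1;` assignments in the machine files above.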
90 | 91 | We start with declaring two functions, one for the machine at the client and one 92 | at the relay, in `circuitpadding_machines.h`: 93 | 94 | ```c 95 | void circpad_machine_relay_wf_ape(smartlist_t *machines_sl); 96 | void circpad_machine_client_wf_ape(smartlist_t *machines_sl); 97 | ``` 98 | 99 | The definitions go into `circuitpadding_machines.c`: 100 | 101 | ```c 102 | /**************** Adaptive Padding Early (APE) machine ****************/ 103 | 104 | /** 105 | * Create a relay-side padding machine based on the APE design. 106 | */ 107 | void 108 | circpad_machine_relay_wf_ape(smartlist_t *machines_sl) 109 | { 110 | circpad_machine_spec_t *relay_machine 111 | = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 112 | 113 | relay_machine->name = "relay_wf_ape"; 114 | relay_machine->is_origin_side = 0; // relay-side 115 | 116 | // Pad to/from the middle relay, only when the circuit has streams 117 | relay_machine->target_hopnum = 2; 118 | relay_machine->conditions.min_hops = 2; 119 | relay_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 120 | 121 | // limits to help guard against excessive padding 122 | relay_machine->allowed_padding_count = 1; 123 | relay_machine->max_padding_percent = 1; 124 | 125 | // one state to start with: START (-> END, never takes a slot in states) 126 | circpad_machine_states_init(relay_machine, 1); 127 | relay_machine->states[CIRCPAD_STATE_START]. 128 | next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 129 | CIRCPAD_STATE_END; 130 | 131 | // register the machine 132 | relay_machine->machine_num = smartlist_len(machines_sl); 133 | circpad_register_padding_machine(relay_machine, machines_sl); 134 | 135 | log_info(LD_CIRC, 136 | "Registered relay WF APE padding machine (%u)", 137 | relay_machine->machine_num); 138 | } 139 | 140 | /** 141 | * Create a client-side padding machine based on the APE design. 
142 | */ 143 | void 144 | circpad_machine_client_wf_ape(smartlist_t *machines_sl) 145 | { 146 | circpad_machine_spec_t *client_machine 147 | = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 148 | 149 | client_machine->name = "client_wf_ape"; 150 | client_machine->is_origin_side = 1; // client-side 151 | 152 | /** Pad to/from the middle relay, only when the circuit has streams, and only 153 | * for general purpose circuits (typical for web browsing) 154 | */ 155 | client_machine->target_hopnum = 2; 156 | client_machine->conditions.min_hops = 2; 157 | client_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 158 | client_machine->conditions.purpose_mask = 159 | circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL); 160 | 161 | // limits to help guard against excessive padding 162 | client_machine->allowed_padding_count = 1; 163 | client_machine->max_padding_percent = 1; 164 | 165 | // one state to start with: START (-> END, never takes a slot in states) 166 | circpad_machine_states_init(client_machine, 1); 167 | client_machine->states[CIRCPAD_STATE_START]. 168 | next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 169 | CIRCPAD_STATE_END; 170 | 171 | client_machine->machine_num = smartlist_len(machines_sl); 172 | circpad_register_padding_machine(client_machine, machines_sl); 173 | log_info(LD_CIRC, 174 | "Registered client WF APE padding machine (%u)", 175 | client_machine->machine_num); 176 | } 177 | ``` 178 | 179 | We also have to modify `circpad_machines_init()` in `circuitpadding.c` to 180 | register our machines: 181 | 182 | ```c 183 | /* Register machines for the APE WF defense */ 184 | circpad_machine_client_wf_ape(origin_padding_machines); 185 | circpad_machine_relay_wf_ape(relay_padding_machines); 186 | ``` 187 | 188 | We run `make` to get a new `tor` binary and copy it to our local TB. 189 | 190 | ## Run the machine 191 | To be able 192 | to view circuit info events in the console as we launch TB, we add `Log 193 | [circ]info notice stdout` to `torrc` of TB. 
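As an aside, the `allowed_padding_count` and `max_padding_percent` fields set above act as a per-machine padding budget. The following is a simplified standalone sketch of such a check; it is my own approximation of the idea, not tor's actual accounting code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplification -- NOT tor's actual accounting. The idea: a
 * machine may pad freely while under an absolute allowance, and past that
 * only while padding stays below a percentage of all cells sent. */
struct padding_budget {
  uint64_t padding_sent;
  uint64_t nonpadding_sent;
  uint64_t allowed_padding_count; /* absolute cap before the percent applies */
  uint32_t max_padding_percent;   /* e.g., 50 means at most 50% padding */
};

static int padding_allowed(const struct padding_budget *b) {
  if (b->padding_sent < b->allowed_padding_count)
    return 1; /* still within the absolute allowance */
  uint64_t total = b->padding_sent + b->nonpadding_sent;
  if (total == 0)
    return 1;
  /* compare padding_sent/total against the percent cap without floats */
  return (b->padding_sent * 100) < ((uint64_t)b->max_padding_percent * total);
}
```

With the tiny limits used above (`allowed_padding_count = 1`, `max_padding_percent = 1`), such a budget would be exhausted almost immediately, which is fine for a bare-bones machine that never pads.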
194 | 
195 | Running TB to visit example.com, we first find in the log:
196 | 
197 | ```
198 | Aug 30 18:36:43.000 [info] circpad_machine_client_hide_intro_circuits(): Registered client intro point hiding padding machine (0)
199 | Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_intro_circuits(): Registered relay intro circuit hiding padding machine (0)
200 | Aug 30 18:36:43.000 [info] circpad_machine_client_hide_rend_circuits(): Registered client rendezvous circuit hiding padding machine (1)
201 | Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_rend_circuits(): Registered relay rendezvous circuit hiding padding machine (1)
202 | Aug 30 18:36:43.000 [info] circpad_machine_client_wf_ape(): Registered client WF APE padding machine (2)
203 | Aug 30 18:36:43.000 [info] circpad_machine_relay_wf_ape(): Registered relay WF APE padding machine (2)
204 | ```
205 | 
206 | All good, our machine is running. Looking further we find:
207 | 
208 | ```
209 | Aug 30 18:36:55.000 [info] circpad_setup_machine_on_circ(): Registering machine client_wf_ape to origin circ 2 (5)
210 | Aug 30 18:36:55.000 [info] circpad_node_supports_padding(): Checking padding: supported
211 | Aug 30 18:36:55.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 2 (5), command 2
212 | Aug 30 18:36:55.000 [info] circpad_machine_spec_transition(): Circuit 2 circpad machine 0 transitioning from 0 to 65535
213 | Aug 30 18:36:55.000 [info] circpad_machine_spec_transitioned_to_end(): Padding machine in end state on circuit 2 (5)
214 | Aug 30 18:36:55.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 2 (5)
215 | Aug 30 18:36:55.000 [info] circpad_handle_padding_negotiated(): Middle node did not accept our padding request on circuit 2 (5)
216 | ```
217 | We see that our middle supports padding (since we upgraded to tor-0.4.1.5), that
218 | we attempt to negotiate, and that our machine starts on the client, transitions to the
219 | end state, and is freed. 
The last line shows that the middle doesn't have a
220 | padding machine that can run.
221 | 
222 | Next, we follow the same steps as earlier and replace the modified `tor` at our
223 | middle relay. We don't update the logging there, to avoid logging on the info
224 | level on the live network. Looking at the client log again, we see that
225 | negotiation works as before except for the last line: it's missing, so the
226 | machine is running at the middle as well.
227 | 
228 | ## Implementing the APE state machine
229 | 
230 | Porting is fairly straightforward: define the states for all machines, add two
231 | more machines (for the receive portion of WTF-PAD, beyond AP), and pick
232 | reasonable parameters for the distributions (I completely winged it, just as when
233 | originally implementing APE). The [circuit-padding-ape-machine
234 | branch](https://github.com/pylls/tor/tree/circuit-padding-ape-machine) contains
235 | the commits for the full machines with plenty of comments.
236 | 
237 | Some comments on the process:
238 | 
239 | - `tor-0.4.1.5` does not support two machines on the same circuit; the following
240 | fix has to be made: https://trac.torproject.org/projects/tor/ticket/31111 .
241 | The good news is that everything else seems to work after the small change in
242 | the fix.
243 | - APE randomizes its distributions. Currently, this can only be done at the
244 | start of `tor`. This makes sense in the censorship circumvention setting
245 | (`obfs4`), less so for WF defenses: further randomizing each circuit is likely
246 | a PITA for attackers, with few downsides.
247 | - It was annoying to figure out that the lack of systemd support in my compiled
248 | tor caused systemd to interrupt (SIGINT) my tor process at the middle relay
249 | every five minutes. Updated the build steps above to hopefully save others the
250 | pain.
251 | - There appears to be some bug on relays when padding cells are sent too early. 
252 | It can happen with some probability in the APE implementation due to
253 | `circpad_machine_relay_wf_ape_send()`. Will investigate next.
254 | - Moving the registration of machines out of the machine definitions and into
255 | `circpad_machines_init()` makes sense, as suggested in the circuit padding doc
256 | draft.
257 | 
258 | Remember that APE is just a proof-of-concept; we make zero claims about its
259 | ability to withstand WF attacks, in particular those based on deep learning.
260 | 
--------------------------------------------------------------------------------
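The per-machine randomization discussed in the notes above (and done via the `CIRCPAD_RAND_DIST_PARAM*` macros in `interspace-mr.c`) amounts to drawing each distribution parameter uniformly at random from [0, 10]. Here is a standalone sketch of that idea; the PRNG and helper names are mine for self-containment, whereas tor samples via its own `uniform_t` distribution:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal xorshift64 PRNG so the sketch is deterministic for a given seed --
 * illustration only, not tor's sampling code. */
static uint64_t rng_state = 88172645463325252ULL; /* arbitrary nonzero seed */

static uint64_t xorshift64(void) {
  uint64_t x = rng_state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng_state = x;
  return x;
}

/* Uniform double in [0, 1), from the top 53 bits of the PRNG output. */
static double uni_rand(void) {
  return (double)(xorshift64() >> 11) / (double)(1ULL << 53);
}

/* A distribution parameter drawn uniformly at random from [0, 10), as in the
 * CIRCPAD_RAND_DIST_PARAM1/2 macros. */
static double rand_dist_param(void) {
  return uni_rand() * 10.0;
}
```

Sampling once per machine at startup fixes the parameters for the tor process's lifetime, which is exactly the limitation the notes point out for per-circuit randomization.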