├── .gitignore ├── BOM.md ├── LICENSE ├── README.md ├── collect-traces ├── README.md ├── client │ ├── cleanup_containers.sh │ ├── docker-debian │ │ └── Dockerfile │ ├── docker-ubuntu │ │ └── Dockerfile │ ├── exp │ │ └── collect.py │ ├── run.sh │ └── set_tb_permissions.sh ├── extract │ ├── all.sh │ ├── circpadsim.py │ ├── extract.py │ └── sim-all.sh ├── lists │ ├── monitored │ │ ├── README.md │ │ └── top-50-selected-multi.list │ └── unmonitored │ │ ├── README.md │ │ ├── reddit-front-all.list │ │ ├── reddit-front-year.list │ │ └── reddit.py └── server │ ├── circpad-server.py │ └── run-collect-server.sh ├── dataset └── README.md ├── evaluation ├── once.py ├── overhead.py ├── shared.py ├── tweak.md ├── tweak.py ├── tweak.sh └── visualize.py ├── evolve ├── README.md ├── circpadsim.py ├── evolve.py ├── loop.py ├── machine.py └── shared.py ├── machines ├── hello-world.md ├── phase2 │ ├── README.md │ ├── april-mc │ ├── april-mr │ ├── april-nopadding.png │ ├── april.png │ ├── february-mc │ ├── february-mr │ ├── february-nopadding.png │ ├── february.png │ ├── june-mc │ ├── june-mr │ ├── june-nopadding.png │ ├── june.png │ ├── march-mc │ ├── march-mr │ ├── march-nopadding.png │ ├── march.png │ ├── may-mc │ ├── may-mr │ ├── may-nopadding.png │ ├── may.png │ ├── strawman-mc │ ├── strawman-mr │ ├── strawman-nopadding.png │ └── strawman.png └── phase3 │ ├── README.md │ ├── interspace-mc.c │ ├── interspace-mr.c │ ├── interspace-nopadding.png │ ├── interspace.png │ ├── spring-mc.c │ ├── spring-mr.c │ ├── spring-nopadding.png │ └── spring.png └── notes ├── circuit-padding-framework.md └── machine-from-scratch.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Prerequisites 2 | *.d 3 | 4 | # Object files 5 | *.o 6 | *.ko 7 | *.obj 8 | *.elf 9 | 10 | # Linker output 11 | *.ilk 12 | *.map 13 | *.exp 14 | 15 | # Precompiled Headers 16 | *.gch 17 | *.pch 18 | 19 | # Libraries 20 | *.lib 21 | *.a 22 | *.la 23 | *.lo 24 | 25 | # Shared 
objects (inc. Windows DLLs) 26 | *.dll 27 | *.so 28 | *.so.* 29 | *.dylib 30 | 31 | # Executables 32 | *.exe 33 | *.out 34 | *.app 35 | *.i*86 36 | *.x86_64 37 | *.hex 38 | 39 | # Debug files 40 | *.dSYM/ 41 | *.su 42 | *.idb 43 | *.pdb 44 | 45 | # Kernel Module Compile Results 46 | *.mod* 47 | *.cmd 48 | .tmp_versions/ 49 | modules.order 50 | Module.symvers 51 | Mkfile.old 52 | dkms.conf 53 | 54 | collect-traces/exp/tor-browser_en-US 55 | *_pycache__* 56 | -------------------------------------------------------------------------------- /BOM.md: -------------------------------------------------------------------------------- 1 | # Software Bill of Materials 2 | This is the 3 | [software bill of materials](https://en.wikipedia.org/wiki/Software_bill_of_materials) 4 | of the padding machines created in this project. We consider software used 5 | during the development (like editor and OS) and experimentation out of scope. 6 | 7 | The machines only depend on Tor's onion router software little-t tor 8 | available at https://gitweb.torproject.org/tor.git/. 9 | 10 | A design goal for the padding machines is to not introduce further dependencies 11 | into tor for the machines to work. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2019, pulls 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. 
Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Padding Machines for Tor 2 | This is the repository for the [NGI Zero PET](https://nlnet.nl/PET/) project 3 | "Padding Machines by Tor". The goal of the project was to create one or more 4 | padding machines for Tor's new [circuit padding 5 | framework](https://blog.torproject.org/new-release-tor-0405). The padding 6 | machines should defend against [Website Fingerprinting (WF) 7 | attacks](https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks). 8 | 9 | ## Project Results 10 | This project made several contributions with the help of additional funding from 11 | [the Swedish Internet Foundation](https://internetstiftelsen.se/en/) for a 12 | related project. 
13 | 14 | Notable results: 15 | - [Developer notes on the circuit padding framework](notes/circuit-padding-framework.md). 16 | - [Building a padding machine from scratch](notes/machine-from-scratch.md). 17 | - [Implemented an APE-like padding machine](https://github.com/pylls/tor/tree/circuit-padding-ape-machine). 18 | - Tor trac tickets: [#31098](https://trac.torproject.org/projects/tor/ticket/31098), 19 | [#31111](https://trac.torproject.org/projects/tor/ticket/31111), 20 | [#31112](https://trac.torproject.org/projects/tor/ticket/31112), 21 | [#31113](https://trac.torproject.org/projects/tor/ticket/31113). 22 | 23 | - [A minimal simulator](https://github.com/pylls/circpad-sim) for padding 24 | machines in Tor's circuit padding framework, see 25 | [#31788](https://trac.torproject.org/projects/tor/ticket/31788). 26 | - [Simple collection tools](collect-traces/) for collecting traces for the circpad simulator. 27 | - [The goodenough dataset](dataset/) tailored to the circpad simulator and for 28 | creating "good enough" machines. 29 | - [An evaluation tool](evaluation/once.py) for running the Deep Fingerprinting 30 | (DF) attack against a dataset, producing a number of relevant metrics. Based 31 | on a port of DF to PyTorch. 32 | - [An example machine](machines/hello-world.md) designed, implemented, 33 | evaluated, and documented. 34 | - [Evolved machines using genetic programming](machines/phase2). The best 35 | machine is a more effective defense against DF than WTF-PAD. 36 | - [The final padding machines for Tor](machines/phase3) consisting of a 37 | cleaned-up version of the best evolved machine and a tailored machine that is 38 | an even better defense. 39 | - [Tools for evolving machines](evolve/) using genetic programming. 40 | - [Highlights of the 41 | project](https://lists.torproject.org/pipermail/tor-project/2020-November/003018.html) 42 | were shared as part of the November 2020 Tor DEMO Day. 
43 | 44 | The work in the project is documented in a pre-print paper on arXiv. Results 45 | from the pre-print will be incorporated into a later submission to an academic 46 | conference together with new unpublished results (other project). 47 | 48 | ## Acknowledgements 49 | This project is made possible thanks to a generous grant from the [NGI Zero 50 | PET](https://nlnet.nl/PET/) project, which in turn is made possible with 51 | financial support from the [European Commission's](https://ec.europa.eu/) [Next 52 | Generation Internet](https://www.ngi.eu/) programme, under the aegis of [DG 53 | Communications Networks, Content and 54 | Technology](https://ec.europa.eu/info/departments/communications-networks-content-and-technology_en). 55 | Co-financing (for administrative costs and equipment) is provided by [Computer 56 | Science](https://www.kau.se/en/cs) at [Karlstad 57 | University](https://www.kau.se/en). [The Swedish Internet 58 | Foundation](https://internetstiftelsen.se/en/) also funded part of the work by 59 | enabling me to spend extra time on the simulator (synergies with another 60 | project) and tweaking the Interspace machine. -------------------------------------------------------------------------------- /collect-traces/README.md: -------------------------------------------------------------------------------- 1 | # How to collect a lot of traces 2 | Here we describe how we collect traces from Tor Browser at large scale with 3 | relative ease. We make basic use of Python, shell scripts, and containers. The 4 | idea is to run many headless Tor Browser clients in containers that repeatedly 5 | get work from a collection server. The work consists of a URL to visit. While 6 | visiting a URL the client records its tor log and uploads it to the server. 7 | 8 | Note that everything in this folder is of *research quality*; we share it with 9 | the hope of making it easier for other researchers.
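The client side of this design boils down to a small work loop: fetch a URL from the server, visit it, upload the log. A minimal sketch with in-memory stand-ins for the server and the browser (the function names here are illustrative; `client/exp/collect.py` in this folder is the real client):

```python
def work_loop(get_work, visit, upload, max_jobs):
    """Repeatedly fetch a URL from the collection server, visit it with
    (headless) Tor Browser, and upload the resulting tor log."""
    done = 0
    while done < max_jobs:
        site = get_work()   # ask the server for a URL to visit
        if not site:        # no work available; the real client sleeps and retries
            break
        log = visit(site)   # blocking browser visit, returns filtered log lines
        if log:             # only upload logs with enough trace events
            upload(site, log)
            done += 1
    return done

# toy stand-ins for the server and the browser, just to show the flow
queue = ["kau.se/en", "example.com"]
uploaded = []
jobs = work_loop(
    get_work=lambda: queue.pop(0) if queue else None,
    visit=lambda site: [f"circpad_trace_event site={site}"],
    upload=lambda site, log: uploaded.append((site, log)),
    max_jobs=10,
)
```

The real client additionally retries failed visits and makes a fresh copy of Tor Browser per visit, but the control flow is the same.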
10 | 11 | ## Modify Tor Browser 12 | First download this folder and a fresh Linux Tor Browser install from 13 | torproject.org. Edit `Browser/start-tor-browser`, changing line 12 to: 14 | 15 | ```bash 16 | if [ "x$DISPLAY" = "x" ] && [[ "$*" != *--headless* ]]; then 17 | ``` 18 | 19 | This makes it possible to run Tor Browser in headless mode without a full X 20 | install (no more `xvfb`, yay). 21 | 22 | Tor Browser in headless mode will not display the Tor launcher start-up prompt and will hang indefinitely. To skip the prompt, create a preference file in the Firefox profile that ships with TB: `Browser/TorBrowser/Data/Browser/profile.default/user.js`, and add this single line: 23 | 24 | ``` 25 | user_pref("extensions.torlauncher.prompt_at_startup", false); 26 | ``` 27 | 28 | Edit `Browser/TorBrowser/Data/Tor/torrc` and set any necessary restrictions, 29 | e.g., `EntryNodes`, `MiddleNodes`, or `UseEntryGuards`, depending on the experiment to 30 | run. If you're using the [circuitpadding 31 | simulator](https://github.com/pylls/circpad-sim), build the custom `tor` binary, 32 | add it to TB at `Browser/TorBrowser/Tor/`, and add `Log [circ]info notice 33 | stdout` to `torrc`. 34 | 35 | When we collected our traces for the goodenough dataset we used the following torrc: 36 | 37 | ``` 38 | Log [circ]info notice stdout 39 | UseEntryGuards 0 40 | ``` 41 | 42 | ## Build the Docker container 43 | 1. On the machine(s) you want to use for collection, install Docker. 44 | 2. Build the Dockerfile in either `docker-debian` or `docker-ubuntu`, depending 45 | on what fits the environment where you built the custom `tor` binary. You 46 | build the container by running: `docker build -t wf-collect .` (note the 47 | dot). 48 | 49 | ## Starting containers 50 | On each machine: 51 | 1. Copy the `tor-browser_en-US` folder that you modified earlier into `exp`. 52 | 2. Run `./set_tb_permissions.sh`. 53 | 54 | Edit `run.sh` and then run it.
55 | 56 | For our experiments we created three zip files of Tor Browser with different 57 | security levels/settings and put them all in the `exp` folder. We then used 58 | the following command to rotate between them on each machine: 59 | 60 | ``` 61 | rm -rf collect/exp/tor-browser_en-US && cd collect/exp/ && unzip tor-browser_en-US-safest.zip && cd ../ && ./set_tb_permissions.sh && ./run.sh 62 | ``` 63 | 64 | ## Set up a collection server 65 | Run `circpad-server.py` on a server that can be reached from the docker 66 | containers. The parameters to the script are largely self-explanatory: 67 | 68 | ``` 69 | usage: circpad-server.py [-h] -l L -n N -d D [-m M] [-s S] 70 | 71 | optional arguments: 72 | -h, --help show this help message and exit 73 | -l L file with list of sites to visit, one site per line 74 | -n N number of samples 75 | -d D data folder for storing results 76 | -m M minimum number of lines in torlog to accept 77 | -s S stop collecting at this many logs collected, regardless of 78 | remaining sites or samples (useful for unmonitored sites) 79 | ``` 80 | 81 | All clients attempt to get work from the server and, on failure, sleep for 82 | the specified timeout (default: 60s) before trying again. For our experiments we 83 | used 7 machines with 20 containers each talking to a server with a single modest 84 | core without much trouble. All machines, including the server, were located in 85 | the same cluster. Since the server is single-threaded, running it on machines 86 | further away from the clients may turn it into a bottleneck during 87 | collection. 88 | 89 | ## Extract traces from the dataset 90 | Once you've collected your raw dataset, the next step is to extract the useful 91 | logs and get some circpad traces. The `extract` folder contains all you need: 92 | the `extract.py` script verifies that the logs contain traces from visiting 93 | the intended websites and structures the dataset as in our goodenough dataset.
94 | 95 | ``` 96 | usage: extract.py [-h] -i I -o O -t T -l L [--monitored] [--unmonitored] 97 | [-c C] [-s S] [-m M] 98 | 99 | optional arguments: 100 | -h, --help show this help message and exit 101 | -i I input folder of logs 102 | -o O output folder for logs 103 | -t T output folder for traces 104 | -l L file with list of sites to visit, one site per line 105 | --monitored extract monitored 106 | --unmonitored extract unmonitored 107 | -c C the number of monitored classes 108 | -s S the number of samples 109 | -m M minimum number of lines in a trace 110 | ``` 111 | 112 | See the helper `all.sh` script for examples on how to use `extract.py`. 113 | -------------------------------------------------------------------------------- /collect-traces/client/cleanup_containers.sh: -------------------------------------------------------------------------------- 1 | docker stop $(docker ps -a -q) 2 | docker rm $(docker ps -a -q) -------------------------------------------------------------------------------- /collect-traces/client/docker-debian/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM debian:testing 2 | MAINTAINER Tobias Pulls 3 | 4 | # install and clean since we'll be running many copies 5 | RUN apt-get update && apt-get install -y \ 6 | dumb-init \ 7 | python3 \ 8 | python3-requests \ 9 | firefox-esr 10 | RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 11 | 12 | # setup a user to not run as root 13 | ENV HOME /home/user 14 | ENV LANG C.UTF-8 15 | RUN useradd --create-home --home-dir $HOME user 16 | WORKDIR $HOME 17 | USER user 18 | 19 | ENTRYPOINT ["dumb-init", "--"] 20 | -------------------------------------------------------------------------------- /collect-traces/client/docker-ubuntu/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:rolling 2 | MAINTAINER Tobias Pulls 3 | 4 | # install and clean since we'll be running many copies 
5 | RUN apt-get update && apt-get install -y \ 6 | dumb-init \ 7 | python3 \ 8 | python3-requests \ 9 | firefox 10 | RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* 11 | 12 | # set up a user so we do not run as root 13 | ENV HOME /home/user 14 | ENV LANG C.UTF-8 15 | RUN useradd --create-home --home-dir $HOME user 16 | WORKDIR $HOME 17 | USER user 18 | 19 | ENTRYPOINT ["dumb-init", "--"] 20 | -------------------------------------------------------------------------------- /collect-traces/client/exp/collect.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """Collect Network Traces from Tor Browser 3 | 4 | We assume that this script will execute from multiple containers that share the 5 | exact same script arguments. The script gets its work and uploads the traces it 6 | collects to the server at the specified URL. 7 | """ 8 | import argparse 9 | import os 10 | import sys 11 | import random 12 | import shutil 13 | import string 14 | import tempfile 15 | import time 16 | import datetime 17 | import subprocess 18 | import signal 19 | 20 | import requests 21 | from requests.exceptions import Timeout, ConnectionError, ConnectTimeout 22 | 23 | ap = argparse.ArgumentParser() 24 | ap.add_argument("-b", required=True, 25 | help="folder with Tor Browser") 26 | ap.add_argument("-u", required=True, 27 | help="the complete URL to the server") 28 | 29 | ap.add_argument("-a", required=False, default=5, type=int, 30 | help="number of attempts to collect each trace") 31 | ap.add_argument("-t", required=False, default=60, type=int, 32 | help="timeout (s) for each TB visit") 33 | ap.add_argument("-m", required=False, default=100, type=int, 34 | help="minimum number of lines in torlog to accept") 35 | args = vars(ap.parse_args()) 36 | 37 | TBFILE = "start-tor-browser.desktop" 38 | CIRCPAD_EVENT = "circpad_trace_event" 39 | 40 | tmpdir = tempfile.mkdtemp() 41 | 42 | def main(): 43 | if not
os.path.exists(args["b"]): 44 | sys.exit(f"Tor Browser directory {args['b']} does not exist") 45 | if not os.path.isfile(os.path.join(args["b"], TBFILE)): 46 | sys.exit(f"Tor Browser directory {args['b']} missing {TBFILE}") 47 | 48 | # on SIGINT remove the temporary folder 49 | def signal_handler(sig, frame): 50 | shutil.rmtree(tmpdir) 51 | sys.exit(0) 52 | signal.signal(signal.SIGINT, signal_handler) 53 | 54 | tb = make_tb_copy(args["b"]) 55 | print("two warmup visits for fresh consensus and whatnot update checks") 56 | print(f"\t got {len(visit('kau.se/en', tb, args['t']))} log-lines") 57 | print(f"\t got {len(visit('kau.se/en/cs', tb, args['t']))} log-lines") 58 | 59 | work = "" 60 | last_site = "" 61 | while True: 62 | # either work will be empty or contain the log from collecting below 63 | if work == "": 64 | # get new work 65 | work = get_work() 66 | else: 67 | # upload work and get new work 68 | work = upload_work(work, last_site) 69 | 70 | # do any work if we got any, or sleep a bit 71 | if work != "": 72 | last_site = work 73 | work = collect(last_site, tb) 74 | else: 75 | time.sleep(args["t"]) 76 | 77 | # cleanup, if we ever get here somehow 78 | shutil.rmtree(tmpdir) 79 | 80 | def get_work(): 81 | try: 82 | response = requests.get(args["u"], timeout=args["t"]) 83 | if response: 84 | return response.content.decode('UTF-8') 85 | except (Timeout, ConnectionError, ConnectTimeout): 86 | return "" 87 | return "" 88 | 89 | def upload_work(log, site): 90 | print(f"\t {now()} uploading log of len {len(log)}...") 91 | 92 | try: 93 | response = requests.post( 94 | args["u"], 95 | timeout=args["t"], 96 | data=[('log', '\n'.join(log)), ('site', site)] 97 | ) 98 | if response: 99 | return response.content.decode('UTF-8') 100 | except (Timeout, ConnectionError, ConnectTimeout): 101 | return "" 102 | return "" 103 | 104 | def collect(site, tb_orig): 105 | print(f"attempting to collect site {site}") 106 | for _ in range(args["a"]): 107 | # create fresh TB copy for this 
visit 108 | tb = make_tb_copy(tb_orig) 109 | 110 | # visit with TB, blocking, and get stdout (the log) 111 | log = visit(site, tb, args["t"]) 112 | print(f"\t {now()} got {len(log)} circpad events in log") 113 | 114 | # clean up our TB copy 115 | shutil.rmtree(tb) 116 | 117 | # done if the trace is long enough 118 | if len(log) >= args["m"]: 119 | return log 120 | 121 | return "" 122 | 123 | def make_tb_copy(src): 124 | dst = os.path.join(tmpdir, 125 | ''.join(random.choices(string.ascii_uppercase + string.digits, k=24))) 126 | 127 | # ibus breaks on multiple copies that move location, need to ignore it 128 | shutil.copytree(src, dst, ignore=shutil.ignore_patterns('ibus')) 129 | return dst 130 | 131 | def visit(url, tb, timeout): 132 | tb = os.path.join(tb, "Browser", "start-tor-browser") 133 | url = url.replace("'", "\\'") 134 | url = url.replace(";", "\\;") 135 | cmd = f"timeout -k 5 {str(timeout)} {tb} --verbose --headless {url}" 136 | print(f"\t {now()} {cmd}") 137 | 138 | result = subprocess.run( 139 | cmd, 140 | capture_output=True, 141 | text=True, 142 | shell=True 143 | ) 144 | 145 | return filter_circpad_lines(result.stdout) 146 | 147 | def filter_circpad_lines(stdout): 148 | ''' Filters the log for trace events from the circuitpadding 149 | framework, saving space.
150 | ''' 151 | out = [] 152 | lines = stdout.split("\n") 153 | for l in lines: 154 | if CIRCPAD_EVENT in l: 155 | out.append(l) 156 | 157 | return out 158 | 159 | def now(): 160 | return datetime.datetime.now() 161 | 162 | if __name__ == "__main__": 163 | main() 164 | -------------------------------------------------------------------------------- /collect-traces/client/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # the number of docker instances to create 4 | WORKERS=20 5 | 6 | # the absolute path to the experiment folder where you put collect.py and TB 7 | FOLDER=/home/user/collect/exp/ 8 | 9 | # the number of seconds to collect data for per instance visit 10 | TIMEOUT=60 11 | 12 | # the minimum number of circpad trace events to accept 13 | MIN=100 14 | 15 | # the URL of the server 16 | SERVER=http://server:5000 17 | 18 | for ((n=0;n<$WORKERS;n++)) do 19 | docker run -d \ 20 | -v $FOLDER:/home/user/exp wf-collect \ 21 | python3 -u \ 22 | /home/user/exp/collect.py \ 23 | -b /home/user/exp/tor-browser_en-US/ \ 24 | -u $SERVER \ 25 | -m $MIN \ 26 | -t $TIMEOUT 27 | done 28 | -------------------------------------------------------------------------------- /collect-traces/client/set_tb_permissions.sh: -------------------------------------------------------------------------------- 1 | chmod a+r -R exp/tor-browser_en-US/ 2 | find exp/tor-browser_en-US/ -type d -print0 | xargs -0 chmod 755 -------------------------------------------------------------------------------- /collect-traces/extract/all.sh: -------------------------------------------------------------------------------- 1 | # we assume that raw contains folders as collected by `circpad-server.py` with the same lists 2 | 3 | # extract monitored 4 | ./extract.py -i raw/safer-mon/ -o safer/client-logs/monitored/ -t safer/client-traces/monitored/ -l top-50-selected-multi.list --monitored 5 | ./extract.py -i raw/safest-mon/ -o 
safest/client-logs/monitored/ -t safest/client-traces/monitored/ -l top-50-selected-multi.list --monitored 6 | ./extract.py -i raw/standard-mon/ -o standard/client-logs/monitored/ -t standard/client-traces/monitored/ -l top-50-selected-multi.list --monitored 7 | # extract unmonitored 8 | ./extract.py -i raw/safer4-unmon/ -o safer/client-logs/unmonitored/ -t safer/client-traces/unmonitored/ -l reddit-front-year.list --unmonitored 9 | ./extract.py -i raw/safest4-unmon/ -o safest/client-logs/unmonitored/ -t safest/client-traces/unmonitored/ -l reddit-front-year.list --unmonitored 10 | ./extract.py -i raw/standard-unmon/ -o standard/client-logs/unmonitored/ -t standard/client-traces/unmonitored/ -l reddit-front-year.list --unmonitored 11 | 12 | # simulate fake relay traces 13 | ./simrelaytrace.py -i safer/client-traces/monitored/ -o safer/fakerelay-traces/monitored/ 14 | ./simrelaytrace.py -i safer/client-traces/unmonitored/ -o safer/fakerelay-traces/unmonitored/ 15 | ./simrelaytrace.py -i safest/client-traces/monitored/ -o safest/fakerelay-traces/monitored/ 16 | ./simrelaytrace.py -i safest/client-traces/unmonitored/ -o safest/fakerelay-traces/unmonitored/ 17 | ./simrelaytrace.py -i standard/client-traces/monitored/ -o standard/fakerelay-traces/monitored/ 18 | ./simrelaytrace.py -i standard/client-traces/unmonitored/ -o standard/fakerelay-traces/unmonitored/ 19 | -------------------------------------------------------------------------------- /collect-traces/extract/circpadsim.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import sys 3 | import socket 4 | 5 | CIRCPAD_ERROR_WRONG_FORMAT = "invalid trace format" 6 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 7 | 8 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 9 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 10 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 11 | 
CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 12 | 13 | CIRCPAD_LOG = "circpad_trace_event" 14 | CIRCPAD_LOG_TIMESTAMP = "timestamp=" 15 | CIRCPAD_LOG_CIRC_ID = "client_circ_id=" 16 | CIRCPAD_LOG_EVENT = "event=" 17 | 18 | CIRCPAD_BLACKLISTED_ADDRESSES = ["aus1.torproject.org"] 19 | CIRCPAD_BLACKLISTED_EVENTS = [ 20 | "circpad_negotiate_logging" 21 | ] 22 | 23 | def circpad_get_all_addresses(trace): 24 | addresses = [] 25 | for l in trace: 26 | if len(l) < 2: 27 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 28 | if CIRCPAD_ADDRESS_EVENT in l[1]: 29 | if len(l[1]) < 2: 30 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 31 | addresses.append(l[1].split()[1]) 32 | return addresses 33 | 34 | def circpad_get_nonpadding_times(trace): 35 | sent_nonpadding, recv_nonpadding = [], [] 36 | 37 | for l in trace: 38 | split = l.split() 39 | if CIRCPAD_EVENT_NONPADDING_SENT in split[1]: 40 | sent_nonpadding.append(split[0]) 41 | elif CIRCPAD_EVENT_NONPADDING_RECV in split[1]: 42 | recv_nonpadding.append(split[0]) 43 | 44 | return sent_nonpadding, recv_nonpadding 45 | 46 | def circpad_get_padding_times(trace): 47 | sent_padding, recv_padding = [], [] 48 | 49 | for l in trace: 50 | split = l.split() 51 | if CIRCPAD_EVENT_PADDING_SENT in split[1]: 52 | sent_padding.append(split[0]) 53 | elif CIRCPAD_EVENT_PADDING_RECV in split[1]: 54 | recv_padding.append(split[0]) 55 | 56 | return sent_padding, recv_padding 57 | 58 | def circpad_parse_line(line): 59 | split = line.split() 60 | assert(len(split) >= 2) 61 | event = split[1] 62 | timestamp = int(split[0]) 63 | 64 | return event, timestamp 65 | 66 | def circpad_lines_to_trace(lines): 67 | trace = [] 68 | for l in lines: 69 | event, timestamp = circpad_parse_line(l) 70 | trace.append((timestamp, event)) 71 | return trace 72 | 73 | def circpad_extract_log_traces( 74 | log_lines, 75 | source_client=True, 76 | source_relay=True, 77 | allow_ips=False, 78 | filter_client_negotiate=False, 79 | filter_relay_negotiate=False, 80 | 
max_length=999999999 81 | ): 82 | # helper function 83 | def blacklist_hit(d): 84 | for a in circpad_get_all_addresses(d): 85 | if a in CIRCPAD_BLACKLISTED_ADDRESSES: 86 | return True 87 | return False 88 | 89 | # helper to extract one line 90 | def extract_from_line(line): 91 | n = line.index(CIRCPAD_LOG_TIMESTAMP)+len(CIRCPAD_LOG_TIMESTAMP) 92 | timestamp = line[n:].split(" ", maxsplit=1)[0] 93 | n = line.index(CIRCPAD_LOG_CIRC_ID)+len(CIRCPAD_LOG_CIRC_ID) 94 | cid = line[n:].split(" ", maxsplit=1)[0] 95 | 96 | # an event is the last part, no need to split on space like we did earlier 97 | n = line.index(CIRCPAD_LOG_EVENT)+len(CIRCPAD_LOG_EVENT) 98 | event = line[n:] 99 | 100 | return int(cid), int(timestamp), event 101 | 102 | circuits = {} 103 | base = -1 104 | for line in log_lines: 105 | if CIRCPAD_LOG in line: 106 | # skip client/relay if they shouldn't be part of the trace 107 | if not source_client and "source=client" in line: 108 | continue 109 | if not source_relay and "source=relay" in line: 110 | continue 111 | 112 | # extract trace and make timestamps relative 113 | cid, timestamp, event = extract_from_line(line) 114 | if base == -1: 115 | base = timestamp 116 | timestamp = timestamp - base 117 | 118 | # store trace 119 | if cid in circuits.keys(): 120 | if len(circuits[cid]) < max_length: 121 | circuits[cid] = circuits.get(cid) + [(timestamp, event)] 122 | else: 123 | circuits[cid] = [(timestamp, event)] 124 | 125 | # filter out circuits with blacklisted addresses 126 | for cid in list(circuits.keys()): 127 | if blacklist_hit(circuits[cid]): 128 | del circuits[cid] 129 | # filter out circuits with only IPs (unless arg says otherwise) 130 | for cid in list(circuits.keys()): 131 | if not allow_ips and circpad_only_ips_in_trace(circuits[cid]): 132 | del circuits[cid] 133 | 134 | # remove blacklisted events (and associated events) 135 | for cid in list(circuits.keys()): 136 | circuits[cid] = circpad_remove_blacklisted_events(circuits[cid], 137 | 
filter_client_negotiate, filter_relay_negotiate) 138 | 139 | return circuits 140 | 141 | 142 | def circpad_remove_blacklisted_events( 143 | trace, 144 | filter_client_negotiate, 145 | filter_relay_negotiate 146 | ): 147 | 148 | result = [] 149 | ignore_next_send_cell = False 150 | 151 | for line in trace: 152 | strline = str(line) # stringify the (timestamp, event) tuple for substring search 153 | 154 | # If we hit a blacklisted event, this means we should ignore the next 155 | # sent nonpadding cell. Since the blacklisted event should only be 156 | # triggered client-side, there shouldn't be any impact on relay traces. 157 | if any(b in strline for b in CIRCPAD_BLACKLISTED_EVENTS): 158 | ignore_next_send_cell = True 159 | else: 160 | if ignore_next_send_cell and CIRCPAD_EVENT_NONPADDING_SENT in strline: 161 | ignore_next_send_cell = False 162 | else: 163 | result.append(line) 164 | 165 | return result 166 | 167 | def circpad_only_ips_in_trace(trace): 168 | def is_ipv4(addr): 169 | try: 170 | socket.inet_aton(addr) 171 | except (socket.error, TypeError): 172 | return False 173 | return True 174 | def is_ipv6(addr): 175 | try: 176 | socket.inet_pton(socket.AF_INET6, addr) 177 | except (socket.error, TypeError): 178 | return False 179 | return True 180 | 181 | for a in circpad_get_all_addresses(trace): 182 | if not is_ipv4(a) and not is_ipv6(a): 183 | return False 184 | return True 185 | 186 | 187 | def circpad_to_wf(trace, cells=False, timecells=False, dirtime=False, strip=False): 188 | ''' Get a WF representation of the trace in the specified format. 189 | 190 | We support three formats: 191 | - cells, each line only contains 1 or -1 for outgoing or incoming cells. 192 | - timecells, relative timestamp (ms) added before each cell. 193 | - dirtime, each line has the relative time multiplied with the cell value. 194 | 195 | If the strip flag is set, events prior to a first domain resolution are 196 | stripped from the trace (if present).
Circuits are typically created in the 197 | background by Tor Browser to speed up browsing for users. Removing this is 198 | beneficial for WF attackers, because it is often assumed (more or less 199 | realistically) that an attacker can detect this (mainly by a 200 | significant period of "silence" on the wire, followed by what is assumed to be 201 | a website load). 202 | 203 | FIXME: For timecells and dirtime, the current magnitude is nanoseconds; it might be 204 | more efficient to round to seconds with lower resolution, especially for deep 205 | learning attacks. 206 | ''' 207 | result = [] 208 | 209 | # only strip if we find the event for an address being resolved 210 | if strip: 211 | for i, l in enumerate(trace): 212 | if CIRCPAD_ADDRESS_EVENT in l[1]: 213 | trace = trace[i:] 214 | break 215 | 216 | for l in trace: 217 | # outgoing is positive 218 | if CIRCPAD_EVENT_NONPADDING_SENT in l[1] or \ 219 | CIRCPAD_EVENT_PADDING_SENT in l[1]: 220 | if cells: 221 | result.append("1") 222 | if timecells: 223 | result.append(f"{l[0]} 1") 224 | if dirtime: 225 | result.append(f"{l[0]}") 226 | 227 | # incoming is negative 228 | elif CIRCPAD_EVENT_NONPADDING_RECV in l[1] or \ 229 | CIRCPAD_EVENT_PADDING_RECV in l[1]: 230 | if cells: 231 | result.append("-1") 232 | if timecells: 233 | result.append(f"{l[0]} -1") 234 | if dirtime: 235 | result.append(f"{l[0]*-1}") 236 | return result 237 | -------------------------------------------------------------------------------- /collect-traces/extract/extract.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import os 4 | import sys 5 | import shutil 6 | import circpadsim 7 | 8 | ''' Given input and output folders, extract results.
9 | 10 | Monitored: dimension, use backups, give error, check for already done 11 | 12 | Unmonitored: num, check if done, pick random 13 | ''' 14 | ap = argparse.ArgumentParser() 15 | ap.add_argument("-i", required=True, 16 | help="input folder of logs") 17 | ap.add_argument("-o", required=True, 18 | help="output folder for logs") 19 | ap.add_argument("-t", required=True, 20 | help="output folder for traces") 21 | ap.add_argument("-l", required=True, 22 | help="file with list of sites to visit, one site per line") 23 | 24 | ap.add_argument("--monitored", required=False, default=False, 25 | action="store_true", help="extract monitored") 26 | ap.add_argument("--unmonitored", required=False, default=False, 27 | action="store_true", help="extract unmonitored") 28 | 29 | ap.add_argument("-c", required=False, type=int, default=500, 30 | help="the number of monitored classes") 31 | ap.add_argument("-s", required=False, type=int, default=20, 32 | help="the number of samples") 33 | ap.add_argument("-m", required=False, default=100, type=int, 34 | help="minimum number of lines in a trace") 35 | args = vars(ap.parse_args()) 36 | 37 | def main(): 38 | if ( 39 | (args["monitored"] and args["unmonitored"]) or 40 | (not args["monitored"] and not args["unmonitored"]) 41 | ): 42 | sys.exit("needs exactly one of --monitored or --unmonitored") 43 | 44 | if not os.path.isdir(args["i"]): 45 | sys.exit(f"{args['i']} is not a directory") 46 | if not os.path.isdir(args["o"]): 47 | sys.exit(f"{args['o']} is not a directory") 48 | if not os.path.isdir(args["t"]): 49 | sys.exit(f"{args['t']} is not a directory") 50 | 51 | inlist = os.listdir(args["i"]) 52 | if len(inlist) < args["c"]*args["s"]: 53 | sys.exit( 54 | f'tasked to extract {args["c"]*args["s"]} samples, ' 55 | f'but {args["i"]} contains at most ' 56 | f'{len(inlist)} samples' 57 | ) 58 | 59 | outlist = os.listdir(args["o"]) 60 | if len(outlist) > 0: 61 | sys.exit(f'{args["o"]} is not empty') 62 | 63 | tracelist = 
os.listdir(args["t"]) 64 | if len(tracelist) > 0: 65 | sys.exit(f'{args["t"]} is not empty') 66 | 67 | print(f"reading sites list {args['l']}") 68 | sites = get_sites_list() 69 | print(f"ok, list has {len(sites)} starting sites") 70 | 71 | if args["monitored"]: 72 | print("monitored") 73 | for c in range(args["c"]): 74 | print(f"{c}-{0}") 75 | # every class has backup traces, that is, extract logs we collected, 76 | # starting from the intended sample counter until there is no more 77 | # such file 78 | backup = args["s"] 79 | site = sites[c] 80 | for i in range(args["s"]): 81 | infname = f"{c}-{i}.log" 82 | trace, readfname, backup = find_good_trace(c, i, backup, site) 83 | write_trace(trace, results_trace_file(infname)) 84 | write_log(readfname, infname) 85 | else: 86 | print("unmonitored") 87 | n = 0 88 | for index, site in enumerate(sites): 89 | if n >= args["c"]*args["s"]: 90 | break 91 | 92 | infname = f"{index}-0.log" 93 | if not os.path.exists(os.path.join(args["i"], infname)): 94 | continue 95 | 96 | trace, good = extract_trace(infname, site) 97 | if not good: 98 | print(f"not good {infname}") 99 | continue 100 | 101 | write_trace(trace, results_trace_file(infname)) 102 | write_log(infname, infname) 103 | 104 | n += 1 105 | if n % 100 == 0: 106 | print(n) 107 | 108 | def write_log(src, dst): 109 | shutil.copy( 110 | os.path.join(args["i"], src), 111 | os.path.join(args["o"], dst) 112 | ) 113 | 114 | def write_trace(output, fname): 115 | # make time relative before writing 116 | base = -1 117 | with open(fname, "w") as f: 118 | for l in output: 119 | t = int(l[0]) 120 | if base == -1: 121 | base = t 122 | t = t - base 123 | f.write(f"{t:016d} {l[1].strip()}\n") 124 | 125 | def find_good_trace(c, i, backup, site): 126 | inst = i 127 | while True: 128 | infname = f"{c}-{inst}.log" 129 | if not os.path.exists(os.path.join(args["i"], infname)): 130 | sys.exit(f"not enough logs for class {c}, instance {i}") 131 | 132 | trace, good = extract_trace(infname, site) 
133 | if good: 134 | return trace, infname, backup 135 | # no good, try a backup 136 | print(f"need backup for {c}-{i}") 137 | inst = backup 138 | backup += 1 139 | 140 | def extract_trace(infname, site): 141 | circuits = {} 142 | with open(os.path.join(args["i"], infname), 'r') as f: 143 | circuits = circpadsim.circpad_extract_log_traces(f.readlines(), 144 | True, True, False, False, False, 10*1000) 145 | 146 | if len(circuits) == 0: 147 | return "", False 148 | 149 | # try to find the first circuit with our site that is of acceptable length 150 | for cid in circuits: 151 | for l in circuits[cid]: 152 | s = l[1].split(" ") 153 | if s[len(s)-1].rstrip() in site: 154 | if len(circuits[cid]) >= args["m"]: 155 | return circuits[cid], True 156 | 157 | return "", False 158 | 159 | def results_trace_file(fname): 160 | if os.path.splitext(fname)[1] == ".log": 161 | return os.path.join(args["t"], os.path.splitext(fname)[0]+'.trace') 162 | return os.path.join(args["t"], fname+'.trace') 163 | 164 | def get_sites_list(): 165 | l = [] 166 | with open(args["l"]) as f: 167 | for line in f: 168 | site = line.rstrip() 169 | if site in l: 170 | print(f"warning, list of sites has duplicate: {site}") 171 | l.append(site) 172 | return l 173 | 174 | if __name__ == "__main__": 175 | main() 176 | -------------------------------------------------------------------------------- /collect-traces/extract/sim-all.sh: -------------------------------------------------------------------------------- 1 | ./simrelaytrace.py -i safer/client-traces/monitored/ -o safer/fakerelay-traces/monitored/ 2 | ./simrelaytrace.py -i safer/client-traces/unmonitored/ -o safer/fakerelay-traces/unmonitored/ 3 | ./simrelaytrace.py -i safest/client-traces/monitored/ -o safest/fakerelay-traces/monitored/ 4 | ./simrelaytrace.py -i safest/client-traces/unmonitored/ -o safest/fakerelay-traces/unmonitored/ 5 | #./simrelaytrace.py -i standard/client-traces/monitored/ -o standard/fakerelay-traces/monitored/ 6 | 
#./simrelaytrace.py -i standard/client-traces/unmonitored/ -o standard/fakerelay-traces/unmonitored/ 7 | -------------------------------------------------------------------------------- /collect-traces/lists/monitored/README.md: -------------------------------------------------------------------------------- 1 | # top-50-selected-multi list 2 | This document describes how the top-50-selected-multi.list file was created. We 3 | had to go down to Alexa rank 212 to fill the list. 4 | 5 | ## February update 6 | - replaced five headlines from headlines.yahoo.co.jp 7 | - replaced one link on ebay.com 8 | - replaced one etsy.com link, shop taking a break 9 | 10 | ## January update 11 | - replaced two headlines that 404'd from headlines.yahoo.co.jp 12 | - replaced four items that were replaced on ebay.com 13 | - replaced four articles that 404'd from suara.com 14 | 15 | ## How created 16 | The approach for the list is simple but boring. Starting from the top of the Alexa 17 | top-list, we asked two questions to decide whether to include a site: 18 | - Is the site reliable to visit over Tor? That is, not behind some cloudwall or 19 | blacklist? Also, does it load reliably? 20 | - Does the site contain several _similar_ webpages beyond the frontpage that can 21 | be accessed?
22 | 23 | ## General pruning 24 | - remove all tracking stuff in URL that still gives the page on a clean TB 25 | - avoid porn (for sake of work) 26 | - try not to mix significant content types on pages, like mixing videos and text 27 | articles on a news site (that's two different classes) 28 | 29 | ## Per-site notes 30 | - wikipedia.org: took 10 links from today's featured article 31 | - amazon.com: first 10 of Today's deals under $25 32 | - reddit.com: the first 10 subreddits on the frontpage 33 | - okezone.com: the first 10 articles 34 | - yahoo.co.jp: the first 10 articles 35 | - tor.stackexchange.com: 10 latest questions 36 | - ebay.com: first 10 offers 37 | - aliexpress.com: top 10 selection items 38 | - msn.com: first 10 articles 39 | - tribunnews.com: first 10 articles 40 | - twitch.tv: top 10 games listing 41 | - yandex.ru: top 10 clips in the first category shown 42 | - imdb.com: top 10 of movies opening this week 43 | - aws.amazon.com: the first product page from the first 10 products (reading order) 44 | - booking.com: search in first 10 places listed 45 | - medium.com: top 10 featured stories 46 | - detik.com: top 10 in news feed 47 | - bbc.com: top 10 articles 48 | - indeed.com: top 10 popular job searches 49 | - w3schools.com: top 10 links left column 50 | - nytimes.com: first 10 articles 51 | - cnn.com: first 10 articles, mixed content types 52 | - imgur.com: top 10 viral 53 | - fandom.com: top 10 articles with big underlineable links 54 | - stackexchange.com: top 10 hot links 55 | - soundcloud.com: top 10 trending 56 | - github.com: top 10 trending today 57 | - nih.gov: 10 latest news releases 58 | - theguardian.com: 10 frontpage linked articles 59 | - slideshare.net: top 10 featured slides 60 | - sindonews.com: top 10 "TERPOPULER" 61 | - freepik.com: top 10 Freepik's choice 62 | - uol.com.br: 10 articles on frontpage 63 | - walmart.com: top 10 shop categories 64 | - etsy.com: top 10 personalized jewellery from frontpage 65 | - wikihow.com: top 10
linked wikis 66 | - craigslist.org: 10 different cities in different US states 67 | - ladbible.com: top 10 trending 68 | - archive.org: top 10 collections at the archive 69 | - nicovideo.jp: top 10 ranked videos 70 | - setn.com: 10 non-video articles on frontpage 71 | - forbes.com: 10 popular articles 72 | - thepiratebay.org: 10 top categories listings 73 | - pixabay.com: 10 popular image categories 74 | - gfycat.com: top 10 trending gifs 75 | - healthline.com: 10 first articles 76 | - dictionary.com: 10 random articles 77 | - suara.com: 10 news articles 78 | - sciencedirect.com: 10 article listing 79 | - foxnews.com: 10 frontpage articles -------------------------------------------------------------------------------- /collect-traces/lists/unmonitored/README.md: -------------------------------------------------------------------------------- 1 | # Unmonitored lists from reddit 2 | Using the praw library to access the reddit API, on the 4th of December 2019: 3 | 4 | - `reddit-front-year.list` consists of 11716 URLs filtered from 51070 5 | submissions to r/frontpage limited to submissions this year 6 | - `reddit-front-all.list` consists of 14167 URLs filtered from 54512 submissions 7 | to r/frontpage with no time filter (all) 8 | 9 | The filtering was done with `reddit.py`, using its built-in blacklist, as well 10 | as the monitored file `top-50-selected-multi.list`.
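
The gist of the filtering can be sketched as follows. This is a hypothetical minimal sketch, not the exact code in `reddit.py`, and the `blacklist` and `monitored` lists are short excerpts of the real ones:

```python
from urllib.parse import urlsplit

# Excerpts for illustration; see reddit.py for the full lists.
blacklist = [".jpg", "imgur.com", "youtube.com"]
monitored = ["reddit.com", "wikipedia.org"]  # base URLs of monitored sites

def keep(url):
    # drop URLs matching the blacklist, or whose base URL is monitored
    base = urlsplit(url).netloc
    if any(b in url for b in blacklist):
        return False
    if any(m in base for m in monitored):
        return False
    return True
```

With these excerpts, a URL like `https://i.imgur.com/cat.jpg` is dropped, while `https://example.com/article` is kept.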
11 | -------------------------------------------------------------------------------- /collect-traces/lists/unmonitored/reddit.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import sys 4 | import os 5 | import praw 6 | from urllib.parse import urlsplit 7 | 8 | ap = argparse.ArgumentParser() 9 | ap.add_argument("-m", required=True, default="", 10 | help="location of monitored list to load") 11 | ap.add_argument("-u", required=True, default="", 12 | help="location of unmonitored list to save") 13 | ap.add_argument("-n", required=True, type=int, 14 | help="the total number of unique sites to get") 15 | args = vars(ap.parse_args()) 16 | 17 | blacklist = [ 18 | # remove direct image links 19 | ".gif", 20 | ".jpg", 21 | ".jpeg", 22 | ".png", 23 | # remove image hosting sites 24 | "redd.it", 25 | "reddit.com", 26 | "reddituploads.com", 27 | "imgur.com", 28 | "gfycat.com", 29 | # youtube and twitter both treat Tor badly 30 | "youtube.com", 31 | "youtu.be", 32 | "twitter.com", 33 | ] 34 | 35 | def main(): 36 | if not os.path.exists(args["m"]): 37 | sys.exit(f"{args['m']}, no such file (argument -m)") 38 | if os.path.exists(args["u"]): 39 | sys.exit(f"{args['u']} already exists") 40 | 41 | # load monitored list, clean the urls, filter on base 42 | monitored = get_sites_list() 43 | 44 | # loop over submissions until done 45 | reddit = praw.Reddit(client_id='REPLACE', 46 | client_secret='REPLACE', 47 | password='REPLACE', 48 | user_agent='a research python script by /u/REPLACE, collecting URLs for website fingerprinting attacks', 49 | username='REPLACE') 50 | unmonitored = [] 51 | count = 0 52 | with open(args["u"], 'w') as f: 53 | for submission in reddit.subreddit("all").top(time_filter="year", limit=args["n"]): 54 | count += 1 55 | # https://praw.readthedocs.io/en/latest/code_overview/models/submission.html 56 | base = base_url(submission.url) 57 | if not any(b in submission.url for b in
blacklist): 58 | if not any(m in base for m in monitored): 59 | print(f"base {base}\t full {submission.url}") 60 | if not submission.url in unmonitored: 61 | unmonitored.append(submission.url) 62 | f.write(f"{submission.url}\n") 63 | 64 | print(f"\ngot {len(unmonitored)} sites, {count} submissions") 65 | 66 | def get_sites_list(): 67 | l = [] 68 | with open(args["m"]) as f: 69 | for line in f: 70 | site = base_url(line.rstrip()) 71 | # only add unique base URLs, faster lookup 72 | if not site in l: 73 | l.append(site) 74 | return l 75 | 76 | def base_url(u): 77 | return urlsplit(u).netloc 78 | 79 | if __name__ == "__main__": 80 | main() -------------------------------------------------------------------------------- /collect-traces/server/circpad-server.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import os 4 | import random 5 | import socket 6 | import sys 7 | from flask import Flask, request 8 | 9 | app = Flask(__name__) 10 | 11 | ap = argparse.ArgumentParser() 12 | ap.add_argument("-l", required=True, 13 | help="file with list of sites to visit, one site per line") 14 | ap.add_argument("-n", required=True, type=int, 15 | help="number of samples") 16 | ap.add_argument("-d", required=True, 17 | help="data folder for storing results") 18 | 19 | ap.add_argument("-m", required=False, default=500, type=int, 20 | help="minimum number of lines in torlog to accept") 21 | ap.add_argument("-s", required=False, default=-1, type=int, 22 | help="stop collecting at this many logs collected, regardless of remaining sites or samples (useful for unmonitored sites)") 23 | args = vars(ap.parse_args()) 24 | 25 | RESULTSFMT = "{}-{}.log" 26 | 27 | sites = [] 28 | remaining_sites = [] 29 | collected_samples = {} 30 | total_collected = 0 31 | 32 | def main(): 33 | global total_collected 34 | 35 | if not os.path.exists(args["d"]): 36 | sys.exit(f"data directory {args['d']} does not exist") 37 | 38 
| print(f"reading sites list {args['l']}") 39 | starting_sites = get_sites_list() 40 | print(f"ok, list has {len(starting_sites)} starting sites") 41 | 42 | for site in starting_sites: 43 | sites.append(site) 44 | remaining_sites.append(site) 45 | collected_samples[site] = 0 46 | 47 | for _ in range(args["n"]): 48 | if os.path.isfile(results_file(site)): 49 | total_collected = total_collected + 1 50 | # record the collected sample 51 | collected_samples[site] = collected_samples[site] + 1 52 | # if we got enough samples, all done 53 | if collected_samples[site] >= args["n"]: 54 | remaining_sites.remove(site) 55 | 56 | if args["s"] > 0 and total_collected >= args["s"]: 57 | sys.exit(f"already done, collected {total_collected} logs") 58 | 59 | if args["s"] > 0: 60 | remaining = args['s'] - total_collected 61 | print(f"set to collect {args['s']} logs, need {remaining} more") 62 | 63 | print(f"list has {len(remaining_sites)} remaining sites") 64 | 65 | app.run(host="0.0.0.0", threaded=False) 66 | 67 | @app.route('/', methods=['GET', 'POST']) 68 | def handler(): 69 | if request.method == 'POST': 70 | add_log(request.form['log'], request.form['site']) 71 | next = get_next_item() 72 | print(f"\tnext item is {next}") 73 | return next 74 | 75 | def add_log(log, site): 76 | global total_collected 77 | 78 | # already done? 
79 | if site not in remaining_sites: 80 | print(f"\tsite {site} is already done") 81 | return 82 | 83 | log = log.split("\n") 84 | 85 | if not is_complete_circpad_log(log): 86 | print(f"\tgot incomplete log for {site}") 87 | return 88 | 89 | print(f"\tgot log of {len(log)} events for site {site}") 90 | 91 | # store the log 92 | with open(results_file(site), 'w') as f: 93 | for l in log: 94 | f.write(f"{l}\n") 95 | 96 | # update count of samples 97 | collected_samples[site] = collected_samples[site] + 1 98 | if collected_samples[site] >= args["n"]: 99 | remaining_sites.remove(site) 100 | total_collected += 1 101 | 102 | def is_complete_circpad_log(log): 103 | circuits = circpad_extract_log_traces(log) 104 | n = 0 105 | for cid in circuits: 106 | if len(circuits[cid]) >= args["m"]: 107 | n += 1 108 | 109 | # A "complete" circpad log has exactly one sizeable trace, but at times we 110 | # get extra traces, e.g., due to TB extensions phoning home. I found that 111 | # the best approach was to collect potentially some less useful logs and 112 | # then discard at the end (see extraction tools). 113 | # return n == 1 114 | 115 | # at least one circuit trace looks good 116 | return n >= 1 117 | 118 | def get_next_item(): 119 | global total_collected 120 | 121 | # already done? 122 | if args["s"] > 0 and total_collected >= args["s"]: 123 | return "" 124 | 125 | # got more work?
126 | if len(remaining_sites) > 0: 127 | random.shuffle(remaining_sites) 128 | return remaining_sites[0] 129 | 130 | return "" 131 | 132 | def get_sites_list(): 133 | l = [] 134 | with open(args["l"]) as f: 135 | for line in f: 136 | site = line.rstrip() 137 | if site in l: 138 | print(f"warning, list of sites has duplicate: {site}") 139 | l.append(site) 140 | return l 141 | 142 | def results_file(site): 143 | index = sites.index(site) 144 | sample = collected_samples[site] 145 | return os.path.join(args["d"], RESULTSFMT.format(index, sample)) 146 | 147 | 148 | CIRCPAD_ERROR_WRONG_FORMAT = "invalid trace format" 149 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 150 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 151 | 152 | CIRCPAD_LOG = "circpad_trace_event" 153 | CIRCPAD_LOG_TIMESTAMP = "timestamp=" 154 | CIRCPAD_LOG_CIRC_ID = "client_circ_id=" 155 | CIRCPAD_LOG_EVENT = "event=" 156 | 157 | CIRCPAD_BLACKLISTED_ADDRESSES = ["aus1.torproject.org"] 158 | CIRCPAD_BLACKLISTED_EVENTS = [ 159 | "circpad_negotiate_logging" 160 | ] 161 | 162 | def circpad_get_all_addresses(trace): 163 | addresses = [] 164 | for l in trace: 165 | if len(l) < 2: 166 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 167 | if CIRCPAD_ADDRESS_EVENT in l[1]: 168 | if len(l[1].split()) < 2: # need both event name and address 169 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 170 | addresses.append(l[1].split()[1]) 171 | return addresses 172 | 173 | def circpad_parse_line(line): 174 | split = line.split() 175 | assert(len(split) >= 2) 176 | event = split[1] 177 | timestamp = int(split[0]) 178 | 179 | return event, timestamp 180 | 181 | def circpad_lines_to_trace(lines): 182 | trace = [] 183 | for l in lines: 184 | event, timestamp = circpad_parse_line(l) 185 | trace.append((timestamp, event)) 186 | return trace 187 | 188 | def circpad_extract_log_traces( 189 | log_lines, 190 | source_client=True, 191 | source_relay=True, 192 | allow_ips=False, 193 | filter_client_negotiate=False, 194 | filter_relay_negotiate=False 195
| ): 196 | # helper function 197 | def blacklist_hit(d): 198 | for a in circpad_get_all_addresses(d): 199 | if a in CIRCPAD_BLACKLISTED_ADDRESSES: 200 | return True 201 | return False 202 | 203 | # helper to extract one line 204 | def extract_from_line(line): 205 | n = line.index(CIRCPAD_LOG_TIMESTAMP)+len(CIRCPAD_LOG_TIMESTAMP) 206 | timestamp = line[n:].split(" ", maxsplit=1)[0] 207 | n = line.index(CIRCPAD_LOG_CIRC_ID)+len(CIRCPAD_LOG_CIRC_ID) 208 | cid = line[n:].split(" ", maxsplit=1)[0] 209 | 210 | # an event is the last part, no need to split on space like we did earlier 211 | n = line.index(CIRCPAD_LOG_EVENT)+len(CIRCPAD_LOG_EVENT) 212 | event = line[n:] 213 | 214 | return int(cid), int(timestamp), event 215 | 216 | circuits = {} 217 | base = -1 218 | for line in log_lines: 219 | if CIRCPAD_LOG in line: 220 | # skip client/relay if they shouldn't be part of the trace 221 | if not source_client and "source=client" in line: 222 | continue 223 | if not source_relay and "source=relay" in line: 224 | continue 225 | 226 | # extract trace and make timestamps relative 227 | cid, timestamp, event = extract_from_line(line) 228 | if base == -1: 229 | base = timestamp 230 | timestamp = timestamp - base 231 | 232 | # store trace 233 | if cid in circuits.keys(): 234 | circuits[cid] = circuits.get(cid) + [(timestamp, event)] 235 | else: 236 | circuits[cid] = [(timestamp, event)] 237 | 238 | # filter out circuits with blacklisted addresses 239 | for cid in list(circuits.keys()): 240 | if blacklist_hit(circuits[cid]): 241 | del circuits[cid] 242 | # filter out circuits with only IPs (unless arg says otherwise) 243 | for cid in list(circuits.keys()): 244 | if not allow_ips and circpad_only_ips_in_trace(circuits[cid]): 245 | del circuits[cid] 246 | 247 | # remove blacklisted events (and associated events) 248 | for cid in list(circuits.keys()): 249 | circuits[cid] = circpad_remove_blacklisted_events(circuits[cid], 250 | filter_client_negotiate, filter_relay_negotiate) 251 | 
252 | return circuits 253 | 254 | 255 | def circpad_remove_blacklisted_events( 256 | trace, 257 | filter_client_negotiate, 258 | filter_relay_negotiate 259 | ): 260 | 261 | result = [] 262 | ignore_next_send_cell = False 263 | 264 | for line in trace: 265 | strline = str(line) # trace entries are (timestamp, event) tuples, match on the string form 266 | # If we hit a blacklisted event, this means we should ignore the next 267 | # sent nonpadding cell. Since the blacklisted event should only be 268 | # triggered client-side, there shouldn't be any impact on relay traces. 269 | if any(b in strline for b in CIRCPAD_BLACKLISTED_EVENTS): 270 | ignore_next_send_cell = True 271 | else: 272 | if ignore_next_send_cell and CIRCPAD_EVENT_NONPADDING_SENT in strline: 273 | ignore_next_send_cell = False 274 | else: 275 | result.append(line) 276 | 277 | return result 278 | 279 | def circpad_only_ips_in_trace(trace): 280 | def is_ipv4(addr): 281 | try: 282 | socket.inet_aton(addr) 283 | except (socket.error, TypeError): 284 | return False 285 | return True 286 | def is_ipv6(addr): 287 | try: 288 | socket.inet_pton(socket.AF_INET6, addr) 289 | except (socket.error, TypeError): 290 | return False 291 | return True 292 | 293 | for a in circpad_get_all_addresses(trace): 294 | if not is_ipv4(a) and not is_ipv6(a): 295 | return False 296 | return True 297 | 298 | if __name__ == '__main__': 299 | main() 300 | -------------------------------------------------------------------------------- /collect-traces/server/run-collect-server.sh: -------------------------------------------------------------------------------- 1 | # Simple helper script for the two runs used for the goodenough dataset, with their 2 | # respective lists (see zips). I manually (un)commented the lines below per 3 | # part; the server won't stop on its own when done.
4 | 5 | #python3.6 circpad-server.py -d safer-mon/ -l top-50-selected-multi.list -n 30 -m 100 6 | python3.6 circpad-server.py -d safer-unmon/ -l reddit-front-year.list -n 1 -s 11000 -m 100 7 | -------------------------------------------------------------------------------- /dataset/README.md: -------------------------------------------------------------------------------- 1 | # The Goodenough dataset 2 | We set out to create a dataset that better reflects the challenges of an 3 | attacker than the typical datasets used in the evaluation of Website 4 | Fingerprinting attacks. The dataset consists of 10,000 monitored samples and 5 | 10,000 unmonitored samples. The monitored samples represent 50 classes of popular 6 | websites taken from the Alexa toplist (all within Alexa top-300 at the time of 7 | collection). For each website/class, we selected 10 webpages to represent that 8 | class, with the intent of evaluating _webpage-to-website_ fingerprinting. For 9 | example, for the website reddit.com, we selected 10 URLs to popular subreddits 10 | such as https://www.reddit.com/r/wholesomememes/. Similarly, for wikipedia.org, 11 | we selected articles such as https://en.wikipedia.org/wiki/Dinosaur, etc. The 12 | full list of websites and webpages is available as part of the dataset. We 13 | collected 20 samples per webpage, resulting in 50x10x20=10,000 monitored 14 | samples. 15 | 16 | As a complement, we collected 10,000 unmonitored webpages from reddit.com/r/all 17 | (top last year). We made sure to exclude webpages of monitored websites, which 18 | include self-hosted images at Reddit. We also excluded direct image links, since 19 | they are too distinct from the monitored webpages, and links to YouTube and 20 | Twitter, which tend not to treat traffic from Tor nicely (i.e., 21 | sporadically blocking access).
22 | 23 | The dataset consists of: 24 | - complete lists of visited monitored and unmonitored websites 25 | - logs from Tor Browser 26 | - traces extracted for the [circuit padding simulator](https://github.com/pylls/circpad-sim) 27 | - fakerelay traces that are [simulated](https://github.com/pylls/circpad-sim/blob/master/simrelaytrace.py) from the client traces 28 | 29 | The final traces have all been verified to work fine with the circuit padding 30 | simulator. There are complete sets of traces for the [three security 31 | levels/settings of Tor 32 | Browser](https://tb-manual.torproject.org/security-settings/). 33 | 34 | So far we have collected the dataset twice, in the beginning of January and February, 35 | to allow for comparisons over time. We made minimal changes to the webpages 36 | visited due to, e.g., removed content. See the list README for details of our 37 | changes. (There's also a dataset from December 2019, but only for some security 38 | levels; reach out in case you're interested.) 39 | 40 | Download links (may change in the future, please reference this repository): 41 | - https://dart.cse.kau.se/goodenough/goodenough-jan-2020.zip 5.4 GiB compressed, 165 GiB extracted 42 | - https://dart.cse.kau.se/goodenough/goodenough-feb-2020.zip 6.1 GiB compressed, 176 GiB extracted 43 | 44 | ``` 45 | $ sha256sum goodenough-* 46 | 37ab85288ebd8c9059b93716e2b21235a06063d252242f01c4274d0605e28131 goodenough-feb-2020.zip 47 | 82123a774275b9b6830a9208591f4e9c7bf759d12ed690db8694362fbca9bcac goodenough-jan-2020.zip 48 | ``` 49 | -------------------------------------------------------------------------------- /evaluation/once.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """once.py 4 | 5 | This script can be used to run Deep Fingerprinting (DF) once on the goodenough 6 | dataset and produce some metrics. Right now it only works on extracted cells, but 7 | it is made to be straightforward to extend.
8 | 9 | There are many parameters to tweak; see the arguments and help below. Works 10 | well on my machine with an RTX 2070 (pick the batch size based on available GPU memory). 11 | 12 | Supports saving and loading datasets as well as models. 13 | 14 | The DF implementation is ported to PyTorch with inspiration from 15 | https://github.com/lin-zju/deep-fp . 16 | """ 17 | 18 | import argparse 19 | import os 20 | import sys 21 | import numpy as np 22 | import torch 23 | import torch.nn as nn 24 | import torch.nn.functional as F 25 | from torch.utils import data 26 | import datetime 27 | import pickle 28 | import csv 29 | import shared 30 | 31 | ap = argparse.ArgumentParser() 32 | # load and save dataset/model 33 | ap.add_argument("--ld", required=False, default="", 34 | help="load dataset from pickle, provide path to pickled file") 35 | ap.add_argument("--sd", required=False, default="", 36 | help="save dataset, provide path to dump pickled file") 37 | ap.add_argument("--lm", required=False, default="", 38 | help="load model from pickle, provide path to pickled file") 39 | ap.add_argument("--sm", required=False, default="", 40 | help="save model, provide path to dump pickled file") 41 | 42 | ## extra output 43 | ap.add_argument("--csv", required=False, default=None, 44 | help="save resulting metrics in provided path in csv format") 45 | ap.add_argument("--extra", required=False, default="", 46 | help="value of extra column in csv output") 47 | 48 | # extract/train new dataset/model 49 | ap.add_argument("--ed", required=False, default="", 50 | help="extract dataset, path with {monitored,unmonitored} subfolders") 51 | ap.add_argument("--train", required=False, default=False, 52 | action="store_true", help="train model") 53 | 54 | # experiment parameters 55 | ap.add_argument("--epochs", required=False, type=int, default=30, 56 | help="the number of epochs for training") 57 | ap.add_argument("--batchsize", required=False, type=int, default=750, 58 | help="batch size") 59 |
ap.add_argument("-f", required=False, type=int, default=0, 60 | help="the fold number (partition offset)") 61 | ap.add_argument("-l", required=False, type=int, default=5000, 62 | help="max input length used in DF") 63 | ap.add_argument("-z", required=False, default="", 64 | help="zero out sample[a:b] for each sample, e.g., 0:10 zeroes the first 10 cells") 65 | 66 | # dataset dimensions 67 | ap.add_argument("-c", required=False, type=int, default=50, 68 | help="the number of monitored classes") 69 | ap.add_argument("-p", required=False, type=int, default=10, 70 | help="the number of partitions") 71 | ap.add_argument("-s", required=False, type=int, default=20, 72 | help="the number of samples") 73 | args = vars(ap.parse_args()) 74 | 75 | def now(): 76 | return datetime.datetime.now().strftime("%H:%M:%S") 77 | 78 | def main(): 79 | if ( 80 | (args["ld"] == "" and args["ed"] == "") or 81 | (args["ld"] != "" and args["ed"] != "") 82 | ): 83 | sys.exit("needs exactly one of --ld and --ed") 84 | 85 | 86 | dataset, labels = {}, {} 87 | if args["ld"] != "": 88 | print(f"attempting to load dataset from pickle file {args['ld']}") 89 | dataset, labels = pickle.load(open(args["ld"], "rb")) 90 | # clamp extra-detail values (generated by tweak.py) to [-1.0, 1.0] 91 | for k in dataset: 92 | dataset[k][0][dataset[k][0] > 1.0] = 1.0 93 | dataset[k][0][dataset[k][0] < -1.0] = -1.0 94 | 95 | else: 96 | if not os.path.isdir(args["ed"]): 97 | sys.exit(f"{args['ed']} is not a directory") 98 | 99 | mon_dir = os.path.join(args["ed"], "monitored") 100 | if not os.path.isdir(mon_dir): 101 | sys.exit(f"{mon_dir} is not a directory") 102 | 103 | unm_dir = os.path.join(args["ed"], "unmonitored") 104 | if not os.path.isdir(unm_dir): 105 | sys.exit(f"{unm_dir} is not a directory") 106 | 107 | print(f"{now()} starting to load dataset from folder...") 108 | dataset, labels = shared.load_dataset( 109 | mon_dir, 110 | unm_dir, 111 | args["c"], 112 | args["p"], 113 | args["s"], 114 | args["l"], 115 |
shared.trace2cells 116 | ) 117 | if args["sd"] != "": 118 | pickle.dump((dataset, labels), open(args["sd"], "wb")) 119 | print(f"saved dataset to {args['sd']}") 120 | 121 | print(f"{now()} loaded {len(dataset)} items in dataset with {len(labels)} labels") 122 | 123 | split = shared.split_dataset(args["c"], args["p"], args["s"], args["f"], labels) 124 | print( 125 | f"{now()} split {len(split['train'])} training, " 126 | f"{len(split['validation'])} validation, and " 127 | f"{len(split['test'])} testing" 128 | ) 129 | 130 | if args["z"] != "": 131 | dataset = shared.zero_dataset(dataset, args["z"]) 132 | print(f"{now()} zeroed each item in dataset as data[{args['z']}]") 133 | 134 | model = DFNet(args["c"]+1) # one class for unmonitored 135 | if args["lm"] != "": 136 | model = torch.load(args["lm"]) 137 | print(f"loaded model from {args['lm']}") 138 | 139 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 140 | if torch.cuda.is_available(): 141 | print(f"{now()} using {torch.cuda.get_device_name(0)}") 142 | model.cuda() 143 | 144 | if args["train"]: 145 | # Note below that shuffle=True is *essential*, 146 | # see https://stackoverflow.com/questions/54354465/ 147 | train_gen = data.DataLoader( 148 | shared.Dataset(split["train"], dataset, labels), 149 | batch_size=args["batchsize"], shuffle=True, 150 | ) 151 | validation_gen = data.DataLoader( 152 | shared.Dataset(split["validation"], dataset, labels), 153 | batch_size=args["batchsize"], shuffle=True, 154 | ) 155 | 156 | optimizer = torch.optim.Adamax(params=model.parameters()) 157 | criterion = torch.nn.CrossEntropyLoss() 158 | 159 | for epoch in range(args["epochs"]): 160 | print(f"{now()} epoch {epoch}") 161 | 162 | # training 163 | model.train() 164 | torch.set_grad_enabled(True) 165 | running_loss = 0.0 166 | n = 0 167 | for x, Y in train_gen: 168 | x, Y = x.to(device), Y.to(device) 169 | optimizer.zero_grad() 170 | outputs = model(x) 171 | loss = criterion(outputs, Y) 172 | loss.backward() 
173 | optimizer.step() 174 | running_loss += loss.item() 175 | n+=1 176 | print(f"\ttraining loss {running_loss/n}") 177 | 178 | # validation 179 | model.eval() 180 | torch.set_grad_enabled(False) 181 | running_corrects = 0 182 | n = 0 183 | for x, Y in validation_gen: 184 | x, Y = x.to(device), Y.to(device) 185 | 186 | outputs = model(x) 187 | _, preds = torch.max(outputs, 1) 188 | running_corrects += torch.sum(preds == Y) 189 | n += len(Y) 190 | print(f"\tvalidation accuracy {float(running_corrects)/float(n)}") 191 | 192 | if args["sm"] != "": 193 | torch.save(model, args["sm"]) 194 | print(f"saved model to {args['sm']}") 195 | 196 | # testing 197 | testing_gen = data.DataLoader( 198 | shared.Dataset(split["test"], dataset, labels), 199 | batch_size=args["batchsize"] 200 | ) 201 | model.eval() 202 | torch.set_grad_enabled(False) 203 | predictions = [] 204 | p_labels = [] 205 | for x, Y in testing_gen: 206 | x = x.to(device) 207 | outputs = model(x) 208 | index = F.softmax(outputs, dim=1).data.cpu().numpy() 209 | predictions.extend(index.tolist()) 210 | p_labels.extend(Y.data.numpy().tolist()) 211 | 212 | print(f"{now()} made {len(predictions)} predictions with {len(p_labels)} labels") 213 | csvline = [] 214 | threshold = np.append([0], 1.0 - 1 / np.logspace(0.05, 2, num=15, endpoint=True)) 215 | threshold = np.around(threshold, decimals=4) 216 | for th in threshold: 217 | tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 = shared.metrics(th, 218 | predictions, p_labels, args["c"]) 219 | print( 220 | f"\tthreshold {th:4.2}, " 221 | f"recall {recall:4.2}, " 222 | f"precision {precision:4.2}, " 223 | f"F1 {f1:4.2}, " 224 | f"accuracy {accuracy:4.2} " 225 | f"[tp {tp:>5}, fpp {fpp:>5}, fnp {fnp:>5}, tn {tn:>5}, fn {fn:>5}]" 226 | ) 227 | csvline.append([ 228 | th, recall, precision, f1, tp, fpp, fnp, tn, fn, args["extra"] 229 | ]) 230 | 231 | if args["csv"]: 232 | with open(args["csv"], "w", newline="") as csvfile: 233 | w = csv.writer(csvfile, delimiter=",") 234 
| w.writerow(["th", "recall", "precision", "f1", "tp", "fpp", "fnp", "tn", "fn", "extra"]) 235 | w.writerows(csvline) 236 | print(f"saved testing results to {args['csv']}") 237 | 238 | class DFNet(nn.Module): 239 | def __init__(self, classes, fc_in_features = 512*10): 240 | super(DFNet, self).__init__() 241 | # sources used when writing this, struggled with the change in output 242 | # size due to the convolutions and stumbled upon below: 243 | # - https://github.com/lin-zju/deep-fp/blob/master/lib/modeling/backbone/dfnet.py 244 | # - https://ezyang.github.io/convolution-visualizer/index.html 245 | self.kernel_size = 7 246 | self.padding_size = 3 247 | self.pool_stride_size = 4 248 | self.pool_size = 7 249 | 250 | self.block1 = self.__block(1, 32, nn.ELU()) 251 | self.block2 = self.__block(32, 64, nn.ReLU()) 252 | self.block3 = self.__block(64, 128, nn.ReLU()) 253 | self.block4 = self.__block(128, 256, nn.ReLU()) 254 | 255 | self.fc = nn.Sequential( 256 | nn.Linear(fc_in_features, 512), 257 | nn.BatchNorm1d(512), 258 | nn.ReLU(), 259 | nn.Dropout(0.7), 260 | nn.Linear(512, 512), 261 | nn.BatchNorm1d(512), 262 | nn.ReLU(), 263 | nn.Dropout(0.5) 264 | ) 265 | 266 | self.prediction = nn.Sequential( 267 | nn.Linear(512, classes), 268 | # when using CrossEntropyLoss, already computed internally 269 | #nn.Softmax(dim=1) # dim = 1, don't softmax batch 270 | ) 271 | 272 | def __block(self, channels_in, channels, activation): 273 | return nn.Sequential( 274 | nn.Conv1d(channels_in, channels, self.kernel_size, padding=self.padding_size), 275 | nn.BatchNorm1d(channels), 276 | activation, 277 | nn.Conv1d(channels, channels, self.kernel_size, padding=self.padding_size), 278 | nn.BatchNorm1d(channels), 279 | activation, 280 | nn.MaxPool1d(self.pool_size, stride=self.pool_stride_size, padding=self.padding_size), 281 | nn.Dropout(p=0.1) 282 | ) 283 | 284 | def forward(self, x): 285 | x = self.block1(x) 286 | x = self.block2(x) 287 | x = self.block3(x) 288 | x = self.block4(x) 289 
| x = x.flatten(start_dim=1) # dim = 1, don't flatten batch 290 | x = self.fc(x) 291 | x = self.prediction(x) 292 | 293 | return x 294 | 295 | if __name__ == "__main__": 296 | main() -------------------------------------------------------------------------------- /evaluation/overhead.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import shared 4 | import pickle 5 | import sys 6 | import os 7 | import numpy as np 8 | 9 | ap = argparse.ArgumentParser() 10 | ap.add_argument("--ld", required=True, 11 | help="load dataset from pickle, provide path to pickled file") 12 | args = vars(ap.parse_args()) 13 | 14 | def main(): 15 | '''Bandwidth overhead is based on the number of padding and non-padding cells in 16 | all traces. 17 | ''' 18 | print(f"attempting to load dataset from pickle file {args['ld']}") 19 | dataset, labels = pickle.load(open(args["ld"], "rb")) 20 | 21 | t_sent_padding = [] 22 | t_sent_nonpadding = [] 23 | t_sent_overhead = [] 24 | t_recv_padding = [] 25 | t_recv_nonpadding = [] 26 | t_recv_overhead = [] 27 | 28 | for trace in dataset: 29 | unique, counts = np.unique(dataset[trace][0], return_counts=True) 30 | d = dict(zip(unique, counts)) 31 | sent_nonpadding = d.get(1, 0) # get() with default: a broken trace may lack the key 32 | recv_nonpadding = d.get(-1, 0) 33 | 34 | sent_padding = 0 35 | if 2 in d: 36 | sent_padding = d[2] 37 | 38 | recv_padding = 0 39 | if -2 in d: 40 | recv_padding = d[-2] 41 | 42 | if sent_nonpadding == 0: 43 | sys.exit(f"sent 0 nonpadding cells, broken trace?") 44 | if recv_nonpadding == 0: 45 | sys.exit(f"recv 0 nonpadding cells, broken trace?") 46 | 47 | t_sent_padding.append(sent_padding) 48 | t_sent_nonpadding.append(sent_nonpadding) 49 | t_sent_overhead.append(float(sent_padding+sent_nonpadding) / float(sent_nonpadding)) 50 | 51 | t_recv_padding.append(recv_padding) 52 | t_recv_nonpadding.append(recv_nonpadding) 53 | t_recv_overhead.append(float(recv_padding+recv_nonpadding) / float(recv_nonpadding)) 54 | 55
| sent_padding = sum(t_sent_padding) 56 | sent_nonpadding = sum(t_sent_nonpadding) 57 | sent_cells = sent_padding + sent_nonpadding 58 | 59 | recv_padding = sum(t_recv_padding) 60 | recv_nonpadding = sum(t_recv_nonpadding) 61 | recv_cells = recv_padding + recv_nonpadding 62 | 63 | total_cells = sent_cells + recv_cells 64 | 65 | avg_sent = float(sent_cells)/float(sent_nonpadding) 66 | avg_recv = float(recv_cells)/float(recv_nonpadding) 67 | avg_total = float(total_cells)/float(recv_nonpadding+sent_nonpadding) 68 | 69 | print(f"in total for {len(t_sent_padding)} traces:") 70 | print(f"\t- {total_cells} cells") 71 | print(f"\t- {avg_total:.0%} average total bandwidth") 72 | 73 | print(f"\t- {sent_cells} sent cells ({float(sent_cells)/float(total_cells):.0%})") 74 | print(f"\t\t- {sent_nonpadding} nonpadding") 75 | print(f"\t\t- {sent_padding} padding") 76 | print(f"\t\t- {avg_sent:.0%} average sent bandwidth") 77 | 78 | print(f"\t- {recv_cells} recv cells ({float(recv_cells)/float(total_cells):.0%})") 79 | print(f"\t\t- {recv_nonpadding} nonpadding") 80 | print(f"\t\t- {recv_padding} padding") 81 | print(f"\t\t- {avg_recv:.0%} average recv bandwidth") 82 | 83 | if __name__ == "__main__": 84 | main() -------------------------------------------------------------------------------- /evaluation/shared.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import numpy as np 3 | import os 4 | import sys 5 | from torch.utils import data 6 | 7 | def metrics(threshold, predictions, labels, label_unmon): 8 | ''' Computes a range of metrics. 
9 | 10 | For details on the metrics, see, e.g., https://www.cs.kau.se/pulls/hot/baserate/ 11 | ''' 12 | tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 = 0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0 13 | 14 | for i in range(len(predictions)): 15 | label_pred = np.argmax(predictions[i]) 16 | prob_pred = max(predictions[i]) 17 | label_correct = labels[i] 18 | 19 | # we split on monitored or unmonitored correct label 20 | if label_correct != label_unmon: 21 | # either confident and correct, 22 | if prob_pred >= threshold and label_pred == label_correct: 23 | tp = tp + 1 24 | # confident and wrong monitored label, or 25 | elif prob_pred >= threshold and label_pred != label_unmon: 26 | fpp = fpp + 1 27 | # wrong because not confident or predicted unmonitored for monitored 28 | else: 29 | fn = fn + 1 30 | else: 31 | if prob_pred < threshold or label_pred == label_unmon: # correct prediction? 32 | tn = tn + 1 33 | elif label_pred < label_unmon: # predicted monitored for unmonitored 34 | fnp = fnp + 1 35 | else: # this should never happen 36 | sys.exit(f"this should never happen, wrongly labelled data for {label_pred}") 37 | 38 | if tp + fn + fpp > 0: 39 | recall = round(float(tp) / float(tp + fpp + fn), 4) 40 | if tp + fpp + fnp > 0: 41 | precision = round(float(tp) / float(tp + fpp + fnp), 4) 42 | 43 | if precision > 0 and recall > 0: 44 | f1 = round(2*((precision*recall)/(precision+recall)), 4) 45 | 46 | accuracy = round(float(tp + tn) / float(tp + fpp + fnp + fn + tn), 4) 47 | 48 | return tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 49 | 50 | 51 | class Dataset(data.Dataset): 52 | def __init__(self, ids, dataset, labels): 53 | self.ids = ids 54 | self.dataset = dataset 55 | self.labels = labels 56 | 57 | def __len__(self): 58 | return len(self.ids) 59 | 60 | def __getitem__(self, index): 61 | ID = self.ids[index] 62 | return self.dataset[ID], self.labels[ID] 63 | 64 | def load_dataset( 65 | mon_dir, unm_dir, 66 | classes, partitions, samples, 67 | length, extract_func 68
| ): 69 | ''' Loads the dataset from disk into two dictionaries for data and labels. 70 | 71 | The dictionaries are indexed by sample ID. The ID encodes if it's a monitored 72 | or unmonitored sample to make it easier to debug, as well as some info about 73 | the corresponding data file on disk. 74 | 75 | This function assumes the structure of the following dataset: 76 | - "top50-partitioned-reddit-levels-cirucitpadding" 77 | ''' 78 | data = {} 79 | labels = {} 80 | 81 | # load monitored data 82 | for c in range(0,classes): 83 | for p in range(0,partitions): 84 | site = c*10 + p # site index on disk, the dataset has 10 partitions per class 85 | for s in range(0,samples): 86 | ID = f"m-{c}-{p}-{s}" 87 | labels[ID] = c 88 | 89 | # file format is {site}-{sample}.trace 90 | fname = f"{site}-{s}.trace" 91 | with open(os.path.join(mon_dir, fname), "r") as f: 92 | data[ID] = extract_func(f.read(), length) 93 | 94 | # load unmonitored data 95 | dirlist = os.listdir(unm_dir) 96 | # make sure we only load a balanced dataset 97 | dirlist = dirlist[:len(data)] 98 | for fname in dirlist: 99 | ID = f"u-{fname}" 100 | labels[ID] = classes # monitored labels are 0..classes-1, unmonitored gets label classes 101 | with open(os.path.join(unm_dir, fname), "r") as f: 102 | data[ID] = extract_func(f.read(), length) 103 | 104 | return data, labels 105 | 106 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 107 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 108 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 109 | CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 110 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 111 | 112 | def trace2cells(log, length, strip=True): 113 | ''' A fast specialised function to generate cells from a trace. 114 | 115 | Based on circpad_to_wf() in circpad-sim/common.py, but only for cells.
116 | ''' 117 | data = np.zeros((1, length), dtype=np.float32) 118 | n = 0 119 | 120 | s = log.split("\n") 121 | if strip: 122 | for i, line in enumerate(s): 123 | if CIRCPAD_ADDRESS_EVENT in line: 124 | s = s[i:] 125 | break 126 | 127 | for line in s: 128 | # outgoing is positive 129 | if CIRCPAD_EVENT_NONPADDING_SENT in line or \ 130 | CIRCPAD_EVENT_PADDING_SENT in line: 131 | data[0][n] = 1.0 132 | n += 1 133 | # incoming is negative 134 | elif CIRCPAD_EVENT_NONPADDING_RECV in line or \ 135 | CIRCPAD_EVENT_PADDING_RECV in line: 136 | data[0][n] = -1.0 137 | n += 1 138 | 139 | if n == length: 140 | break 141 | 142 | return data 143 | 144 | def split_dataset( 145 | classes, partitions, samples, fold, labels, 146 | ): 147 | '''Splits the dataset based on fold. 148 | 149 | The split is only based on IDs, not the actual data. The result is an 8:1:1 150 | split into training, validation, and testing. 151 | ''' 152 | training = [] 153 | validation = [] 154 | testing = [] 155 | 156 | # monitored, split by _partition_ 157 | for c in range(0,classes): 158 | for p in range(0,partitions): 159 | for s in range(0,samples): 160 | ID = f"m-{c}-{p}-{s}" 161 | i = (p+fold) % partitions 162 | 163 | if i < partitions-2: 164 | training.append(ID) 165 | elif i < partitions-1: 166 | validation.append(ID) 167 | else: 168 | testing.append(ID) 169 | 170 | # unmonitored 171 | counter = 0 172 | for k in labels.keys(): 173 | if not k.startswith("u"): 174 | continue 175 | i = (counter+fold) % partitions 176 | if i < partitions-2: 177 | training.append(k) 178 | elif i < partitions-1: 179 | validation.append(k) 180 | else: 181 | testing.append(k) 182 | counter += 1 183 | 184 | split = {} 185 | split["train"] = training 186 | split["validation"] = validation 187 | split["test"] = testing 188 | return split 189 | 190 | def zero_dataset(dataset, z): 191 | index = z.split(":") 192 | start = int(index[0]) 193 | stop = int(index[1]) 194 | data = np.zeros((stop-start), dtype=np.float32) 195 |
for k, v in dataset.items(): 196 | v[:,start:stop] = data 197 | dataset[k] = v 198 | return dataset -------------------------------------------------------------------------------- /evaluation/tweak.md: -------------------------------------------------------------------------------- 1 | # How to use tweak.py 2 | 3 | Below is an example of how to run `tweak.py`: 4 | ``` 5 | ./tweak.py --client dataset-feb/standard/client-traces/ --relay dataset-feb/standard/fakerelay-traces/ -t ../tor --mc client-machine --mr relay-machine --save tmp.pkl 6 | ``` 7 | 8 | The help output explains most flags: 9 | 10 | ``` 11 | usage: tweak.py [-h] --client CLIENT --relay RELAY [-c C] [-p P] [-s S] -t T [-w W] [-l L] --mc MC --mr MR --save SAVE 12 | 13 | optional arguments: 14 | -h, --help show this help message and exit 15 | --client CLIENT input folder of client circpadtrace files 16 | --relay RELAY input folder of relay circpadtrace files 17 | -c C the number of monitored classes 18 | -p P the number of partitions 19 | -s S the number of samples 20 | -t T path to tor folder (bob/tor, not bob/tor/src) 21 | -w W number of workers for simulating machines 22 | -l L max length of extracted cells 23 | --mc MC path to file of client machine (c-code) to tweak 24 | --mr MR path to file of relay machine (c-code) to tweak 25 | --save SAVE file to save results to 26 | ``` 27 | 28 | The expected input format (`--client` and `--relay`) is that of the dataset in 29 | this repository. For the tor folder (`-t`), see 30 | [circpad-sim](https://github.com/pylls/circpad-sim). Machines you tweak (`--mc` 31 | and `--mr`) have to be of the appropriate format. Several examples are available 32 | in 33 | [machines/phase2/](https://github.com/pylls/padding-machines-for-tor/tree/master/machines/phase2/).
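
A circpadtrace file itself is a plain-text log with one `timestamp event` pair per line. As a rough sketch (the trace content below is made-up example data; the event names are the ones defined in `evaluation/shared.py`), converting such a trace into the ±1 cell sequence used for classification looks like this:

```python
# Hypothetical circpadtrace content: "<timestamp> <event>" per line.
TRACE = """1000 circpad_cell_event_nonpadding_sent
2000 circpad_cell_event_nonpadding_received
3000 circpad_cell_event_padding_received"""

def cells(trace):
    """Map sent events to +1 and received events to -1."""
    out = []
    for line in trace.splitlines():
        _, event = line.split(" ", 1)
        if event.endswith("_sent"):
            out.append(1)
        elif event.endswith("_received"):
            out.append(-1)
    return out

print(cells(TRACE))  # [1, -1, -1]
```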
34 | 35 | For how to use tweak.py as part of tweaking a padding machine, see `tweak.sh` in 36 | this folder and the [phase 2 37 | writeup](https://github.com/pylls/padding-machines-for-tor/tree/master/machines/phase2/). 38 | -------------------------------------------------------------------------------- /evaluation/tweak.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' Tweak a pair of machines. 3 | 4 | The goal is to be able to rapidly tweak a single pair of machines 5 | against DF. ''' 6 | import argparse 7 | import sys 8 | import os 9 | import subprocess 10 | import tempfile 11 | import signal 12 | import numpy as np 13 | import pickle 14 | from multiprocessing import Pool 15 | import logging 16 | import shared 17 | 18 | logging.basicConfig(level = logging.INFO, format = "%(asctime)s %(message)s") 19 | 20 | ap = argparse.ArgumentParser() 21 | # dataset and its dimensions, assuming same count unmon as mon 22 | ap.add_argument("--client", required=True, 23 | help="input folder of client circpadtrace files") 24 | ap.add_argument("--relay", required=True, 25 | help="input folder of relay circpadtrace files") 26 | ap.add_argument("-c", required=False, type=int, default=50, 27 | help="the number of monitored classes") 28 | ap.add_argument("-p", required=False, type=int, default=10, 29 | help="the number of partitions") 30 | ap.add_argument("-s", required=False, type=int, default=20, 31 | help="the number of samples") 32 | 33 | # exp 34 | ap.add_argument("-t", required=True, 35 | help="path to tor folder (bob/tor, not bob/tor/src)") 36 | ap.add_argument("-w", required=False, type=int, default=10, 37 | help="number of workers for simulating machines") 38 | ap.add_argument("-l", required=False, type=int, default=5000, 39 | help="max length of extracted cells") 40 | 41 | # machines to tweak 42 | ap.add_argument("--mc", required=True, 43 | help="path to file of client machine
(c-code) to tweak") 44 | ap.add_argument("--mr", required=True, 45 | help="path to file of relay machine (c-code) to tweak") 46 | 47 | # pickle dump results 48 | ap.add_argument("--save", required=True, help="file to save results to") 49 | args = vars(ap.parse_args()) 50 | 51 | TOR_CIRCPADSIM_SRC_LOC = "src/test/test_circuitpadding_sim.c" 52 | CLIENT_MACHINE_TOKEN = "//REPLACE-client-padding-machine-REPLACE" 53 | RELAY_MACHINE_TOKEN = "//REPLACE-relay-padding-machine-REPLACE" 54 | TOR_CIRCPADSIM_CMD = os.path.join(args["t"], "src/test/test circuitpadding_sim/..") 55 | TOR_CIRCPADSIM_CMD_FORMAT = f"{TOR_CIRCPADSIM_CMD} --info --circpadsim {{}} {{}} 1" 56 | 57 | tmpdir = tempfile.mkdtemp() 58 | original_src = "" 59 | src_path = os.path.join(args["t"], TOR_CIRCPADSIM_SRC_LOC) 60 | 61 | def main(): 62 | # properly restore tor source when closed 63 | signal.signal(signal.SIGINT, sigint_handler) 64 | 65 | # list of input traces, sorted assuming the matching client and relay traces 66 | # have the same name in respective folders 67 | c_mon_dir = os.path.join(args["client"], "monitored") 68 | if not os.path.isdir(c_mon_dir): 69 | sys.exit(f"{c_mon_dir} is not a directory") 70 | c_unm_dir = os.path.join(args["client"], "unmonitored") 71 | if not os.path.isdir(c_unm_dir): 72 | sys.exit(f"{c_unm_dir} is not a directory") 73 | r_mon_dir = os.path.join(args["relay"], "monitored") 74 | if not os.path.isdir(r_mon_dir): 75 | sys.exit(f"{r_mon_dir} is not a directory") 76 | r_unm_dir = os.path.join(args["relay"], "unmonitored") 77 | if not os.path.isdir(r_unm_dir): 78 | sys.exit(f"{r_unm_dir} is not a directory") 79 | 80 | logging.info(f"loading original traces") 81 | labels, fnames_client, fnames_relay = load_dataset( 82 | c_mon_dir, c_unm_dir, 83 | r_mon_dir, r_unm_dir, 84 | args["c"], args["p"], args["s"] 85 | ) 86 | logging.info(f"loaded {len(labels)} traces") 87 | 88 | # load machines to tweak 89 | with open(args["mc"], "r") as f: 90 | mc = f.read() 91 | with open(args["mr"], 
"r") as f: 92 | mr = f.read() 93 | 94 | logging.info(f"adding machines") 95 | add_machines(mc, mr) 96 | 97 | logging.info("simulating machines") 98 | client_traces, _ = simulate_machines(labels, fnames_client, fnames_relay, extract_cells_detailed) 99 | 100 | logging.info(f"pickle dump to {args['save']}") 101 | pickle.dump((client_traces, labels), open(args["save"], "wb")) 102 | 103 | logging.info(f"done") 104 | 105 | def add_machines(client, relay): 106 | # read source 107 | global original_src, src_path 108 | if original_src == "": 109 | with open(src_path, "r") as myfile: 110 | original_src = myfile.read() 111 | assert(original_src != "") 112 | assert(CLIENT_MACHINE_TOKEN in original_src) 113 | assert(RELAY_MACHINE_TOKEN in original_src) 114 | 115 | # replace with machines and save the modified source 116 | modified_src = original_src.replace(CLIENT_MACHINE_TOKEN, client) 117 | modified_src = modified_src.replace(RELAY_MACHINE_TOKEN, relay) 118 | with open(src_path, "w") as f: 119 | f.write(modified_src) 120 | 121 | # make new machines, then restore original source 122 | make_tor() 123 | restore_source() 124 | 125 | def restore_source(): 126 | global original_src, src_path 127 | with open(src_path, "w") as f: 128 | f.write(original_src) 129 | 130 | def sigint_handler(foo=1, bar=2): 131 | restore_source() 132 | sys.exit(0) 133 | 134 | def make_tor(): 135 | cmd = f"cd {args['t']} && make" 136 | result = subprocess.run(cmd, stdout=subprocess.DEVNULL, shell=True) 137 | if result.returncode != 0: 138 | logging.info(cmd) 139 | assert(result.returncode == 0) 140 | 141 | def simulate_machines( 142 | labels, fnames_client, fnames_relay, 143 | extract_func, 144 | extract_client=True, 145 | extract_relay=False, 146 | ): 147 | 148 | todo = [] 149 | logging.info(f"\t\tlisting {len(labels)} traces to simulate") 150 | for ID in labels: 151 | todo.append( 152 | (fnames_client[ID], fnames_relay[ID], ID, 153 | extract_func, extract_client, extract_relay) 154 | ) 155 | 156 | 
logging.info(f"\t\trunning with {args['w']} workers") 157 | p = Pool(args["w"]) 158 | results = p.starmap(do_simulate_machines, todo) 159 | 160 | logging.info(f"\t\textracting results") 161 | # ID -> extracted 162 | out_client = {} 163 | out_relay = {} 164 | for result in results: 165 | if extract_client: 166 | out_client[result[0]] = result[1] 167 | if extract_relay: 168 | out_relay[result[0]] = result[2] 169 | 170 | p.close() 171 | 172 | return out_client, out_relay 173 | 174 | def do_simulate_machines( 175 | client, relay, ID, 176 | extract_func, extract_client=True, extract_relay=False 177 | ): 178 | cmd = TOR_CIRCPADSIM_CMD_FORMAT.format(client, relay) 179 | result = subprocess.run(cmd, capture_output=True, text=True, shell=True) 180 | if result.returncode != 0: 181 | logging.error(f"got returncode {result.returncode} for cmd {cmd}") 182 | assert(result.returncode == 0) 183 | 184 | # parse out the simulated logs, get client and relay traces 185 | client_out = [] 186 | relay_out = [] 187 | log = result.stdout.split("\n") 188 | if extract_client: 189 | client_out = extract_func(log, client=True) 190 | if extract_relay: 191 | relay_out = extract_func(log, client=False) 192 | 193 | return (ID, client_out, relay_out) 194 | 195 | def extract_cells_detailed(log, client=True): 196 | i = 0 197 | length = args["l"] 198 | data = np.zeros((1, length), dtype=np.float32) 199 | for line in log: 200 | if i >= length: 201 | break 202 | 203 | if client and not "source=client" in line: 204 | continue 205 | elif not client and not "source=relay" in line: 206 | continue 207 | 208 | if shared.CIRCPAD_EVENT_NONPADDING_SENT in line: 209 | data[0][i] = 1.0 # outgoing is positive 210 | i += 1 211 | elif shared.CIRCPAD_EVENT_PADDING_SENT in line: 212 | data[0][i] = 2.0 213 | i += 1 214 | elif shared.CIRCPAD_EVENT_NONPADDING_RECV in line: 215 | data[0][i] = -1.0 216 | i += 1 217 | elif shared.CIRCPAD_EVENT_PADDING_RECV in line: 218 | data[0][i] = -2.0 219 | i += 1 220 | 221 | return data 
222 | 223 | def load_dataset( 224 | c_mon_dir, c_unm_dir, 225 | r_mon_dir, r_unm_dir, 226 | classes, partitions, samples 227 | ): 228 | 229 | # ID -> class 230 | labels = {} 231 | # ID -> fname 232 | fnames_client = {} 233 | fnames_relay = {} 234 | 235 | # monitored 236 | for c in range(0,classes): 237 | for p in range(0,partitions): 238 | site = c*10 + p 239 | for s in range(0,samples): 240 | ID = f"m-{c}-{p}-{s}" 241 | fname = f"{site}-{s}.trace" 242 | 243 | labels[ID] = c 244 | fnames_client[ID] = os.path.join(c_mon_dir, fname) 245 | fnames_relay[ID] = os.path.join(r_mon_dir, fname) 246 | if not os.path.exists(fnames_client[ID]): 247 | sys.exit(f"{fnames_client[ID]} does not exist") 248 | if not os.path.exists(fnames_relay[ID]): 249 | sys.exit(f"{fnames_relay[ID]} does not exist") 250 | 251 | # unmonitored 252 | dirlist = os.listdir(c_unm_dir)[:len(labels)] 253 | for fname in dirlist: 254 | ID = f"u-{fname}" 255 | 256 | labels[ID] = classes # start from 0 for monitored 257 | fnames_client[ID] = os.path.join(c_unm_dir, fname) 258 | fnames_relay[ID] = os.path.join(r_unm_dir, fname) 259 | if not os.path.exists(fnames_relay[ID]): 260 | sys.exit(f"{fnames_relay[ID]} does not exist") 261 | 262 | # we need to provide: 263 | # - for simulate_machines, the full fname with path for each pair 264 | # - for df, labels as above and a way to get the data from simulate_machines that maps from ID 265 | # have simulate_machines produce ID -> data for client and relay 266 | return labels, fnames_client, fnames_relay 267 | 268 | if __name__ == "__main__": 269 | main() 270 | -------------------------------------------------------------------------------- /evaluation/tweak.sh: -------------------------------------------------------------------------------- 1 | # example of how to tweak machines stored in tmp-mc and tmp-mr, standard february dataset 2 | ./tweak.py --client dataset-feb/standard/client-traces/ --relay dataset-feb/standard/fakerelay-traces/ -t ../tor --mc 
phase2/strawman-mc --mr phase2/strawman-mr --save tmp.pkl -s $1 -w 8 3 | ./once.py --ld tmp.pkl --train -s $1 4 | ./overhead.py --ld tmp.pkl 5 | ./visualize.py --ld tmp.pkl -s tmp 6 | ./visualize.py --ld tmp.pkl -s tmp-nopadding --hide 7 | -------------------------------------------------------------------------------- /evaluation/visualize.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import sys 4 | import os 5 | import random 6 | import numpy as np 7 | import pickle 8 | from PIL import Image 9 | 10 | import circpadsim 11 | 12 | ap = argparse.ArgumentParser() 13 | ap.add_argument("--ld", required=True, 14 | help="load dataset from pickle, provide path to pickled file") 15 | 16 | ap.add_argument("-s", default="test", 17 | help="save filename prefix") 18 | 19 | # dimensions of the image 20 | ap.add_argument("-x", type=int, default=5000, 21 | help="image width (x-axis)") 22 | ap.add_argument("-y", type=int, default=1000, 23 | help="image height (y-axis)") 24 | 25 | ap.add_argument("--hide", required=False, default=False, 26 | action="store_true", help="hide padding cells") 27 | args = vars(ap.parse_args()) 28 | 29 | # TOMATO colors below 30 | COLOR_BACKGROUND = [0, 0, 0, 0] # transparent PNG (alpha 0) 31 | COLOR_NONPADDING_RECV = [0, 0, 0, 255] # black - most data is nonpadding received 32 | COLOR_NONPADDING_SENT = [255, 255, 255, 255] # white - sent nonpadding data 33 | COLOR_PADDING_RECV = [170, 57, 57, 255] # red - most padding is received padding 34 | COLOR_PADDING_SENT = [45, 136, 45, 255] # green - outgoing padding 35 | 36 | def main(): 37 | print(f"attempting to load dataset from pickle file {args['ld']}") 38 | dataset, _ = pickle.load(open(args["ld"], "rb")) 39 | 40 | image = Image.fromarray(get_img_data(dataset, args["y"], args["x"])) 41 | image.save(open(f"{args['s']}.png", "wb")) 42 | 43 | def get_img_data(dataset, n, width): 44 | data = np.full((n, width, 4), 
| COLOR_BACKGROUND, dtype=np.uint8) 45 | 46 | for y, k in enumerate(dataset): 47 | if y >= n: 48 | break 49 | x = 0 50 | for v in dataset[k][0]: 51 | if x >= width: 52 | break 53 | if v == 1: 54 | data[y][x] = COLOR_NONPADDING_SENT 55 | x += 1 56 | elif v == -1: 57 | data[y][x] = COLOR_NONPADDING_RECV 58 | x += 1 59 | elif not args["hide"] and v == 2: 60 | data[y][x] = COLOR_PADDING_SENT 61 | x += 1 62 | elif not args["hide"] and v == -2: 63 | data[y][x] = COLOR_PADDING_RECV 64 | x += 1 65 | 66 | return data 67 | 68 | if __name__ == "__main__": 69 | main() -------------------------------------------------------------------------------- /evolve/README.md: -------------------------------------------------------------------------------- 1 | # Evolving Machines 2 | 3 | This is provided as-is for the sake of helping other researchers pondering 4 | burning CPU and GPU for new machines. All code is highly research grade, certified 5 | mostly working by trial-and-error running on a single box. If you have any 6 | questions, want to discuss approaches here, or just rant about how bad the code 7 | is, feel free to reach out for a chat. 8 | 9 | The main file is `loop.py`. Happy digging!
-------------------------------------------------------------------------------- /evolve/circpadsim.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import sys 3 | import socket 4 | 5 | CIRCPAD_ERROR_WRONG_FORMAT = "invalid trace format" 6 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 7 | 8 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 9 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 10 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 11 | CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 12 | 13 | CIRCPAD_LOG = "circpad_trace_event" 14 | CIRCPAD_LOG_TIMESTAMP = "timestamp=" 15 | CIRCPAD_LOG_CIRC_ID = "client_circ_id=" 16 | CIRCPAD_LOG_EVENT = "event=" 17 | 18 | CIRCPAD_BLACKLISTED_ADDRESSES = ["aus1.torproject.org"] 19 | CIRCPAD_BLACKLISTED_EVENTS = [ 20 | "circpad_negotiate_logging" 21 | ] 22 | 23 | def circpad_get_all_addresses(trace): 24 | addresses = [] 25 | for l in trace: 26 | if len(l) < 2: 27 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 28 | if CIRCPAD_ADDRESS_EVENT in l[1]: 29 | if len(l[1].split()) < 2: 30 | sys.exit(CIRCPAD_ERROR_WRONG_FORMAT) 31 | addresses.append(l[1].split()[1]) 32 | return addresses 33 | 34 | def circpad_get_nonpadding_times(trace): 35 | sent_nonpadding, recv_nonpadding = [], [] 36 | 37 | for l in trace: 38 | split = l.split() 39 | if CIRCPAD_EVENT_NONPADDING_SENT in split[1]: 40 | sent_nonpadding.append(split[0]) 41 | elif CIRCPAD_EVENT_NONPADDING_RECV in split[1]: 42 | recv_nonpadding.append(split[0]) 43 | 44 | return sent_nonpadding, recv_nonpadding 45 | 46 | def circpad_get_padding_times(trace): 47 | sent_padding, recv_padding = [], [] 48 | 49 | for l in trace: 50 | split = l.split() 51 | if CIRCPAD_EVENT_PADDING_SENT in split[1]: 52 | sent_padding.append(split[0]) 53 | elif CIRCPAD_EVENT_PADDING_RECV in split[1]: 54 | recv_padding.append(split[0]) 55 | 56 | return
sent_padding, recv_padding 57 | 58 | def circpad_parse_line(line): 59 | split = line.split() 60 | assert(len(split) >= 2) 61 | event = split[1] 62 | timestamp = int(split[0]) 63 | 64 | return event, timestamp 65 | 66 | def circpad_lines_to_trace(lines): 67 | trace = [] 68 | for l in lines: 69 | event, timestamp = circpad_parse_line(l) 70 | trace.append((timestamp, event)) 71 | return trace 72 | 73 | def circpad_extract_log_traces( 74 | log_lines, 75 | source_client=True, 76 | source_relay=True, 77 | allow_ips=False, 78 | filter_client_negotiate=False, 79 | filter_relay_negotiate=False, 80 | max_length=999999999 81 | ): 82 | # helper function 83 | def blacklist_hit(d): 84 | for a in circpad_get_all_addresses(d): 85 | if a in CIRCPAD_BLACKLISTED_ADDRESSES: 86 | return True 87 | return False 88 | 89 | # helper to extract one line 90 | def extract_from_line(line): 91 | n = line.index(CIRCPAD_LOG_TIMESTAMP)+len(CIRCPAD_LOG_TIMESTAMP) 92 | timestamp = line[n:].split(" ", maxsplit=1)[0] 93 | n = line.index(CIRCPAD_LOG_CIRC_ID)+len(CIRCPAD_LOG_CIRC_ID) 94 | cid = line[n:].split(" ", maxsplit=1)[0] 95 | 96 | # an event is the last part, no need to split on space like we did earlier 97 | n = line.index(CIRCPAD_LOG_EVENT)+len(CIRCPAD_LOG_EVENT) 98 | event = line[n:] 99 | 100 | return int(cid), int(timestamp), event 101 | 102 | circuits = {} 103 | base = -1 104 | for line in log_lines: 105 | if CIRCPAD_LOG in line: 106 | # skip client/relay if they shouldn't be part of the trace 107 | if not source_client and "source=client" in line: 108 | continue 109 | if not source_relay and "source=relay" in line: 110 | continue 111 | 112 | # extract trace and make timestamps relative 113 | cid, timestamp, event = extract_from_line(line) 114 | if base == -1: 115 | base = timestamp 116 | timestamp = timestamp - base 117 | 118 | # store trace 119 | if cid in circuits.keys(): 120 | if len(circuits[cid]) < max_length: 121 | circuits[cid] = circuits.get(cid) + [(timestamp, event)] 122 | else: 
123 | circuits[cid] = [(timestamp, event)] 124 | 125 | # filter out circuits with blacklisted addresses 126 | for cid in list(circuits.keys()): 127 | if blacklist_hit(circuits[cid]): 128 | del circuits[cid] 129 | # filter out circuits with only IPs (unless arg says otherwise) 130 | for cid in list(circuits.keys()): 131 | if not allow_ips and circpad_only_ips_in_trace(circuits[cid]): 132 | del circuits[cid] 133 | 134 | # remove blacklisted events (and associated events) 135 | for cid in list(circuits.keys()): 136 | circuits[cid] = circpad_remove_blacklisted_events(circuits[cid], 137 | filter_client_negotiate, filter_relay_negotiate) 138 | 139 | return circuits 140 | 141 | 142 | def circpad_remove_blacklisted_events( 143 | trace, 144 | filter_client_negotiate, 145 | filter_relay_negotiate 146 | ): 147 | 148 | result = [] 149 | ignore_next_send_cell = False 150 | 151 | for line in trace: 152 | strline = str(line) # stringify the (timestamp, event) tuple for substring matching 153 | 154 | # If we hit a blacklisted event, this means we should ignore the next 155 | # sent nonpadding cell. Since the blacklisted event should only be 156 | # triggered client-side, there shouldn't be any impact on relay traces.
157 | if any(b in strline for b in CIRCPAD_BLACKLISTED_EVENTS): 158 | ignore_next_send_cell = True 159 | else: 160 | if ignore_next_send_cell and CIRCPAD_EVENT_NONPADDING_SENT in strline: 161 | ignore_next_send_cell = False 162 | else: 163 | result.append(line) 164 | 165 | return result 166 | 167 | def circpad_only_ips_in_trace(trace): 168 | def is_ipv4(addr): 169 | try: 170 | socket.inet_aton(addr) 171 | except (socket.error, TypeError): 172 | return False 173 | return True 174 | def is_ipv6(addr): 175 | try: 176 | socket.inet_pton(socket.AF_INET6, addr) 177 | except (socket.error, TypeError): 178 | return False 179 | return True 180 | 181 | for a in circpad_get_all_addresses(trace): 182 | if not is_ipv4(a) and not is_ipv6(a): 183 | return False 184 | return True 185 | 186 | 187 | def circpad_to_wf( 188 | trace, 189 | cells=False, timecells=False, dirtime=False, cellevents=False, 190 | strip=False 191 | ): 192 | ''' Get a WF representation of the trace in the specified format. 193 | 194 | We support four formats: 195 | - cells, each line only contains 1 or -1 for outgoing or incoming cells. 196 | - timecells, relative timestamp added before each cell. 197 | - dirtime, each line has relative time multiplied with cell value. 198 | - cellevents, each line consists of the trace event for (non)padding cells. 199 | 200 | If the strip flag is set, events prior to the first domain resolution are 201 | stripped from the trace (if present). Circuits are typically created in the 202 | background by Tor Browser to speed-up browsing for users. Removing this is 203 | beneficial for WF attackers, because it is often assumed (more or less 204 | realistically) that an attacker can detect this (mainly by a 205 | significant time of "silence" on the wire, followed by what is assumed to be 206 | a website load).
207 | 208 | FIXME: For timecells and dirtime, the current magnitude is nanoseconds; it 209 | might be more efficient to round to a lower resolution, especially for deep 210 | learning attacks. 211 | ''' 212 | result = [] 213 | 214 | # only strip if we find the event for an address being resolved 215 | if strip: 216 | for i, l in enumerate(trace): 217 | if CIRCPAD_ADDRESS_EVENT in l[1]: 218 | trace = trace[i:] 219 | break 220 | 221 | for l in trace: 222 | # outgoing is positive 223 | if CIRCPAD_EVENT_NONPADDING_SENT in l[1] or \ 224 | CIRCPAD_EVENT_PADDING_SENT in l[1]: 225 | if cells: 226 | result.append("1") 227 | if timecells: 228 | result.append(f"{l[0]} 1") 229 | if dirtime: 230 | result.append(f"{l[0]}") 231 | if cellevents: 232 | result.append(l[1]) 233 | 234 | # incoming is negative 235 | elif CIRCPAD_EVENT_NONPADDING_RECV in l[1] or \ 236 | CIRCPAD_EVENT_PADDING_RECV in l[1]: 237 | if cells: 238 | result.append("-1") 239 | if timecells: 240 | result.append(f"{l[0]} -1") 241 | if dirtime: 242 | result.append(f"{l[0]*-1}") 243 | if cellevents: 244 | result.append(l[1]) 245 | return result 246 | -------------------------------------------------------------------------------- /evolve/evolve.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import machine 3 | import random 4 | import math 5 | 6 | import numpy as np 7 | 8 | ''' 9 | This is more genetic programming than genetic algorithms due to how we represent 10 | the machines to evolve. 11 | 12 | TL;DR Johan's suggestions: use either advanced selection, or simple selection and 13 | mutation; the mutation probability can be kept constant; an elite fraction is a 14 | good idea; it was worthwhile to remove all useless individuals instead of keeping 15 | them for diversity (the Internet seems mixed on this, though).
16 | 17 | inspiration https://deap.readthedocs.io/en/master/api/tools.html 18 | ''' 19 | 20 | def mutation(m, probability, exp): 21 | # with some probability, mutate each part of each state of a machine in place 22 | for s in m.states: 23 | if random.random() < probability: 24 | s.randomize_iat_dist(exp, probability) 25 | if random.random() < probability: 26 | s.randomize_length_dist(exp, probability) 27 | if random.random() < probability: 28 | s.randomize_transitions(exp, probability) 29 | 30 | def crossover(m1, m2, probability): 31 | # with some probability, performs single-point crossover in place 32 | if random.random() < probability: 33 | c = random.randint(0, len(m1.states)-1) 34 | tmp = m1.states[:c] 35 | m1.states[:c] = m2.states[:c] 36 | m2.states[:c] = tmp 37 | 38 | def selection(ml, fitness_func): 39 | # given a list of machines, selects the best, using the fitness function 40 | # we order, letting next_generation discard 41 | 42 | fl = [] 43 | for mp in ml: 44 | fl.append(fitness_func(mp)) 45 | 46 | # sort ml and fl together 47 | #fl, ml = (list(t) for t in zip(*sorted(zip(fl, ml)))) 48 | idx = np.argsort(fl) 49 | fl = list(np.array(fl)[idx]) 50 | ml = list(np.array(ml)[idx]) 51 | fl.reverse() 52 | ml.reverse() 53 | 54 | return ml, fl 55 | 56 | def initial_population(mc, mr, exp): 57 | pop = [] 58 | for _ in range(exp["population_size"]): 59 | pop.append([mc.randomize(exp), mr.randomize(exp)]) 60 | return pop 61 | 62 | def next_generation(ml, fl, exp): 63 | """ 64 | Given a sorted list of pairs of machines (better to worse) and their 65 | fitness, creates the next generation of machines. Is elitist, keeping the 66 | best machines as-is, and includes some machines randomly for diversity. The 67 | rest of the population is evolved using crossover and mutation from randomly 68 | selected machines, selected by weight based on their fitness. 
69 | """ 70 | 71 | # elitist, pick a fraction of the best for the next generation 72 | n = math.floor(len(ml)*exp["elitist_frac"]) 73 | ng = ml[:n] 74 | 75 | # diverse, pick a random fraction for the next generation 76 | n = math.floor(len(ml)*exp["diversity_frac"]) 77 | ng.extend(random.choices(ml, k=n)) 78 | 79 | # evolve remaining next generation 80 | while(len(ng) < len(ml)): 81 | # select two random parents, weighted by fitness 82 | parents = random.choices(ml, weights=fl, k=2) 83 | 84 | # make two new machines as clones 85 | m0c = parents[0][0].clone() 86 | m0r = parents[0][1].clone() 87 | m1c = parents[1][0].clone() 88 | m1r = parents[1][1].clone() 89 | 90 | # TODO: crossover between pairs, but never swap roles, with some probability? 91 | 92 | # crossover of states, per role 93 | crossover(m0c, m1c, exp["crossover_prob"]) 94 | crossover(m0r, m1r, exp["crossover_prob"]) 95 | 96 | # mutate each machine 97 | mutation(m0c, exp["mutation_prob"], exp) 98 | mutation(m0r, exp["mutation_prob"], exp) 99 | mutation(m1c, exp["mutation_prob"], exp) 100 | mutation(m1r, exp["mutation_prob"], exp) 101 | 102 | # done, add to population 103 | ng.append([m0c, m0r]) 104 | ng.append([m1c, m1r]) 105 | 106 | # we may end up evolving one machine too many above in case elitist and 107 | # diverse fractions result in an uneven number of machines 108 | return ng[:len(ml)] 109 | 110 | def main(): 111 | # can do head and tail independent 112 | # add probabilistic (consensus parameter) transition from head to tail and done 113 | # start with safest; simpler and more realistic evaluation of effectiveness 114 | # efficiency in absolutes (like the Sith!) 
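The bookkeeping in `next_generation` is easy to get wrong around the truncation step, so here is a toy sketch of just that bookkeeping: plain integers stand in for (client, relay) machine pairs, and cloning, crossover, and mutation are elided. It shows how `elitist_frac` and `diversity_frac` carve up the population and why the final truncation is needed.

```python
import math
import random

def toy_next_generation(ml, fl, exp):
    # elitist: keep the best fraction as-is (ml is sorted better to worse)
    n = math.floor(len(ml) * exp["elitist_frac"])
    ng = ml[:n]
    # diversity: add a randomly chosen fraction
    n = math.floor(len(ml) * exp["diversity_frac"])
    ng.extend(random.choices(ml, k=n))
    # evolve the rest two at a time, parents weighted by fitness
    while len(ng) < len(ml):
        parents = random.choices(ml, weights=fl, k=2)
        ng.extend(parents)  # the real code clones, crosses over, and mutates
    # adding pairs can overshoot by one, so truncate to the population size
    return ng[:len(ml)]

exp = {"elitist_frac": 0.2, "diversity_frac": 0.1}
ml = list(range(10, 0, -1))  # stand-in "machine pairs", best first
fl = list(range(10, 0, -1))  # matching fitness values, best first
ng = toy_next_generation(ml, fl, exp)
assert len(ng) == len(ml)    # population size is preserved
assert ng[:2] == ml[:2]      # the two elites survive unchanged
```

With a population of 10 this gives 2 elites plus 1 diversity pick, then pairs are added until the count reaches 11 and is truncated back to 10, matching the comment in `next_generation` about possibly evolving one machine too many.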
115 | 116 | # example hardcoded state 117 | s = machine.MachineState( 118 | iat_dist=machine.Distribution(machine.DistType.LOG_LOGISTIC, 2, 10), 119 | length_dist=machine.Distribution(machine.DistType.UNIFORM, 1, 5), 120 | length_dist_add=1, 121 | length_dist_max=100, 122 | transitions=[[machine.Event.PADDING_SENT, 0], [machine.Event.NONPADDING_SENT, 0]] 123 | ) 124 | 125 | # can we create a machine? 126 | m = machine.Machine(name="goodenough", states=[s]) 127 | print(f"{m}\n") 128 | print(f"m has ID {m.id()}") 129 | 130 | # can we generate c? 131 | conditions = "hardcoded_conditions;" 132 | print(conditions) 133 | print(m.to_c("generated")) 134 | 135 | # parameters for our experiment in a dict, easier to save 136 | exp = {} 137 | exp["num_states"] = 3 138 | exp["iat_d_low"] = 0.0 139 | exp["iat_d_high"] = 10.0 140 | exp["iat_a_low"] = 0 141 | exp["iat_a_high"] = 10 142 | exp["iat_m_low"] = 100 143 | exp["iat_m_high"] = 100*1000 144 | exp["length_d_low"] = 0 145 | exp["length_d_high"] = 100 146 | exp["length_a_low"] = 10 147 | exp["length_a_high"] = 100 148 | exp["length_m_low"] = 100 149 | exp["length_m_high"] = 1*1000 150 | 151 | # random machine check 152 | r = m.randomize(exp) 153 | print(r) 154 | print(r.to_c("random")) 155 | 156 | # mutation check 157 | r2 = r.clone() 158 | mutation(r2, 0.5, exp) 159 | print(f"\n{r}\n\n{r2}") 160 | 161 | # crossover check 162 | m1 = m.randomize(exp) 163 | m2 = m.randomize(exp) 164 | print(f"\n{m1}\n\n{m2}") 165 | crossover(m1, m2, 1.0) 166 | print(f"\n{m1}\n\n{m2}") 167 | 168 | # initial population, we evolve machines in *pairs*, highly asymmetrical setting 169 | exp["name"] = "evolved" 170 | exp["target_hopnum"] = 1 171 | exp["population_size"] = 10 172 | exp["allowed_padding_count_client"] = 1000 173 | exp["max_padding_percent_client"] = 50 174 | exp["allowed_padding_count_relay"] = 1000 175 | exp["max_padding_percent_relay"] = 50 176 | ## template client and relay machines with our parameters 177 | mc = machine.Machine( 
178 | is_origin_side=True, name=exp["name"], target_hopnum=exp["target_hopnum"], 179 | allowed_padding_count=exp["allowed_padding_count_client"], 180 | max_padding_percent=exp["max_padding_percent_client"], 181 | ) 182 | mr = mc.clone() 183 | mr.is_origin_side = False 184 | mr.allowed_padding_count = exp["allowed_padding_count_relay"] 185 | mr.max_padding_percent = exp["max_padding_percent_relay"] 186 | print("") 187 | 188 | ml = initial_population(mc, mr, exp) 189 | print(f"ml has {len(ml)} pairs of machines") 190 | 191 | def bad_fit_func(mp): 192 | return random.random() 193 | 194 | ml, fl = selection(ml, bad_fit_func) 195 | 196 | for i in range(len(fl)): 197 | print(f"fitness {fl[i]:1.2} for {ml[i]}") 198 | 199 | # next_generation check, can be done without working fitness function 200 | exp["mutation_prob"] = 0.2 201 | exp["crossover_prob"] = 0.7 202 | exp["elitist_frac"] = 0.2 203 | exp["diversity_frac"] = 0.1 204 | 205 | # create dummy fl 206 | fl = [f for f in range(10)] 207 | ng = next_generation(ml, fl, exp) 208 | print(f"ng has {len(ng)} pairs of machines") 209 | 210 | if __name__ == "__main__": 211 | main() -------------------------------------------------------------------------------- /evolve/machine.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | from enum import Enum 3 | import copy 4 | import random 5 | import hashlib 6 | 7 | # the possible discrete distributions 8 | class DistType(Enum): 9 | NONE = "CIRCPAD_DIST_NONE" 10 | UNIFORM = "CIRCPAD_DIST_UNIFORM" 11 | LOGISTIC = "CIRCPAD_DIST_LOGISTIC" 12 | LOG_LOGISTIC = "CIRCPAD_DIST_LOG_LOGISTIC" 13 | GEOMETRIC = "CIRCPAD_DIST_GEOMETRIC" 14 | WEIBULL = "CIRCPAD_DIST_WEIBULL" 15 | PARETO = "CIRCPAD_DIST_PARETO" 16 | 17 | class Distribution: 18 | def __init__(self, dist_type=DistType.NONE, param1=0, param2=0): 19 | self.dist_type = dist_type 20 | self.param1 = param1 21 | self.param2 = param2 22 | 23 | def __str__(self): 24 | return 
f"{self.dist_type} {self.param1:.2f} {self.param2:.2f}" 25 | 26 | def randomize(self, a, b): 27 | self.dist_type = random.choice(list(DistType)) 28 | self.param1 = random.uniform(a,b) 29 | self.param2 = random.uniform(a,b) 30 | return self 31 | 32 | # the events specified by struct circpad_event_t 33 | class Event(Enum): 34 | # a non-padding cell was received 35 | NONPADDING_RECV = "CIRCPAD_EVENT_NONPADDING_RECV" 36 | # a non-padding cell was sent 37 | NONPADDING_SENT = "CIRCPAD_EVENT_NONPADDING_SENT" 38 | # a padding cell (RELAY_COMMAND_DROP) was sent 39 | PADDING_SENT = "CIRCPAD_EVENT_PADDING_SENT" 40 | # a padding cell was received 41 | PADDING_RECV = "CIRCPAD_EVENT_PADDING_RECV" 42 | # we tried to schedule padding but we ended up picking the infinity bin 43 | # which means that padding was delayed infinitely 44 | INFINITY = "CIRCPAD_EVENT_INFINITY" 45 | # all histogram bins are empty (we are out of tokens) 46 | BINS_EMPTY = "CIRCPAD_EVENT_BINS_EMPTY" 47 | # out of allowed cells to send in state 48 | LENGTH_COUNT = "CIRCPAD_EVENT_LENGTH_COUNT" 49 | 50 | class MachineState: 51 | # TODO histogram with iat_histogram and token_removal 52 | def __init__( 53 | self, 54 | # IAT-dist 55 | iat_dist=None, 56 | # dist_added_shift_usec 57 | iat_dist_add=0, 58 | # dist_max_sample_usec 59 | iat_dist_max=None, 60 | # length-dist 61 | length_dist=None, 62 | # start_length 63 | length_dist_add=0, 64 | # max_length 65 | length_dist_max=None, 66 | # should we decrement length when we see a nonpadding packet? 
67 | length_includes_nonpadding=False, 68 | # the transitions from this state, we ignore by default 69 | transitions = [] 70 | ): 71 | self.iat_dist = iat_dist 72 | self.iat_dist_add = iat_dist_add 73 | self.iat_dist_max = iat_dist_max 74 | self.length_dist = length_dist 75 | self.length_dist_add = length_dist_add 76 | self.length_dist_max = length_dist_max 77 | self.length_includes_nonpadding = length_includes_nonpadding 78 | self.transitions = transitions 79 | 80 | def __str__(self): 81 | r = f"\tiat-dist {self.iat_dist}" 82 | if self.iat_dist_add > 0 or self.iat_dist_max is not None: 83 | r += f" clamped to [{self.iat_dist_add}, {self.iat_dist_max}]" 84 | r += ",\n" 85 | r += f"\tlength-dist {self.length_dist}" 86 | if self.length_dist_add > 0 or self.length_dist_max is not None: 87 | r += f" clamped to [{self.length_dist_add}, {self.length_dist_max}]" 88 | r += ",\n" 89 | if self.length_includes_nonpadding: 90 | r += f"\tlength_includes_nonpadding,\n" 91 | 92 | if len(self.transitions) == 0: 93 | r += f"\tno transitions" 94 | else: 95 | r += f"\ttransitions\n\t[" 96 | for t in self.transitions: 97 | r += f"\n\t\t{t[0]} -> {t[1]}," 98 | r += f"\n\t]" 99 | 100 | return r 101 | 102 | def to_c(self, prefix): 103 | ''' 104 | Returns c code with the prefix added for each line. 105 | 106 | The prefix should be generated by the caller to fit the following 107 | format _before_ the first . (excluding the dot): 108 | 109 | machine->states[index].length_dist.type = CIRCPAD_DIST_UNIFORM; 110 | 111 | One possible prefix would be "machine->states[1]" for the "machine" 112 | variable name and the state with index 1.
113 | ''' 114 | 115 | c = "" 116 | prefix = f"\n{prefix}" 117 | 118 | if not self.length_dist is None: 119 | c += f"{prefix}.length_dist.type = {self.length_dist.dist_type.value};" 120 | c += f"{prefix}.length_dist.param1 = {self.length_dist.param1};" 121 | c += f"{prefix}.length_dist.param2 = {self.length_dist.param2};" 122 | 123 | if self.length_dist_add > 0: 124 | c += f"{prefix}.start_length = {self.length_dist_add};" 125 | 126 | if not self.length_dist_max is None: 127 | c += f"{prefix}.max_length = {self.length_dist_max};" 128 | 129 | if not self.iat_dist is None: 130 | c += f"{prefix}.iat_dist.type = {self.iat_dist.dist_type.value};" 131 | c += f"{prefix}.iat_dist.param1 = {self.iat_dist.param1};" 132 | c += f"{prefix}.iat_dist.param2 = {self.iat_dist.param2};" 133 | 134 | if self.iat_dist_add > 0: 135 | c += f"{prefix}.dist_added_shift_usec = {self.iat_dist_add};" 136 | 137 | if not self.iat_dist_max is None: 138 | c += f"{prefix}.dist_max_sample_usec = {self.iat_dist_max};" 139 | else: 140 | # BUG: circuitpadding.c, line 560, should check if set like for length 141 | c += f"{prefix}.dist_max_sample_usec = CIRCPAD_DELAY_INFINITE;" 142 | 143 | if self.length_includes_nonpadding: 144 | c += f"{prefix}.length_includes_nonpadding = 1;" 145 | 146 | for t in self.transitions: 147 | c += f"{prefix}.next_state[{t[0].value}] = {t[1]};" 148 | 149 | return c 150 | 151 | def randomize(self, exp): 152 | self.randomize_iat_dist(exp) 153 | self.randomize_length_dist(exp) 154 | self.randomize_transitions(exp) 155 | return self 156 | 157 | def randomize_iat_dist( 158 | self, 159 | exp, 160 | probability=1.0, 161 | ): 162 | if random.random() < probability: 163 | self.iat_dist = Distribution().randomize( 164 | exp["iat_d_low"], 165 | exp["iat_d_high"] 166 | ) 167 | if random.random() < probability: 168 | self.iat_dist_add = random.randint( 169 | exp["iat_a_low"], 170 | exp["iat_a_high"] 171 | ) 172 | if random.random() < probability: 173 | self.iat_dist_max = random.randint( 
174 | exp["iat_m_low"], 175 | exp["iat_m_high"] 176 | ) 177 | 178 | def randomize_length_dist( 179 | self, 180 | exp, 181 | probability=1.0, 182 | ): 183 | if random.random() < probability: 184 | self.length_dist = Distribution().randomize( 185 | exp["length_d_low"], 186 | exp["length_d_high"] 187 | ) 188 | if random.random() < probability: 189 | self.length_dist_add = random.randint( 190 | exp["length_a_low"], 191 | exp["length_a_high"] 192 | ) 193 | if random.random() < probability: 194 | self.length_dist_max = random.randint( 195 | exp["length_m_low"], 196 | exp["length_m_high"] 197 | ) 198 | 199 | def randomize_transitions( 200 | self, 201 | exp, 202 | probability=1.0, 203 | ): 204 | # by chance, a state may end up with no transitions from or to it and 205 | # appear useless, but like "introns", such states seem to provide useful 206 | # "material" for later mutation 207 | self.transitions = [] 208 | for e in Event: 209 | # TODO: with histograms we can consider these, for now, ignore 210 | if e == Event.INFINITY or e == Event.BINS_EMPTY: 211 | continue 212 | if random.random() < probability: 213 | self.transitions.append([e, random.randint(0, exp["num_states"]-1)]) 214 | 215 | 216 | class Machine: 217 | # TODO: conditions 218 | def __init__( 219 | self, 220 | # just a user-friendly machine name for logs 221 | name = "", 222 | # which machine index slot should this machine go into 223 | machine_index = 0, 224 | # send a padding negotiate to shut down machine at end state? 225 | should_negotiate_end = False, 226 | # origin side or relay side 227 | is_origin_side = False, 228 | # which hop in the circuit should we send padding to/from? 229 | # 1-indexed (ie: hop #1 is guard, #2 middle, #3 exit). 230 | target_hopnum = 0, 231 | # if this flag is enabled, don't close circuits that use this machine 232 | manage_circ_lifetime = False, 233 | # how many padding cells can be sent before we apply overhead limits?
234 | allowed_padding_count = 0, 235 | # padding percent cap: Stop padding if we exceed this percent overhead. 236 | max_padding_percent = 0, 237 | # list of states 238 | states = [], 239 | ): 240 | self.name = name 241 | self.machine_index = machine_index 242 | self.should_negotiate_end = should_negotiate_end 243 | self.is_origin_side = is_origin_side 244 | self.target_hopnum = target_hopnum 245 | self.manage_circ_lifetime = manage_circ_lifetime 246 | self.allowed_padding_count = allowed_padding_count 247 | self.max_padding_percent = max_padding_percent 248 | self.states = states 249 | 250 | def __str__(self): 251 | r = f"{self.name}, index {self.machine_index}" 252 | 253 | if self.should_negotiate_end: 254 | r += f", should_negotiate_end" 255 | 256 | if self.is_origin_side: 257 | r += f", origin side" 258 | else: 259 | r += f", relay side" 260 | 261 | r += f", sending padding to/from " 262 | if self.target_hopnum == 1: 263 | r += f"guard" 264 | elif self.target_hopnum == 2: 265 | r += f"middle" 266 | else: 267 | r += f"exit" 268 | 269 | if self.manage_circ_lifetime: 270 | r += f", manage_circ_lifetime" 271 | if self.allowed_padding_count > 0: 272 | r += f", allowed_padding_count {self.allowed_padding_count}" 273 | if self.max_padding_percent > 0: 274 | r += f", max_padding_percent {self.max_padding_percent}" 275 | 276 | r += f", states:\n[" 277 | for s in self.states: 278 | r += f"\n{s}, " 279 | r += f"\n]" 280 | 281 | return r 282 | 283 | def to_c(self, varname): 284 | # transforms the machine to c code with the specified variable name 285 | 286 | prefix = f"\n{varname}" 287 | c = f"{prefix}->name = \"{self.name}\";" 288 | c += f"{prefix}->machine_index = {self.machine_index};" 289 | c += f"{prefix}->target_hopnum = {self.target_hopnum};" 290 | 291 | if self.should_negotiate_end: 292 | c += f"{prefix}->should_negotiate_end = 1;" 293 | if self.is_origin_side: 294 | c += f"{prefix}->is_origin_side = 1;" 295 | else: 296 | c += f"{prefix}->is_origin_side = 0;" 297 | 
if self.manage_circ_lifetime: 298 | c += f"{prefix}->manage_circ_lifetime = 1;" 299 | 300 | c += f"{prefix}->allowed_padding_count = {self.allowed_padding_count};" 301 | c += f"{prefix}->max_padding_percent = {self.max_padding_percent};" 302 | 303 | c += f"\ncircpad_machine_states_init({varname}, {len(self.states)});" 304 | for k, s in enumerate(self.states): 305 | c += s.to_c(f"{varname}->states[{k}]") 306 | 307 | return c 308 | 309 | def clone(self): 310 | return copy.deepcopy(self) 311 | 312 | def randomize(self, exp): 313 | # create a randomized clone of this machine with num_states random states 314 | r = self.clone() 315 | r.states = [] 316 | for _ in range(exp["num_states"]): 317 | r.states.append(MachineState().randomize(exp)) 318 | 319 | return r 320 | 321 | def id(self): 322 | return hashlib.sha256(self.to_c("").encode("ascii")).hexdigest()[:16] 323 | -------------------------------------------------------------------------------- /evolve/shared.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import numpy as np 3 | import os 4 | import sys 5 | from torch.utils import data 6 | import torch.nn as nn 7 | 8 | def metrics(threshold, predictions, labels, label_unmon): 9 | ''' Computes a range of metrics. 
10 | 11 | For details on the metrics, see, e.g., https://www.cs.kau.se/pulls/hot/baserate/ 12 | ''' 13 | tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1 = 0, 0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0 14 | 15 | # extended metric: per-class monitored stats 16 | monitored_right = {} 17 | monitored_total = {} 18 | 19 | for i in range(len(predictions)): 20 | label_pred = np.argmax(predictions[i]) 21 | prob_pred = max(predictions[i]) 22 | label_correct = labels[i] 23 | 24 | # we split on monitored or unmonitored correct label 25 | if label_correct != label_unmon: 26 | monitored_total[label_correct] = monitored_total.get(label_correct, 0) + 1 27 | # either confident and correct, 28 | if prob_pred >= threshold and label_pred == label_correct: 29 | tp = tp + 1 30 | monitored_right[label_pred] = monitored_right.get(label_pred, 0) + 1 31 | # confident and wrong monitored label, or 32 | elif prob_pred >= threshold and label_pred != label_unmon: 33 | fpp = fpp + 1 34 | # wrong because not confident or predicted unmonitored for monitored 35 | else: 36 | fn = fn + 1 37 | else: 38 | if prob_pred < threshold or label_pred == label_unmon: # correct prediction? 
39 | tn = tn + 1 40 | elif label_pred < label_unmon: # predicted monitored for unmonitored 41 | fnp = fnp + 1 42 | else: # this should never happen 43 | sys.exit(f"this should never happen: wrongly labelled data for {label_pred}") 44 | 45 | if tp + fn + fpp > 0: 46 | recall = round(float(tp) / float(tp + fpp + fn), 4) 47 | if tp + fpp + fnp > 0: 48 | precision = round(float(tp) / float(tp + fpp + fnp), 4) 49 | 50 | if precision > 0 and recall > 0: 51 | f1 = round(2*((precision*recall)/(precision+recall)), 4) 52 | 53 | accuracy = round(float(tp + tn) / float(tp + fpp + fnp + fn + tn), 4) 54 | 55 | return tp, fpp, fnp, tn, fn, accuracy, recall, precision, f1, monitored_right, monitored_total 56 | 57 | 58 | class Dataset(data.Dataset): 59 | def __init__(self, ids, dataset, labels): 60 | self.ids = ids 61 | self.dataset = dataset 62 | self.labels = labels 63 | 64 | def __len__(self): 65 | return len(self.ids) 66 | 67 | def __getitem__(self, index): 68 | ID = self.ids[index] 69 | return self.dataset[ID], self.labels[ID] 70 | 71 | def load_dataset( 72 | mon_dir, unm_dir, 73 | classes, partitions, samples, 74 | length, extract_func 75 | ): 76 | ''' Loads the dataset from disk into two dictionaries for data and labels. 77 | 78 | The dictionaries are indexed by sample ID. The ID encodes whether it is a monitored 79 | or unmonitored sample to make it easier to debug, as well as some info about 80 | the corresponding data file on disk.
81 | 82 | This function assumes the directory structure of the following dataset: 83 | - "top50-partitioned-reddit-levels-cirucitpadding" 84 | ''' 85 | data = {} 86 | labels = {} 87 | 88 | # load monitored data 89 | for c in range(0,classes): 90 | for p in range(0,partitions): 91 | site = c*10 + p # the dataset's file naming assumes 10 partitions per class 92 | for s in range(0,samples): 93 | ID = f"m-{c}-{p}-{s}" 94 | labels[ID] = c 95 | 96 | # file format is {site}-{sample}.trace 97 | fname = f"{site}-{s}.trace" 98 | with open(os.path.join(mon_dir, fname), "r") as f: 99 | data[ID] = extract_func(f.read(), length) 100 | 101 | # load unmonitored data 102 | dirlist = os.listdir(unm_dir) 103 | # make sure we only load a balanced dataset 104 | dirlist = dirlist[:len(data)] 105 | for fname in dirlist: 106 | ID = f"u-{fname}" 107 | labels[ID] = classes # monitored labels are 0..classes-1, so unmonitored gets label classes 108 | with open(os.path.join(unm_dir, fname), "r") as f: 109 | data[ID] = extract_func(f.read(), length) 110 | 111 | return data, labels 112 | 113 | CIRCPAD_EVENT_NONPADDING_SENT = "circpad_cell_event_nonpadding_sent" 114 | CIRCPAD_EVENT_NONPADDING_RECV = "circpad_cell_event_nonpadding_received" 115 | CIRCPAD_EVENT_PADDING_SENT = "circpad_cell_event_padding_sent" 116 | CIRCPAD_EVENT_PADDING_RECV = "circpad_cell_event_padding_received" 117 | CIRCPAD_ADDRESS_EVENT = "connection_ap_handshake_send_begin" 118 | 119 | def trace2cells(log, length, strip=True): 120 | ''' A fast specialised function to generate cells from a trace. 121 | 122 | Based on circpad_to_wf() in circpad-sim/common.py, but only for cells.
123 | ''' 124 | data = np.zeros((1, length), dtype=np.float32) 125 | n = 0 126 | 127 | s = log.split("\n") # split into lines first, so strip matches lines rather than characters 128 | if strip: 129 | for i, line in enumerate(s): 130 | if CIRCPAD_ADDRESS_EVENT in line: 131 | s = s[i:] 132 | break 133 | 134 | for line in s: 135 | # outgoing is positive 136 | if CIRCPAD_EVENT_NONPADDING_SENT in line or \ 137 | CIRCPAD_EVENT_PADDING_SENT in line: 138 | data[0][n] = 1.0 139 | n += 1 140 | # incoming is negative 141 | elif CIRCPAD_EVENT_NONPADDING_RECV in line or \ 142 | CIRCPAD_EVENT_PADDING_RECV in line: 143 | data[0][n] = -1.0 144 | n += 1 145 | 146 | if n == length: 147 | break 148 | 149 | return data 150 | 151 | def split_dataset( 152 | classes, partitions, samples, fold, labels, multiplier=1, 153 | ): 154 | '''Splits the dataset based on fold. 155 | 156 | The split is only based on IDs, not the actual data. The result is an 8:1:1 157 | split into training, validation, and testing. 158 | ''' 159 | training = [] 160 | validation = [] 161 | testing = [] 162 | 163 | # monitored, split by _partition_ 164 | for c in range(0,classes): 165 | for p in range(0,partitions): 166 | for s in range(0,samples): 167 | for x in range(0,multiplier): 168 | ID = f"m-{c}-{p}-{s}" 169 | if multiplier > 1: 170 | ID = f"{ID}-{x}" 171 | 172 | i = (p+fold) % partitions 173 | if i < partitions-2: 174 | training.append(ID) 175 | elif i < partitions-1: 176 | validation.append(ID) 177 | else: 178 | testing.append(ID) 179 | 180 | # unmonitored 181 | counter = 0 182 | for k in labels.keys(): 183 | if not k.startswith("u"): 184 | continue 185 | i = (counter+fold) % partitions 186 | if i < partitions-2: 187 | training.append(k) 188 | elif i < partitions-1: 189 | validation.append(k) 190 | else: 191 | testing.append(k) 192 | counter += 1 193 | 194 | split = {} 195 | split["train"] = training 196 | split["validation"] = validation 197 | split["test"] = testing 198 | return split 199 | 200 | def zero_dataset(dataset, z): 201 | index = z.split(":") 202 | start = 
int(index[0]) 203 | stop = int(index[1]) 204 | data = np.zeros((stop-start), dtype=np.float32) 205 | for k, v in dataset.items(): 206 | v[:,start:stop] = data 207 | dataset[k] = v 208 | return dataset 209 | 210 | class DFNet(nn.Module): 211 | def __init__(self, classes, fc_in_features = 512*10): 212 | super(DFNet, self).__init__() 213 | # https://ezyang.github.io/convolution-visualizer/index.html 214 | # https://github.com/lin-zju/deep-fp/blob/master/lib/modeling/backbone/dfnet.py 215 | self.kernel_size = 7 216 | self.padding_size = 3 217 | self.pool_stride_size = 4 218 | self.pool_size = 7 219 | 220 | self.block1 = self.__block(1, 32, nn.ELU()) 221 | self.block2 = self.__block(32, 64, nn.ReLU()) 222 | self.block3 = self.__block(64, 128, nn.ReLU()) 223 | self.block4 = self.__block(128, 256, nn.ReLU()) 224 | 225 | self.fc = nn.Sequential( 226 | nn.Linear(fc_in_features, 512), 227 | nn.BatchNorm1d(512), 228 | nn.ReLU(), 229 | nn.Dropout(0.7), 230 | nn.Linear(512, 512), 231 | nn.BatchNorm1d(512), 232 | nn.ReLU(), 233 | nn.Dropout(0.5) 234 | ) 235 | 236 | self.prediction = nn.Sequential( 237 | nn.Linear(512, classes), 238 | # when using CrossEntropyLoss, already computed internally 239 | #nn.Softmax(dim=1) # dim = 1, don't softmax batch 240 | ) 241 | 242 | def __block(self, channels_in, channels, activation): 243 | return nn.Sequential( 244 | nn.Conv1d(channels_in, channels, self.kernel_size, padding=self.padding_size), 245 | nn.BatchNorm1d(channels), 246 | activation, 247 | nn.Conv1d(channels, channels, self.kernel_size, padding=self.padding_size), 248 | nn.BatchNorm1d(channels), 249 | activation, 250 | nn.MaxPool1d(self.pool_size, stride=self.pool_stride_size, padding=self.padding_size), 251 | nn.Dropout(p=0.1) 252 | ) 253 | 254 | def forward(self, x): 255 | x = self.block1(x) 256 | x = self.block2(x) 257 | x = self.block3(x) 258 | x = self.block4(x) 259 | x = x.flatten(start_dim=1) # dim = 1, don't flatten batch 260 | x = self.fc(x) 261 | x = self.prediction(x) 262 
| 263 | return x -------------------------------------------------------------------------------- /machines/hello-world.md: -------------------------------------------------------------------------------- 1 | # The Hello World Machine 2 | This shows the steps we plan to take to design, implement, evaluate, and 3 | document machines. It's just meant to be an example. 4 | 5 | The goal of this example machine is simple: to send at least one padding cell in 6 | each direction, to/from client from/to relay. We don't care about sending more 7 | padding, being efficient, or making much sense. It's just an example. 8 | 9 | ## Design 10 | Since we don't care much about anything for this machine, and I'm lazy, I'm just 11 | going to take a randomly generated machine from a tool we're in the process of 12 | tweaking. 13 | 14 | ## Implementation 15 | This gets ugly. Below we look at the client and relay machines: 16 | 17 | ```c 18 | circpad_machine_spec_t *gen_client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 19 | gen_client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 20 | gen_client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 21 | gen_client->conditions.reduced_padding_ok = 1; 22 | gen_client->name = "evolved"; 23 | gen_client->machine_index = 0; 24 | gen_client->target_hopnum = 1; 25 | gen_client->is_origin_side = 1; 26 | gen_client->allowed_padding_count = 200; 27 | gen_client->max_padding_percent = 50;circpad_machine_states_init(gen_client, 6); 28 | gen_client->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 29 | gen_client->states[0].length_dist.param1 = 4.814274646108755; 30 | gen_client->states[0].length_dist.param2 = 4.869971264299856; 31 | gen_client->states[0].start_length = 4; 32 | gen_client->states[0].max_length = 505; 33 | gen_client->states[0].iat_dist.type = CIRCPAD_DIST_NONE; 34 | gen_client->states[0].iat_dist.param1 = 7.526266612653222; 35 | gen_client->states[0].iat_dist.param2 = 7.403589208246087; 36 | gen_client->states[0].start_length 
= 8; 37 | gen_client->states[0].dist_max_sample_usec = 63304; 38 | gen_client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 4; 39 | gen_client->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 40 | gen_client->states[1].length_dist.param1 = 4.403617327117251; 41 | gen_client->states[1].length_dist.param2 = 5.996417832959251; 42 | gen_client->states[1].start_length = 6; 43 | gen_client->states[1].max_length = 483; 44 | gen_client->states[1].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 45 | gen_client->states[1].iat_dist.param1 = 8.361216732993883; 46 | gen_client->states[1].iat_dist.param2 = 0.9264596277951376; 47 | gen_client->states[1].start_length = 8; 48 | gen_client->states[1].dist_max_sample_usec = 7065; 49 | gen_client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 50 | gen_client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 51 | gen_client->states[2].length_dist.type = CIRCPAD_DIST_WEIBULL; 52 | gen_client->states[2].length_dist.param1 = 1.0426652399191527; 53 | gen_client->states[2].length_dist.param2 = 3.091020838174913; 54 | gen_client->states[2].start_length = 10; 55 | gen_client->states[2].max_length = 887; 56 | gen_client->states[2].iat_dist.type = CIRCPAD_DIST_WEIBULL; 57 | gen_client->states[2].iat_dist.param1 = 5.667292387983577; 58 | gen_client->states[2].iat_dist.param2 = 7.958737236028522; 59 | gen_client->states[2].dist_max_sample_usec = 23447; 60 | gen_client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 4; 61 | gen_client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 62 | gen_client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 63 | gen_client->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 64 | gen_client->states[3].length_dist.param1 = 9.929415473412345; 65 | gen_client->states[3].length_dist.param2 = 5.546471576686779; 66 | gen_client->states[3].start_length = 7; 67 | gen_client->states[3].max_length = 936; 68 | gen_client->states[3].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 69 
| gen_client->states[3].iat_dist.param1 = 3.332738685735962; 70 | gen_client->states[3].iat_dist.param2 = 6.678039275209297; 71 | gen_client->states[3].start_length = 3; 72 | gen_client->states[3].dist_max_sample_usec = 38700; 73 | gen_client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 74 | gen_client->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 75 | gen_client->states[4].length_dist.type = CIRCPAD_DIST_UNIFORM; 76 | gen_client->states[4].length_dist.param1 = 2.8857540118794556; 77 | gen_client->states[4].length_dist.param2 = 6.125818574119025; 78 | gen_client->states[4].start_length = 2; 79 | gen_client->states[4].max_length = 820; 80 | gen_client->states[4].iat_dist.type = CIRCPAD_DIST_PARETO; 81 | gen_client->states[4].iat_dist.param1 = 4.519039376257881; 82 | gen_client->states[4].iat_dist.param2 = 7.220421029371751; 83 | gen_client->states[4].start_length = 6; 84 | gen_client->states[4].dist_max_sample_usec = 79621; 85 | gen_client->states[4].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 86 | gen_client->states[4].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 87 | gen_client->states[4].next_state[CIRCPAD_EVENT_PADDING_RECV] = 4; 88 | gen_client->machine_num = smartlist_len(origin_padding_machines); 89 | circpad_register_padding_machine(gen_client, origin_padding_machines); 90 | ``` 91 | Notice the generated states and their transitions (next_state). We see that no 92 | state transitions to state 3 except for state 3 itself. In other words, state 3 93 | is completely useless. Oh well, such is the life of a generated machine. 
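That state 3 is unreachable can also be checked mechanically rather than by eyeballing the code. Here is a minimal sketch (Python; not part of this repo, with the next_state targets transcribed by hand from the client machine above) that does a breadth-first search over the state graph from the start state:

```python
from collections import deque

# next_state targets per state, transcribed from the gen_client machine above
transitions = {
    0: [4],        # NONPADDING_RECV -> 4
    1: [2, 0],     # NONPADDING_RECV -> 2, NONPADDING_SENT -> 0
    2: [4, 0, 1],  # NONPADDING_RECV -> 4, NONPADDING_SENT -> 0, PADDING_SENT -> 1
    3: [3, 1],     # NONPADDING_SENT -> 3, LENGTH_COUNT -> 1
    4: [2, 0, 4],  # NONPADDING_RECV -> 2, NONPADDING_SENT -> 0, PADDING_RECV -> 4
}

def reachable(start=0):
    """Breadth-first search over the state graph from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in transitions[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable()))  # state 3 is missing: nothing ever enters it
```

Running the same check on the relay machine's transitions gives {0, 1}, matching the observation below that states 2-4 are dead.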
94 | 95 | ```c 96 | circpad_machine_spec_t *gen_relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 97 | gen_relay->name = "evolved"; 98 | gen_relay->machine_index = 0; 99 | gen_relay->target_hopnum = 1; 100 | gen_relay->allowed_padding_count = 2000; 101 | gen_relay->max_padding_percent = 50;circpad_machine_states_init(gen_relay, 6); 102 | gen_relay->states[0].length_dist.type = CIRCPAD_DIST_WEIBULL; 103 | gen_relay->states[0].length_dist.param1 = 1.1119908099375175; 104 | gen_relay->states[0].length_dist.param2 = 9.295631276879977; 105 | gen_relay->states[0].start_length = 9; 106 | gen_relay->states[0].max_length = 166; 107 | gen_relay->states[0].iat_dist.type = CIRCPAD_DIST_NONE; 108 | gen_relay->states[0].iat_dist.param1 = 5.140798889226186; 109 | gen_relay->states[0].iat_dist.param2 = 3.7189363424246693; 110 | gen_relay->states[0].start_length = 8; 111 | gen_relay->states[0].dist_max_sample_usec = 19688; 112 | gen_relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 113 | gen_relay->states[1].length_dist.type = CIRCPAD_DIST_UNIFORM; 114 | gen_relay->states[1].length_dist.param1 = 7.4552261639344355; 115 | gen_relay->states[1].length_dist.param2 = 6.5836447477507445; 116 | gen_relay->states[1].start_length = 8; 117 | gen_relay->states[1].max_length = 567; 118 | gen_relay->states[1].iat_dist.type = CIRCPAD_DIST_WEIBULL; 119 | gen_relay->states[1].iat_dist.param1 = 5.028757716771455; 120 | gen_relay->states[1].iat_dist.param2 = 3.6175408250793497; 121 | gen_relay->states[1].start_length = 2; 122 | gen_relay->states[1].dist_max_sample_usec = 63563; 123 | gen_relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 0; 124 | gen_relay->states[2].length_dist.type = CIRCPAD_DIST_WEIBULL; 125 | gen_relay->states[2].length_dist.param1 = 5.622586262962072; 126 | gen_relay->states[2].length_dist.param2 = 0.30230478680857154; 127 | gen_relay->states[2].start_length = 1; 128 | gen_relay->states[2].max_length = 391; 129 | gen_relay->states[2].iat_dist.type 
= CIRCPAD_DIST_GEOMETRIC; 130 | gen_relay->states[2].iat_dist.param1 = 9.494124071150765; 131 | gen_relay->states[2].iat_dist.param2 = 4.857852071000062; 132 | gen_relay->states[2].start_length = 5; 133 | gen_relay->states[2].dist_max_sample_usec = 68729; 134 | gen_relay->states[3].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 135 | gen_relay->states[3].length_dist.param1 = 0.7518053414585135; 136 | gen_relay->states[3].length_dist.param2 = 2.2110771083054215; 137 | gen_relay->states[3].start_length = 1; 138 | gen_relay->states[3].max_length = 141; 139 | gen_relay->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 140 | gen_relay->states[3].iat_dist.param1 = 3.7855567949957916; 141 | gen_relay->states[3].iat_dist.param2 = 5.158070632109185; 142 | gen_relay->states[3].dist_max_sample_usec = 77068; 143 | gen_relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 144 | gen_relay->states[4].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 145 | gen_relay->states[4].length_dist.param1 = 7.327935236568187; 146 | gen_relay->states[4].length_dist.param2 = 4.431830291905961; 147 | gen_relay->states[4].start_length = 5; 148 | gen_relay->states[4].max_length = 105; 149 | gen_relay->states[4].iat_dist.type = CIRCPAD_DIST_UNIFORM; 150 | gen_relay->states[4].iat_dist.param1 = 5.256975990732162; 151 | gen_relay->states[4].iat_dist.param2 = 2.2653274630000197; 152 | gen_relay->states[4].dist_max_sample_usec = 35592; 153 | gen_relay->states[4].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 154 | gen_relay->machine_num = smartlist_len(relay_padding_machines); 155 | circpad_register_padding_machine(gen_relay, relay_padding_machines); 156 | ``` 157 | 158 | The relay machine is even worse: states 0 and 1 only have transitions to each 159 | other, so states 2-4 are useless. 160 | 161 | ## Evaluation 162 | Let's see if the above machine does something. Using the circpad simulator we 163 | simulate the goodenough February dataset for the safest security level. 
We then 164 | pick a random trace and verify that the machine is producing padding: 165 | 166 | ``` 167 | $ cat monitored/1-0.trace | grep circpad_cell_event_padding | wc -l 168 | 241 169 | ``` 170 | 171 | The trace contained 241 padding events, great, it does something! 172 | 173 | Let's also see how effective the machine is as a defense against the Deep 174 | Fingerprinting attack by using `/evaluation/once.py` from this repo. 175 | 176 | Results with the machine: 177 | ``` 178 | threshold 0.0, recall 0.9, precision 0.93, F1 0.91, accuracy 0.92 [tp 898, fpp 17, fnp 53, tn 947, fn 85] 179 | threshold 0.11, recall 0.9, precision 0.93, F1 0.91, accuracy 0.92 [tp 898, fpp 17, fnp 53, tn 947, fn 85] 180 | threshold 0.35, recall 0.9, precision 0.93, F1 0.91, accuracy 0.92 [tp 898, fpp 16, fnp 53, tn 947, fn 86] 181 | threshold 0.53, recall 0.89, precision 0.94, F1 0.91, accuracy 0.92 [tp 887, fpp 12, fnp 45, tn 955, fn 101] 182 | threshold 0.66, recall 0.88, precision 0.96, F1 0.91, accuracy 0.92 [tp 877, fpp 7, fnp 34, tn 966, fn 116] 183 | threshold 0.75, recall 0.86, precision 0.97, F1 0.91, accuracy 0.92 [tp 863, fpp 5, fnp 24, tn 976, fn 132] 184 | threshold 0.82, recall 0.85, precision 0.97, F1 0.91, accuracy 0.92 [tp 854, fpp 4, fnp 20, tn 980, fn 142] 185 | threshold 0.87, recall 0.85, precision 0.98, F1 0.91, accuracy 0.92 [tp 849, fpp 3, fnp 18, tn 982, fn 148] 186 | threshold 0.91, recall 0.84, precision 0.98, F1 0.91, accuracy 0.92 [tp 845, fpp 3, fnp 15, tn 985, fn 152] 187 | threshold 0.93, recall 0.84, precision 0.98, F1 0.9, accuracy 0.91 [tp 838, fpp 2, fnp 14, tn 986, fn 160] 188 | threshold 0.95, recall 0.83, precision 0.98, F1 0.9, accuracy 0.91 [tp 832, fpp 2, fnp 12, tn 988, fn 166] 189 | threshold 0.96, recall 0.83, precision 0.99, F1 0.9, accuracy 0.91 [tp 826, fpp 1, fnp 10, tn 990, fn 173] 190 | threshold 0.97, recall 0.82, precision 0.99, F1 0.9, accuracy 0.9 [tp 818, fpp 1, fnp 9, tn 991, fn 181] 191 | threshold 0.98, recall 0.81, precision 
0.99, F1 0.89, accuracy 0.9 [tp 805, fpp 1, fnp 6, tn 994, fn 194] 192 | threshold 0.99, recall 0.8, precision 0.99, F1 0.88, accuracy 0.9 [tp 796, fpp 0, fnp 6, tn 994, fn 204] 193 | threshold 0.99, recall 0.79, precision 0.99, F1 0.88, accuracy 0.89 [tp 787, fpp 0, fnp 5, tn 995, fn 213] 194 | ``` 195 | 196 | And without the machine: 197 | ``` 198 | threshold 0.0, recall 0.87, precision 0.91, F1 0.89, accuracy 0.91 [tp 868, fpp 34, fnp 51, tn 949, fn 98] 199 | threshold 0.11, recall 0.87, precision 0.91, F1 0.89, accuracy 0.91 [tp 868, fpp 34, fnp 51, tn 949, fn 98] 200 | threshold 0.35, recall 0.87, precision 0.91, F1 0.89, accuracy 0.91 [tp 868, fpp 32, fnp 50, tn 950, fn 100] 201 | threshold 0.53, recall 0.86, precision 0.93, F1 0.9, accuracy 0.91 [tp 863, fpp 22, fnp 43, tn 957, fn 115] 202 | threshold 0.66, recall 0.85, precision 0.94, F1 0.89, accuracy 0.91 [tp 855, fpp 17, fnp 40, tn 960, fn 128] 203 | threshold 0.75, recall 0.85, precision 0.95, F1 0.89, accuracy 0.91 [tp 846, fpp 14, fnp 34, tn 966, fn 140] 204 | threshold 0.82, recall 0.84, precision 0.96, F1 0.9, accuracy 0.91 [tp 841, fpp 10, fnp 28, tn 972, fn 149] 205 | threshold 0.87, recall 0.83, precision 0.97, F1 0.89, accuracy 0.9 [tp 833, fpp 5, fnp 25, tn 975, fn 162] 206 | threshold 0.91, recall 0.82, precision 0.97, F1 0.89, accuracy 0.9 [tp 822, fpp 2, fnp 24, tn 976, fn 176] 207 | threshold 0.93, recall 0.82, precision 0.98, F1 0.89, accuracy 0.9 [tp 816, fpp 2, fnp 18, tn 982, fn 182] 208 | threshold 0.95, recall 0.81, precision 0.98, F1 0.88, accuracy 0.9 [tp 806, fpp 2, fnp 14, tn 986, fn 192] 209 | threshold 0.96, recall 0.8, precision 0.99, F1 0.88, accuracy 0.89 [tp 796, fpp 1, fnp 11, tn 989, fn 203] 210 | threshold 0.97, recall 0.78, precision 0.99, F1 0.87, accuracy 0.89 [tp 781, fpp 0, fnp 9, tn 991, fn 219] 211 | threshold 0.98, recall 0.77, precision 0.99, F1 0.86, accuracy 0.88 [tp 768, fpp 0, fnp 8, tn 992, fn 232] 212 | threshold 0.99, recall 0.76, precision 0.99, F1 0.86, 
accuracy 0.88 [tp 761, fpp 0, fnp 7, tn 993, fn 239] 213 | threshold 0.99, recall 0.75, precision 0.99, F1 0.86, accuracy 0.87 [tp 752, fpp 0, fnp 5, tn 995, fn 248] 214 | ``` 215 | 216 | As we can see, a random machine is not necessarily a defense at all: in this case it 217 | made the DF attack better, not worse (at threshold 0.0, recall went from 0.87 without the machine to 0.9 with it). 218 | 219 | ## Documentation 220 | This is the documentation. The closer we get to an efficient and effective 221 | machine, the more time we plan to spend on documenting it. -------------------------------------------------------------------------------- /machines/phase2/april-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1000; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_NONE; 14 | client->states[0].length_dist.param1 = 8.726140065991641; 15 | client->states[0].length_dist.param2 = 6.239957511844941; 16 | client->states[0].start_length = 9; 17 | client->states[0].max_length = 611; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_NONE; 19 | client->states[0].iat_dist.param1 = 7.887031292919491; 20 | client->states[0].iat_dist.param2 = 1.948416595933855; 21 | client->states[0].dist_max_sample_usec = 39633; 22 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 23 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 24 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 25 | client->states[1].length_dist.type = CIRCPAD_DIST_NONE; 26 | client->states[1].length_dist.param1 = 6.174064346424731; 27 |
client->states[1].length_dist.param2 = 8.169566534772848; 28 | client->states[1].start_length = 8; 29 | client->states[1].max_length = 876; 30 | client->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 31 | client->states[1].iat_dist.param1 = 6.013166313063582; 32 | client->states[1].iat_dist.param2 = 3.909603771161987; 33 | client->states[1].dist_max_sample_usec = 85193; 34 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 35 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 36 | client->states[2].length_dist.type = CIRCPAD_DIST_NONE; 37 | client->states[2].length_dist.param1 = 1.6026451213041193; 38 | client->states[2].length_dist.param2 = 2.932035147480483; 39 | client->states[2].start_length = 1; 40 | client->states[2].max_length = 413; 41 | client->states[2].iat_dist.type = CIRCPAD_DIST_NONE; 42 | client->states[2].iat_dist.param1 = 4.780004981695894; 43 | client->states[2].iat_dist.param2 = 0.5839238235898347; 44 | client->states[2].dist_max_sample_usec = 92193; 45 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 46 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 47 | client->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 48 | client->states[3].length_dist.param1 = 4.776842508009852; 49 | client->states[3].length_dist.param2 = 4.807709366988267; 50 | client->states[3].start_length = 3; 51 | client->states[3].max_length = 494; 52 | client->states[3].iat_dist.type = CIRCPAD_DIST_PARETO; 53 | client->states[3].iat_dist.param1 = 3.3391870088596; 54 | client->states[3].iat_dist.param2 = 7.179045336148708; 55 | client->states[3].dist_max_sample_usec = 9445; 56 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 57 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 58 | 59 | client->machine_num = smartlist_len(origin_padding_machines); 60 | circpad_register_padding_machine(client, origin_padding_machines); 
-------------------------------------------------------------------------------- /machines/phase2/april-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1000; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 11 | relay->states[0].length_dist.param1 = 9.619691383629117; 12 | relay->states[0].length_dist.param2 = 0.9505104524626451; 13 | relay->states[0].start_length = 9; 14 | relay->states[0].max_length = 799; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 16 | relay->states[0].iat_dist.param1 = 5.460653840184872; 17 | relay->states[0].iat_dist.param2 = 7.080387541173288; 18 | relay->states[0].dist_max_sample_usec = 94722; 19 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 20 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 21 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 22 | relay->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 23 | relay->states[1].length_dist.param1 = 6.620754495941119; 24 | relay->states[1].length_dist.param2 = 0.01028407677243659; 25 | relay->states[1].start_length = 4; 26 | relay->states[1].max_length = 326; 27 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 28 | relay->states[1].iat_dist.param1 = 1.2767765551835941; 29 | relay->states[1].iat_dist.param2 = 0.11492671368700358; 30 | relay->states[1].dist_max_sample_usec = 31443; 31 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 32 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 33 | relay->states[2].length_dist.param1 = 4.11964473793041; 34 | relay->states[2].length_dist.param2 = 2.7250362139341764; 35 | 
relay->states[2].start_length = 5; 36 | relay->states[2].max_length = 693; 37 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 38 | relay->states[2].iat_dist.param1 = 5.232180204916029; 39 | relay->states[2].iat_dist.param2 = 5.469677647300559; 40 | relay->states[2].dist_max_sample_usec = 94733; 41 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 42 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 43 | relay->states[2].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 44 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 45 | relay->states[3].length_dist.param1 = 1.6167675237934875; 46 | relay->states[3].length_dist.param2 = 6.128003159320049; 47 | relay->states[3].start_length = 5; 48 | relay->states[3].max_length = 383; 49 | relay->states[3].iat_dist.type = CIRCPAD_DIST_UNIFORM; 50 | relay->states[3].iat_dist.param1 = 4.270468437086448; 51 | relay->states[3].iat_dist.param2 = 7.926284402139126; 52 | relay->states[3].dist_max_sample_usec = 55878; 53 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 54 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 55 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 56 | relay->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 57 | 58 | relay->machine_num = smartlist_len(relay_padding_machines); 59 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/april-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/april-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/april.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/april.png -------------------------------------------------------------------------------- /machines/phase2/february-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 1; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 2000; 11 | client->max_padding_percent = 80; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_PARETO; 14 | client->states[0].length_dist.param1 = 4.936848093904933; 15 | client->states[0].length_dist.param2 = 3.7363302458109127; 16 | client->states[0].start_length = 4; 17 | client->states[0].max_length = 465; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 19 | client->states[0].iat_dist.param1 = 8.455977480863071; 20 | client->states[0].iat_dist.param2 = 8.48589377927243; 21 | client->states[0].dist_max_sample_usec = 2405; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 23 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 24 | client->states[1].length_dist.type = CIRCPAD_DIST_LOGISTIC; 25 | client->states[1].length_dist.param1 = 2.3729679932211143; 26 | client->states[1].length_dist.param2 = 3.443389414939797; 27 | client->states[1].start_length = 1; 28 | client->states[1].max_length = 564; 29 | client->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 30 | client->states[1].iat_dist.param1 = 7.268741849723991; 31 | client->states[1].iat_dist.param2 = 9.17862593564404; 32 | client->states[1].dist_max_sample_usec = 22527; 33 | 
client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 34 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 35 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 36 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 37 | client->states[1].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 38 | client->states[2].length_dist.type = CIRCPAD_DIST_UNIFORM; 39 | client->states[2].length_dist.param1 = 8.211004044413276; 40 | client->states[2].length_dist.param2 = 1.9422749133401196; 41 | client->states[2].start_length = 3; 42 | client->states[2].max_length = 420; 43 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 44 | client->states[2].iat_dist.param1 = 2.572575044424351; 45 | client->states[2].iat_dist.param2 = 1.8615197791400429; 46 | client->states[2].dist_max_sample_usec = 76185; 47 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 48 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 49 | client->states[3].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 50 | client->states[3].length_dist.param1 = 4.712947245478112; 51 | client->states[3].length_dist.param2 = 7.192401484606177; 52 | client->states[3].start_length = 5; 53 | client->states[3].max_length = 383; 54 | client->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 55 | client->states[3].iat_dist.param1 = 7.107152046757118; 56 | client->states[3].iat_dist.param2 = 6.279751058010154; 57 | client->states[3].dist_max_sample_usec = 20836; 58 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 59 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 60 | client->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 61 | 62 | client->machine_num = smartlist_len(origin_padding_machines); 63 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/february-mr: 
-------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 1; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 4000; 8 | relay->max_padding_percent = 80; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 11 | relay->states[0].length_dist.param1 = 9.710256645599623; 12 | relay->states[0].length_dist.param2 = 9.988837974059598; 13 | relay->states[0].start_length = 1; 14 | relay->states[0].max_length = 881; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_UNIFORM; 16 | relay->states[0].iat_dist.param1 = 6.232538902597691; 17 | relay->states[0].iat_dist.param2 = 8.443597518961244; 18 | relay->states[0].dist_max_sample_usec = 55847; 19 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 20 | relay->states[1].length_dist.type = CIRCPAD_DIST_NONE; 21 | relay->states[1].length_dist.param1 = 5.154223094887426; 22 | relay->states[1].length_dist.param2 = 0.7400120457054027; 23 | relay->states[1].start_length = 4; 24 | relay->states[1].max_length = 149; 25 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 26 | relay->states[1].iat_dist.param1 = 5.303006850136577; 27 | relay->states[1].iat_dist.param2 = 0.6077197613396013; 28 | relay->states[1].dist_max_sample_usec = 41337; 29 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 0; 30 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 31 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 32 | relay->states[2].length_dist.param1 = 7.677154861253354; 33 | relay->states[2].length_dist.param2 = 8.28859930213646; 34 | relay->states[2].start_length = 7; 35 | relay->states[2].max_length = 912; 36 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 37 | relay->states[2].iat_dist.param1 = 
1.6456878789999463; 38 | relay->states[2].iat_dist.param2 = 0.6419054414650316; 39 | relay->states[2].dist_max_sample_usec = 89370; 40 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 41 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 42 | relay->states[2].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 43 | relay->states[3].length_dist.type = CIRCPAD_DIST_UNIFORM; 44 | relay->states[3].length_dist.param1 = 8.909655436873711; 45 | relay->states[3].length_dist.param2 = 1.2870903034258951; 46 | relay->states[3].start_length = 7; 47 | relay->states[3].max_length = 720; 48 | relay->states[3].iat_dist.type = CIRCPAD_DIST_NONE; 49 | relay->states[3].iat_dist.param1 = 6.15454437455432; 50 | relay->states[3].iat_dist.param2 = 2.427321350813574; 51 | relay->states[3].dist_max_sample_usec = 94319; 52 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 53 | 54 | relay->machine_num = smartlist_len(relay_padding_machines); 55 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/february-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/february-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/february.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/february.png -------------------------------------------------------------------------------- /machines/phase2/june-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 
client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1000; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_LOGISTIC; 14 | client->states[0].length_dist.param1 = 1.3092918377235663; 15 | client->states[0].length_dist.param2 = 4.348612869294878; 16 | client->states[0].start_length = 4; 17 | client->states[0].max_length = 247; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 19 | client->states[0].iat_dist.param1 = 1.8536172530364303; 20 | client->states[0].iat_dist.param2 = 6.538955098232947; 21 | client->states[0].dist_max_sample_usec = 55368; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 23 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 24 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 25 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 26 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 27 | client->states[1].length_dist.type = CIRCPAD_DIST_WEIBULL; 28 | client->states[1].length_dist.param1 = 2.502127504187194; 29 | client->states[1].length_dist.param2 = 3.264058654171975; 30 | client->states[1].start_length = 5; 31 | client->states[1].max_length = 480; 32 | client->states[1].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 33 | client->states[1].iat_dist.param1 = 3.0612997591093203; 34 | client->states[1].iat_dist.param2 = 1.1631101677415767; 35 | client->states[1].dist_max_sample_usec = 50578; 36 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 37 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 38 | client->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 39 | 
client->states[2].length_dist.param1 = 5.312431698514274; 40 | client->states[2].length_dist.param2 = 8.575598651430298; 41 | client->states[2].start_length = 1; 42 | client->states[2].max_length = 196; 43 | client->states[2].iat_dist.type = CIRCPAD_DIST_UNIFORM; 44 | client->states[2].iat_dist.param1 = 3.743025234693156; 45 | client->states[2].iat_dist.param2 = 4.230923837488635; 46 | client->states[2].dist_max_sample_usec = 24151; 47 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 0; 48 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 49 | client->states[2].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 50 | client->states[3].length_dist.type = CIRCPAD_DIST_NONE; 51 | client->states[3].length_dist.param1 = 9.837404566063828; 52 | client->states[3].length_dist.param2 = 1.8665675598256148; 53 | client->states[3].start_length = 2; 54 | client->states[3].max_length = 922; 55 | client->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 56 | client->states[3].iat_dist.param1 = 3.832715364459043; 57 | client->states[3].iat_dist.param2 = 9.307168953452074; 58 | client->states[3].dist_max_sample_usec = 20206; 59 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 60 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 61 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 62 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 63 | client->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 0; 64 | 65 | client->machine_num = smartlist_len(origin_padding_machines); 66 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/june-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; 4 | relay->machine_index = 0; 5 | 
relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1000; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_UNIFORM; 11 | relay->states[0].length_dist.param1 = 7.541197616744535; 12 | relay->states[0].length_dist.param2 = 4.959358844064398; 13 | relay->states[0].start_length = 8; 14 | relay->states[0].max_length = 321; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 16 | relay->states[0].iat_dist.param1 = 6.355669985302768; 17 | relay->states[0].iat_dist.param2 = 4.718433911978695; 18 | relay->states[0].dist_max_sample_usec = 66344; 19 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 20 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 21 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 22 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 23 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 24 | relay->states[1].length_dist.type = CIRCPAD_DIST_LOGISTIC; 25 | relay->states[1].length_dist.param1 = 3.4705572928922788; 26 | relay->states[1].length_dist.param2 = 9.861608037705961; 27 | relay->states[1].start_length = 9; 28 | relay->states[1].max_length = 315; 29 | relay->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 30 | relay->states[1].iat_dist.param1 = 5.519982726461615; 31 | relay->states[1].iat_dist.param2 = 5.978017042155093; 32 | relay->states[1].dist_max_sample_usec = 85748; 33 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 34 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 35 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 36 | relay->states[2].length_dist.type = CIRCPAD_DIST_PARETO; 37 | relay->states[2].length_dist.param1 = 7.676818011827004; 38 | relay->states[2].length_dist.param2 = 2.235768039871122; 39 | relay->states[2].start_length = 10; 40 | relay->states[2].max_length = 202; 41 | 
relay->states[2].iat_dist.type = CIRCPAD_DIST_UNIFORM; 42 | relay->states[2].iat_dist.param1 = 8.420468407681655; 43 | relay->states[2].iat_dist.param2 = 8.874744130401139; 44 | relay->states[2].dist_max_sample_usec = 49042; 45 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 46 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 47 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 48 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOGISTIC; 49 | relay->states[3].length_dist.param1 = 6.0176848403977425; 50 | relay->states[3].length_dist.param2 = 8.745793617904283; 51 | relay->states[3].start_length = 2; 52 | relay->states[3].max_length = 158; 53 | relay->states[3].iat_dist.type = CIRCPAD_DIST_NONE; 54 | relay->states[3].iat_dist.param1 = 9.914906787243591; 55 | relay->states[3].iat_dist.param2 = 3.0423641201110243; 56 | relay->states[3].dist_max_sample_usec = 26650; 57 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 58 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 59 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 60 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 61 | relay->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 62 | 63 | relay->machine_num = smartlist_len(relay_padding_machines); 64 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/june-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/june-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/june.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/june.png -------------------------------------------------------------------------------- /machines/phase2/march-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 1; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 2000; 11 | client->max_padding_percent = 80; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 14 | client->states[0].length_dist.param1 = 9.29916383223583; 15 | client->states[0].length_dist.param2 = 8.563192209023997; 16 | client->states[0].start_length = 2; 17 | client->states[0].max_length = 844; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_WEIBULL; 19 | client->states[0].iat_dist.param1 = 8.313382688571522; 20 | client->states[0].iat_dist.param2 = 8.742989045383267; 21 | client->states[0].dist_max_sample_usec = 61818; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 23 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 24 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 25 | client->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 26 | client->states[1].length_dist.param1 = 2.8017758499031; 27 | client->states[1].length_dist.param2 = 9.810981459295506; 28 | client->states[1].start_length = 3; 29 | client->states[1].max_length = 470; 30 | client->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 31 | client->states[1].iat_dist.param1 = 2.7147215785606016; 32 | client->states[1].iat_dist.param2 = 3.043038828426642; 33 | 
client->states[1].dist_max_sample_usec = 23669; 34 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 35 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 36 | client->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 37 | client->states[2].length_dist.param1 = 5.002464597348427; 38 | client->states[2].length_dist.param2 = 8.828389483663672; 39 | client->states[2].max_length = 287; 40 | client->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 41 | client->states[2].iat_dist.param1 = 0.6190142731130654; 42 | client->states[2].iat_dist.param2 = 8.4487983787332; 43 | client->states[2].dist_max_sample_usec = 12511; 44 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 45 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 46 | client->states[3].length_dist.type = CIRCPAD_DIST_WEIBULL; 47 | client->states[3].length_dist.param1 = 1.115301047557461; 48 | client->states[3].length_dist.param2 = 2.5638990818318996; 49 | client->states[3].start_length = 8; 50 | client->states[3].max_length = 847; 51 | client->states[3].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 52 | client->states[3].iat_dist.param1 = 8.374452627421187; 53 | client->states[3].iat_dist.param2 = 5.928675332086; 54 | client->states[3].dist_max_sample_usec = 16086; 55 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 56 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 57 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 58 | 59 | client->machine_num = smartlist_len(origin_padding_machines); 60 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/march-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved"; 4 | relay->machine_index = 0; 5 | 
relay->target_hopnum = 1; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 4000; 8 | relay->max_padding_percent = 80; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_WEIBULL; 11 | relay->states[0].length_dist.param1 = 7.654991320082258; 12 | relay->states[0].length_dist.param2 = 4.456949700895323; 13 | relay->states[0].start_length = 6; 14 | relay->states[0].max_length = 603; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 16 | relay->states[0].iat_dist.param1 = 8.322240372052494; 17 | relay->states[0].iat_dist.param2 = 0.8362284595566694; 18 | relay->states[0].dist_max_sample_usec = 69469; 19 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 20 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 21 | relay->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 22 | relay->states[1].length_dist.param1 = 9.196497877425955; 23 | relay->states[1].length_dist.param2 = 4.086797911814807; 24 | relay->states[1].start_length = 3; 25 | relay->states[1].max_length = 850; 26 | relay->states[1].iat_dist.type = CIRCPAD_DIST_NONE; 27 | relay->states[1].iat_dist.param1 = 1.5091988102682985; 28 | relay->states[1].iat_dist.param2 = 3.7171557072557784; 29 | relay->states[1].dist_max_sample_usec = 93752; 30 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 31 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 0; 32 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 33 | relay->states[2].length_dist.type = CIRCPAD_DIST_NONE; 34 | relay->states[2].length_dist.param1 = 5.558986316465104; 35 | relay->states[2].length_dist.param2 = 7.198858580257309; 36 | relay->states[2].start_length = 2; 37 | relay->states[2].max_length = 690; 38 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 39 | relay->states[2].iat_dist.param1 = 5.765567627587263; 40 | relay->states[2].iat_dist.param2 = 7.6815241245846755; 41 | relay->states[2].dist_max_sample_usec = 
17978; 42 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 43 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 44 | relay->states[3].length_dist.type = CIRCPAD_DIST_UNIFORM; 45 | relay->states[3].length_dist.param1 = 2.8821373596173516; 46 | relay->states[3].length_dist.param2 = 0.6470170172573608; 47 | relay->states[3].start_length = 1; 48 | relay->states[3].max_length = 146; 49 | relay->states[3].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 50 | relay->states[3].iat_dist.param1 = 1.5273906057439013; 51 | relay->states[3].iat_dist.param2 = 3.326047013501766; 52 | relay->states[3].dist_max_sample_usec = 19309; 53 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 54 | 55 | relay->machine_num = smartlist_len(relay_padding_machines); 56 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/march-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/march-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/march.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/march.png -------------------------------------------------------------------------------- /machines/phase2/may-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; 7 | 
client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1000; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 4); 13 | client->states[0].length_dist.type = CIRCPAD_DIST_LOGISTIC; 14 | client->states[0].length_dist.param1 = 0.9054657170837088; 15 | client->states[0].length_dist.param2 = 8.395721233310635; 16 | client->states[0].start_length = 5; 17 | client->states[0].max_length = 434; 18 | client->states[0].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 19 | client->states[0].iat_dist.param1 = 2.697620362868335; 20 | client->states[0].iat_dist.param2 = 4.536505160992885; 21 | client->states[0].dist_max_sample_usec = 31237; 22 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 23 | client->states[0].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 3; 24 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 25 | client->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 26 | client->states[1].length_dist.type = CIRCPAD_DIST_PARETO; 27 | client->states[1].length_dist.param1 = 4.674003575761336; 28 | client->states[1].length_dist.param2 = 5.9049600823910176; 29 | client->states[1].start_length = 5; 30 | client->states[1].max_length = 594; 31 | client->states[1].iat_dist.type = CIRCPAD_DIST_GEOMETRIC; 32 | client->states[1].iat_dist.param1 = 5.850427721566305; 33 | client->states[1].iat_dist.param2 = 5.4776798413515335; 34 | client->states[1].dist_max_sample_usec = 93024; 35 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 36 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 37 | client->states[2].length_dist.type = CIRCPAD_DIST_NONE; 38 | client->states[2].length_dist.param1 = 1.6026451213041193; 39 | client->states[2].length_dist.param2 = 2.932035147480483; 40 | client->states[2].start_length = 1; 41 | client->states[2].max_length = 425; 42 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 43 | 
client->states[2].iat_dist.param1 = 0.6272619534018797; 44 | client->states[2].iat_dist.param2 = 0.5031279263654487; 45 | client->states[2].dist_max_sample_usec = 88570; 46 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 47 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 48 | client->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 49 | client->states[3].length_dist.param1 = 4.776842508009852; 50 | client->states[3].length_dist.param2 = 4.807709366988267; 51 | client->states[3].start_length = 3; 52 | client->states[3].max_length = 494; 53 | client->states[3].iat_dist.type = CIRCPAD_DIST_PARETO; 54 | client->states[3].iat_dist.param1 = 3.3391870088596; 55 | client->states[3].iat_dist.param2 = 7.179045336148708; 56 | client->states[3].dist_max_sample_usec = 9445; 57 | client->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 58 | client->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 59 | 60 | client->machine_num = smartlist_len(origin_padding_machines); 61 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/may-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1000; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].length_dist.type = CIRCPAD_DIST_NONE; 11 | relay->states[0].length_dist.param1 = 5.378318948177472; 12 | relay->states[0].length_dist.param2 = 2.151729097089823; 13 | relay->states[0].start_length = 10; 14 | relay->states[0].max_length = 583; 15 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 16 | relay->states[0].iat_dist.param1 = 5.460653840184872; 
17 | relay->states[0].iat_dist.param2 = 7.080387541173288; 18 | relay->states[0].dist_max_sample_usec = 94722; 19 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 20 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 2; 21 | relay->states[1].length_dist.type = CIRCPAD_DIST_WEIBULL; 22 | relay->states[1].length_dist.param1 = 4.131784111285114; 23 | relay->states[1].length_dist.param2 = 5.676344743391601; 24 | relay->states[1].start_length = 4; 25 | relay->states[1].max_length = 185; 26 | relay->states[1].iat_dist.type = CIRCPAD_DIST_PARETO; 27 | relay->states[1].iat_dist.param1 = 3.0151010507095166; 28 | relay->states[1].iat_dist.param2 = 9.877753111650684; 29 | relay->states[1].dist_max_sample_usec = 15782; 30 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 31 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 32 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 33 | relay->states[2].length_dist.param1 = 4.11964473793041; 34 | relay->states[2].length_dist.param2 = 2.7250362139341764; 35 | relay->states[2].start_length = 5; 36 | relay->states[2].max_length = 375; 37 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 38 | relay->states[2].iat_dist.param1 = 9.596309409594099; 39 | relay->states[2].iat_dist.param2 = 0.4682935207787442; 40 | relay->states[2].dist_max_sample_usec = 94733; 41 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 42 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 43 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 44 | relay->states[3].length_dist.param1 = 1.6167675237934875; 45 | relay->states[3].length_dist.param2 = 6.128003159320049; 46 | relay->states[3].start_length = 5; 47 | relay->states[3].max_length = 383; 48 | relay->states[3].iat_dist.type = CIRCPAD_DIST_UNIFORM; 49 | relay->states[3].iat_dist.param1 = 4.270468437086448; 50 | relay->states[3].iat_dist.param2 = 7.926284402139126; 51 | 
relay->states[3].dist_max_sample_usec = 55878; 52 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 53 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 54 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 55 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 56 | relay->states[3].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 3; 57 | 58 | relay->machine_num = smartlist_len(relay_padding_machines); 59 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/may-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/may-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/may.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/may.png -------------------------------------------------------------------------------- /machines/phase2/strawman-mc: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "evolved"; // because lazy 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 3000; 11 | client->max_padding_percent = 90; 12 | circpad_machine_states_init(client, 3); 13 | 14 | // state 0 just waits until CIRCPAD_EVENT_PADDING_RECV 15 | client->states[0].length_dist.type = 
CIRCPAD_DIST_NONE; // infinite length 16 | client->states[0].iat_dist.type = CIRCPAD_DIST_NONE; // infinite delay 17 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 18 | 19 | // state 1 waits until we receive something 20 | client->states[1].length_dist.type = CIRCPAD_DIST_NONE; // infinite length 21 | client->states[1].iat_dist.type = CIRCPAD_DIST_NONE; // infinite delay 22 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 23 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 24 | 25 | // state 2 sends its sampled number of padding cells quickly, unless we send some non-padding 26 | client->states[2].length_dist.type = CIRCPAD_DIST_UNIFORM; 27 | client->states[2].length_dist.param1 = 5; 28 | client->states[2].length_dist.param2 = 15; 29 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 30 | client->states[2].iat_dist.param1 = 3.3; 31 | client->states[2].iat_dist.param2 = 7.1; 32 | client->states[2].dist_added_shift_usec = 0; 33 | client->states[2].dist_max_sample_usec = 9445; 34 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 35 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 36 | 37 | client->machine_num = smartlist_len(origin_padding_machines); 38 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/strawman-mr: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "evolved_relay"; // because lazy 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 100; 8 | relay->max_padding_percent = 95; 9 | circpad_machine_states_init(relay, 2); 10 | 11 | relay->states[0].length_dist.type = CIRCPAD_DIST_GEOMETRIC; 12 | relay->states[0].length_dist.param1 = 9.6; 13 | 
relay->states[0].length_dist.param2 = 0.9; 14 | relay->states[0].start_length = 10; 15 | relay->states[0].max_length = 1000; 16 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 17 | relay->states[0].iat_dist.param1 = 5.5; 18 | relay->states[0].iat_dist.param2 = 7.1; 19 | relay->states[0].start_length = 10; 20 | relay->states[0].dist_max_sample_usec = 94722; 21 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 22 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 23 | relay->states[0].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 24 | 25 | relay->states[1].length_dist.type = CIRCPAD_DIST_LOGISTIC; 26 | relay->states[1].length_dist.param1 = 4.1; 27 | relay->states[1].length_dist.param2 = 2.7; 28 | relay->states[1].start_length = 20; 29 | relay->states[1].max_length = 693; 30 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 31 | relay->states[1].iat_dist.param1 = 5.2; 32 | relay->states[1].iat_dist.param2 = 5.5; 33 | relay->states[1].dist_added_shift_usec = 0; 34 | relay->states[1].dist_max_sample_usec = 10000; 35 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 1; 36 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 0; 37 | relay->states[1].next_state[CIRCPAD_EVENT_LENGTH_COUNT] = 1; 38 | 39 | relay->machine_num = smartlist_len(relay_padding_machines); 40 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase2/strawman-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/strawman-nopadding.png -------------------------------------------------------------------------------- /machines/phase2/strawman.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase2/strawman.png -------------------------------------------------------------------------------- /machines/phase3/README.md: -------------------------------------------------------------------------------- 1 | # Phase 3 2 | Here are the final two machines: Spring and Interspace. Spring is based on 3 | manually cleaning up the April machine from phase 2. Interspace builds upon 4 | Spring and is manually tuned. See the full pre-print (to appear) for many more 5 | details. Below you find the basic evaluation in the same format as for phase 2. 6 | 7 | ## Spring 8 | 9 | Computed overhead (efficiency): 10 | ``` 11 | in total for 20000 traces: 12 | - 95773038 cells 13 | - 210% average total bandwidth 14 | - 18494898 sent cells (19%) 15 | - 4534842 nonpadding 16 | - 13960056 padding 17 | - 408% average sent bandwidth 18 | - 77278140 recv cells (81%) 19 | - 40967808 nonpadding 20 | - 36310332 padding 21 | - 189% average recv bandwidth 22 | ``` 23 | 24 | Evaluated effectiveness: 25 | ``` 26 | made 2000 predictions with 2000 labels 27 | threshold 0.0, recall 0.51, precision 0.46, F1 0.48, accuracy 0.58 [tp 508, fpp 240, fnp 349, tn 651, fn 252] 28 | threshold 0.11, recall 0.51, precision 0.46, F1 0.48, accuracy 0.58 [tp 508, fpp 240, fnp 349, tn 651, fn 252] 29 | threshold 0.35, recall 0.5, precision 0.49, F1 0.49, accuracy 0.59 [tp 500, fpp 205, fnp 316, tn 684, fn 295] 30 | threshold 0.53, recall 0.45, precision 0.59, F1 0.51, accuracy 0.62 [tp 445, fpp 116, fnp 199, tn 801, fn 439] 31 | threshold 0.66, recall 0.39, precision 0.65, F1 0.49, accuracy 0.63 [tp 394, fpp 73, fnp 142, tn 858, fn 533] 32 | threshold 0.75, recall 0.35, precision 0.71, F1 0.47, accuracy 0.63 [tp 351, fpp 45, fnp 98, tn 902, fn 604] 33 | threshold 0.82, recall 0.32, precision 0.76, F1 0.45, accuracy 0.63 [tp 321, fpp 31, fnp 68, tn 932, fn 648] 34 | threshold 0.87, recall 0.29, 
precision 0.82, F1 0.43, accuracy 0.62 [tp 289, fpp 20, fnp 43, tn 957, fn 691] 35 | threshold 0.91, recall 0.26, precision 0.86, F1 0.4, accuracy 0.61 [tp 257, fpp 14, fnp 28, tn 972, fn 729] 36 | threshold 0.93, recall 0.23, precision 0.88, F1 0.36, accuracy 0.6 [tp 228, fpp 12, fnp 19, tn 981, fn 760] 37 | threshold 0.95, recall 0.19, precision 0.92, F1 0.32, accuracy 0.59 [tp 194, fpp 6, fnp 11, tn 989, fn 800] 38 | threshold 0.96, recall 0.17, precision 0.93, F1 0.29, accuracy 0.58 [tp 173, fpp 6, fnp 8, tn 992, fn 821] 39 | threshold 0.97, recall 0.15, precision 0.94, F1 0.26, accuracy 0.57 [tp 151, fpp 2, fnp 7, tn 993, fn 847] 40 | threshold 0.98, recall 0.13, precision 0.95, F1 0.23, accuracy 0.56 [tp 131, fpp 1, fnp 6, tn 994, fn 868] 41 | threshold 0.99, recall 0.11, precision 0.96, F1 0.2, accuracy 0.56 [tp 114, fpp 1, fnp 4, tn 996, fn 885] 42 | threshold 0.99, recall 0.098, precision 0.97, F1 0.18, accuracy 0.55 [tp 98, fpp 0, fnp 3, tn 997, fn 902] 43 | ``` 44 | 45 | Visualized (black/white = received/sent nonpadding, red/green = received/sent padding): 46 | ![spring-nopadding](spring-nopadding.png) 47 | ![spring](spring.png) 48 | 49 | ## Interspace 50 | 51 | Computed overhead (efficiency): 52 | ``` 53 | in total for 20000 traces: 54 | - 96245339 cells 55 | - 229% average total bandwidth 56 | - 25306497 sent cells (26%) 57 | - 4240612 nonpadding 58 | - 21065885 padding 59 | - 597% average sent bandwidth 60 | - 70938842 recv cells (74%) 61 | - 37877812 nonpadding 62 | - 33061030 padding 63 | - 187% average recv bandwidth 64 | 65 | ``` 66 | 67 | Evaluated effectiveness: 68 | ``` 69 | made 2000 predictions with 2000 labels 70 | threshold 0.0, recall 0.35, precision 0.4, F1 0.37, accuracy 0.53 [tp 351, fpp 229, fnp 297, tn 703, fn 420] 71 | threshold 0.11, recall 0.35, precision 0.4, F1 0.37, accuracy 0.53 [tp 351, fpp 229, fnp 297, tn 703, fn 420] 72 | threshold 0.35, recall 0.33, precision 0.44, F1 0.38, accuracy 0.54 [tp 331, fpp 180, fnp 242, tn 758, 
fn 489] 73 | threshold 0.53, recall 0.26, precision 0.58, F1 0.36, accuracy 0.57 [tp 258, fpp 81, fnp 109, tn 891, fn 661] 74 | threshold 0.66, recall 0.21, precision 0.68, F1 0.32, accuracy 0.57 [tp 206, fpp 37, fnp 59, tn 941, fn 757] 75 | threshold 0.75, recall 0.16, precision 0.74, F1 0.26, accuracy 0.56 [tp 159, fpp 19, fnp 36, tn 964, fn 822] 76 | threshold 0.82, recall 0.13, precision 0.82, F1 0.23, accuracy 0.56 [tp 131, fpp 11, fnp 17, tn 983, fn 858] 77 | threshold 0.87, recall 0.1, precision 0.84, F1 0.18, accuracy 0.55 [tp 103, fpp 7, fnp 12, tn 988, fn 890] 78 | threshold 0.91, recall 0.081, precision 0.91, F1 0.15, accuracy 0.54 [tp 81, fpp 3, fnp 5, tn 995, fn 916] 79 | threshold 0.93, recall 0.067, precision 0.97, F1 0.13, accuracy 0.53 [tp 67, fpp 2, fnp 0, tn 1000, fn 931] 80 | threshold 0.95, recall 0.056, precision 0.97, F1 0.11, accuracy 0.53 [tp 56, fpp 2, fnp 0, tn 1000, fn 942] 81 | threshold 0.96, recall 0.046, precision 0.96, F1 0.088, accuracy 0.52 [tp 46, fpp 2, fnp 0, tn 1000, fn 952] 82 | threshold 0.97, recall 0.041, precision 1.0, F1 0.079, accuracy 0.52 [tp 41, fpp 0, fnp 0, tn 1000, fn 959] 83 | threshold 0.98, recall 0.037, precision 1.0, F1 0.071, accuracy 0.52 [tp 37, fpp 0, fnp 0, tn 1000, fn 963] 84 | threshold 0.99, recall 0.032, precision 1.0, F1 0.062, accuracy 0.52 [tp 32, fpp 0, fnp 0, tn 1000, fn 968] 85 | threshold 0.99, recall 0.026, precision 1.0, F1 0.051, accuracy 0.51 [tp 26, fpp 0, fnp 0, tn 1000, fn 974] 86 | 87 | ``` 88 | Visualized (black/white = received/sent nonpadding, red/green = received/sent padding): 89 | ![interspace-nopadding](interspace-nopadding.png) 90 | ![interspace](interspace.png) -------------------------------------------------------------------------------- /machines/phase3/interspace-mc.c: -------------------------------------------------------------------------------- 1 | const struct uniform_t my_uniform = { 2 | .base = UNIFORM(my_uniform), 3 | .a = 0.0, 4 | .b = 1.0, 5 | }; 6 | 7 | 
circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 8 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 9 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 10 | client->conditions.reduced_padding_ok = 1; 11 | 12 | client->name = "interspace_client"; 13 | client->machine_index = 0; 14 | client->target_hopnum = 2; 15 | client->is_origin_side = 1; 16 | client->allowed_padding_count = 1500; 17 | client->max_padding_percent = 50; 18 | circpad_machine_states_init(client, 3); 19 | 20 | // wait until the relay is active 21 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 22 | 23 | // wait for something to either mask the length of or inject a fake request 24 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 25 | if (dist_sample(&my_uniform.base) < 0.5) { 26 | client->states[1].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 27 | } 28 | 29 | client->states[2].length_dist.type = CIRCPAD_DIST_PARETO; 30 | client->states[2].length_dist.param1 = 4.7; 31 | client->states[2].length_dist.param2 = 4.8; 32 | client->states[2].start_length = 1; 33 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; // tweak for log-logistic? 
34 | client->states[2].iat_dist.param1 = 3.3; 35 | client->states[2].iat_dist.param2 = 7.2; 36 | client->states[2].dist_max_sample_usec = 9445; 37 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 38 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 39 | if (dist_sample(&my_uniform.base) < 0.5) { 40 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 41 | } 42 | 43 | client->machine_num = smartlist_len(origin_padding_machines); 44 | circpad_register_padding_machine(client, origin_padding_machines); 45 | -------------------------------------------------------------------------------- /machines/phase3/interspace-mr.c: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | // short define for sampling uniformly at random from [0, 1.0] 4 | const struct uniform_t my_uniform = { 5 | .base = UNIFORM(my_uniform), 6 | .a = 0.0, 7 | .b = 1.0, 8 | }; 9 | #define CIRCPAD_UNI_RAND (dist_sample(&my_uniform.base)) 10 | 11 | // select distribution parameters uniformly at random from [0, 10] 12 | #define CIRCPAD_RAND_DIST_PARAM1 (CIRCPAD_UNI_RAND*10) 13 | #define CIRCPAD_RAND_DIST_PARAM2 (CIRCPAD_UNI_RAND*10) 14 | 15 | relay->name = "interspace_relay"; 16 | relay->machine_index = 0; 17 | relay->target_hopnum = 2; 18 | relay->is_origin_side = 0; 19 | relay->allowed_padding_count = 1500; 20 | relay->max_padding_percent = 50; 21 | circpad_machine_states_init(relay, 4); 22 | 23 | if (CIRCPAD_UNI_RAND < 0.5) { 24 | /* 25 | the machine has the following states: 26 | 0. init: don't waste time early 27 | 1. wait: either extend or fake 28 | 2. extend: obfuscate length of existing bursts 29 | 3. 
fake: inject fake bursts 30 | */ 31 | 32 | // wait for client to send something, no point in doing stuff too early 33 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 34 | 35 | if (CIRCPAD_UNI_RAND < 0.5) { 36 | // wait: extend real burst 37 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 38 | } else { 39 | // wait: inject a fake burst after a while (FIXME: too long below) 40 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 41 | relay->states[1].iat_dist.param1 = CIRCPAD_UNI_RAND*1000; // alpha, scale and mean 42 | relay->states[1].iat_dist.param2 = CIRCPAD_UNI_RAND*10000; // shape; when > 1, larger values reduce dispersion 43 | relay->states[1].dist_max_sample_usec = 100000; 44 | relay->states[1].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 45 | } 46 | 47 | // extend: add fake padding for real bursts 48 | relay->states[2].length_dist.type = CIRCPAD_DIST_PARETO; 49 | relay->states[2].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 50 | relay->states[2].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 51 | relay->states[2].start_length = 1; 52 | relay->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 53 | relay->states[2].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 54 | relay->states[2].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 55 | relay->states[2].dist_max_sample_usec = 10000; 56 | relay->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 57 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 58 | 59 | // fake: inject completely fake bursts 60 | relay->states[3].length_dist.type = CIRCPAD_DIST_PARETO; 61 | relay->states[3].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 62 | relay->states[3].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 63 | relay->states[3].start_length = 4; 64 | relay->states[3].iat_dist.type = CIRCPAD_DIST_PARETO; 65 | relay->states[3].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 66 | relay->states[3].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 67 | relay->states[3].dist_max_sample_usec = 
10000; 68 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 69 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_SENT] = 3; 70 | } else { 71 | // spring-mr 72 | relay->states[0].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 73 | relay->states[0].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 74 | relay->states[0].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 75 | relay->states[0].dist_max_sample_usec = 10000; 76 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 77 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 78 | 79 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 80 | relay->states[1].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 81 | relay->states[1].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 82 | relay->states[1].dist_max_sample_usec = 31443; 83 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 84 | 85 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 86 | relay->states[2].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 87 | relay->states[2].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 88 | relay->states[2].start_length = 5; 89 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 90 | relay->states[2].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 91 | relay->states[2].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 92 | relay->states[2].dist_max_sample_usec = 100000; 93 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 94 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 95 | 96 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 97 | relay->states[3].length_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 98 | relay->states[3].length_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 99 | relay->states[3].start_length = 5; 100 | relay->states[3].iat_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 101 | relay->states[3].iat_dist.param1 = CIRCPAD_RAND_DIST_PARAM1; 102 | relay->states[3].iat_dist.param2 = CIRCPAD_RAND_DIST_PARAM2; 103 | 
relay->states[3].dist_max_sample_usec = 55878; 104 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 105 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 106 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 107 | } 108 | 109 | relay->machine_num = smartlist_len(relay_padding_machines); 110 | circpad_register_padding_machine(relay, relay_padding_machines); 111 | -------------------------------------------------------------------------------- /machines/phase3/interspace-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/interspace-nopadding.png -------------------------------------------------------------------------------- /machines/phase3/interspace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/interspace.png -------------------------------------------------------------------------------- /machines/phase3/spring-mc.c: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *client = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | client->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 3 | client->conditions.purpose_mask = CIRCPAD_PURPOSE_ALL; 4 | client->conditions.reduced_padding_ok = 1; 5 | 6 | client->name = "spring_client"; 7 | client->machine_index = 0; 8 | client->target_hopnum = 2; 9 | client->is_origin_side = 1; 10 | client->allowed_padding_count = 1500; 11 | client->max_padding_percent = 50; 12 | circpad_machine_states_init(client, 3); 13 | 14 | client->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 15 | 16 | client->states[1].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 2; 17 | 18 | client->states[2].length_dist.type = 
CIRCPAD_DIST_PARETO; 19 | client->states[2].length_dist.param1 = 4.776842508009852; 20 | client->states[2].length_dist.param2 = 4.807709366988267; 21 | client->states[2].start_length = 1; 22 | client->states[2].iat_dist.type = CIRCPAD_DIST_PARETO; 23 | client->states[2].iat_dist.param1 = 3.3391870088596; 24 | client->states[2].iat_dist.param2 = 7.179045336148708; 25 | client->states[2].dist_max_sample_usec = 9445; 26 | client->states[2].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 1; 27 | client->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 28 | 29 | client->machine_num = smartlist_len(origin_padding_machines); 30 | circpad_register_padding_machine(client, origin_padding_machines); -------------------------------------------------------------------------------- /machines/phase3/spring-mr.c: -------------------------------------------------------------------------------- 1 | circpad_machine_spec_t *relay = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 2 | 3 | relay->name = "spring_relay"; 4 | relay->machine_index = 0; 5 | relay->target_hopnum = 2; 6 | relay->is_origin_side = 0; 7 | relay->allowed_padding_count = 1500; 8 | relay->max_padding_percent = 50; 9 | circpad_machine_states_init(relay, 4); 10 | relay->states[0].iat_dist.type = CIRCPAD_DIST_PARETO; 11 | relay->states[0].iat_dist.param1 = 5.460653840184872; 12 | relay->states[0].iat_dist.param2 = 7.080387541173288; 13 | relay->states[0].dist_max_sample_usec = 94722; 14 | relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1; 15 | relay->states[0].next_state[CIRCPAD_EVENT_PADDING_RECV] = 1; 16 | 17 | relay->states[1].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 18 | relay->states[1].iat_dist.param1 = 1.2767765551835941; 19 | relay->states[1].iat_dist.param2 = 0.11492671368700358; 20 | relay->states[1].dist_max_sample_usec = 31443; 21 | relay->states[1].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 2; 22 | 23 | relay->states[2].length_dist.type = CIRCPAD_DIST_LOGISTIC; 24 | 
relay->states[2].length_dist.param1 = 4.11964473793041; 25 | relay->states[2].length_dist.param2 = 2.7250362139341764; 26 | relay->states[2].start_length = 5; 27 | relay->states[2].iat_dist.type = CIRCPAD_DIST_LOGISTIC; 28 | relay->states[2].iat_dist.param1 = 5.232180204916029; 29 | relay->states[2].iat_dist.param2 = 5.469677647300559; 30 | relay->states[2].dist_max_sample_usec = 94733; 31 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_SENT] = 2; 32 | relay->states[2].next_state[CIRCPAD_EVENT_PADDING_RECV] = 3; 33 | 34 | relay->states[3].length_dist.type = CIRCPAD_DIST_LOG_LOGISTIC; 35 | relay->states[3].length_dist.param1 = 1.6167675237934875; 36 | relay->states[3].length_dist.param2 = 6.128003159320049; 37 | relay->states[3].start_length = 5; 38 | relay->states[3].iat_dist.type = CIRCPAD_DIST_UNIFORM; 39 | relay->states[3].iat_dist.param1 = 4.270468437086448; 40 | relay->states[3].iat_dist.param2 = 7.926284402139126; 41 | relay->states[3].dist_max_sample_usec = 55878; 42 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 3; 43 | relay->states[3].next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 0; 44 | relay->states[3].next_state[CIRCPAD_EVENT_PADDING_RECV] = 2; 45 | 46 | relay->machine_num = smartlist_len(relay_padding_machines); 47 | circpad_register_padding_machine(relay, relay_padding_machines); -------------------------------------------------------------------------------- /machines/phase3/spring-nopadding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/spring-nopadding.png -------------------------------------------------------------------------------- /machines/phase3/spring.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pylls/padding-machines-for-tor/18e7797664769a7fc22c58a06e5e011ed4f9105a/machines/phase3/spring.png
--------------------------------------------------------------------------------
/notes/machine-from-scratch.md:
--------------------------------------------------------------------------------
1 | # A Padding Machine from Scratch
2 | 
3 | This document describes the process of building a "padding machine" in tor's new
4 | circuit padding framework from scratch. Notes were taken as part of porting
5 | [Adaptive Padding Early
6 | (APE)](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) from basket2 to the
7 | circuit padding framework. The goal is just to document the process and provide
8 | useful pointers along the way, not to create a useful machine.
9 | 
10 | The quick and dirty plan is to:
11 | 1. clone and compile tor
12 | 2. use the newly built tor in TB and at a small (non-exit) relay we run
13 | 3. add a bare-bones APE padding machine
14 | 4. run the machine, inspect logs for activity
15 | 5. port APE's state machine without thinking much about parameters
16 | 
17 | ## Clone and compile tor
18 | 
19 | ```bash
20 | git clone https://git.torproject.org/tor.git
21 | cd tor
22 | git checkout tor-0.4.1.5
23 | ```
24 | Above we use the tag for tor-0.4.1.5, where the circuit padding framework was
25 | released; feel free to use something newer (though avoid HEAD, which can have bugs).
26 | 
27 | ```bash
28 | sh autogen.sh
29 | ./configure
30 | make
31 | ```
32 | When you run `./configure` you'll be told of missing dependencies and packages
33 | to install on debian-based distributions. 
Important: if you plan to run `tor` on
34 | a relay as part of the real Tor network and your server runs a distribution that
35 | uses systemd, then I'd recommend that you `apt install dpkg dpkg-dev
36 | libevent-dev libssl-dev asciidoc quilt dh-apparmor libseccomp-dev dh-systemd
37 | libsystemd-dev pkg-config dh-autoreconf libfakeroot zlib1g zlib1g-dev automake
38 | liblzma-dev libzstd-dev` and ensure that tor has systemd support enabled:
39 | `./configure --enable-systemd`. Without this, on a recent Ubuntu, my tor service
40 | was forcefully restarted (SIGINT) by systemd every five minutes.
41 | 
42 | If you want to install on your local system, run `make install`. For our case we
43 | just want the tor binary at `src/app/tor`.
44 | 
45 | ## Use tor in TB and at a relay
46 | Download and install a fresh Tor Browser (TB) from torproject.org. Make sure it
47 | works. From the command line, relative to the folder created when you extracted
48 | TB, run `./Browser/start-tor-browser --verbose` to get some basic log output.
49 | Note the version of tor, in my case `Tor 0.4.0.5 (git-bf071e34aa26e096)` as
50 | part of TB 8.5.4. Shut down TB, copy the `tor` binary that you compiled earlier,
51 | and replace `Browser/TorBrowser/Tor/tor`. Start TB from the command line again;
52 | you should see a different version, in my case `Tor 0.4.1.5
53 | (git-439ca48989ece545)`.
54 | 
55 | The relay we run is also on Linux, and `tor` is located at `/usr/bin/tor`. To
56 | view relevant logs since the last boot, run `sudo journalctl -b /usr/bin/tor`, where we
57 | find `Tor 0.4.0.5 running on Linux`. Copy the locally compiled `tor` to the
58 | relay at a temporary location, and then make sure its ownership and access
59 | rights are identical to `/usr/bin/tor`. Next, shut down the running tor service
60 | with `sudo service tor stop`, wait for it to stop (typically 30s), and copy our
61 | locally compiled tor to replace `/usr/bin/tor`, then start the service again. 
62 | Checking the logs, we see `Tor 0.4.1.5 (git-439ca48989ece545)`.
63 | 
64 | Repeatedly shutting down a relay is detrimental to the network and should be
65 | avoided. Sorry about that.
66 | 
67 | We have one more step left before we move on to the machine: configure TB to always
68 | use our middle relay. Edit `Browser/TorBrowser/Data/Tor/torrc` and set
69 | `MiddleNodes <fingerprint>`, where `<fingerprint>` is the fingerprint of the
70 | relay. Start TB, visit a website, and manually confirm that the middle is used
71 | by looking at the circuit display.
72 | 
73 | ## Add a bare-bones APE padding machine
74 | Now the fun part. We have several resources at our disposal (mind that links
75 | might break in the future; just search for the headings):
76 | - The official [Circuit Padding Developer
77 | Documentation](https://storm.torproject.org/shared/ChieH_sLU93313A2gopZYT3x2waJ41hz5Hn2uG1Uuh7).
78 | - Notes we made on the [implementation of the circuit padding
79 | framework](https://github.com/pylls/padding-machines-for-tor/blob/master/notes/circuit-padding-framework.md).
80 | - The implementation of the current circuit padding machines in tor:
81 | [circuitpadding_machines.c](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.c)
82 | and
83 | [circuitpadding_machines.h](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.h).
84 | 
85 | Please consult the above links for details. Moving forward, the focus is to
86 | describe what was done, not necessarily to explain all the details of why.
87 | 
88 | Since we plan to make changes to tor, create a new branch: `git checkout -b
89 | circuit-padding-ape-machine tor-0.4.1.5`. 
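Before writing any tor code, it may help to see what a machine boils down to: each state holds a table mapping events to a next state, and the framework walks that table as (non)padding cells are sent and received. The following standalone sketch uses my own simplified types and names (it is not tor's actual API) to simulate such a table:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the framework's transition tables --
 * NOT tor's actual API. Each state maps an event to a next-state index. */
enum wf_event {
  EV_NONPADDING_SENT,
  EV_NONPADDING_RECV,
  EV_PADDING_SENT,
  EV_PADDING_RECV,
  EV_COUNT
};

#define STATE_IGNORE 0xFFFFu /* no entry: the event leaves the state unchanged */

struct wf_state {
  unsigned next_state[EV_COUNT];
};

/* Initialize every transition to "ignore", akin to what the framework's
 * state-array initialization does. */
static void wf_states_init(struct wf_state *states, size_t n) {
  for (size_t i = 0; i < n; i++)
    for (int e = 0; e < EV_COUNT; e++)
      states[i].next_state[e] = STATE_IGNORE;
}

/* Feed one event to the machine; returns the new (possibly unchanged) state. */
static unsigned wf_transition(const struct wf_state *states, unsigned cur,
                              enum wf_event e) {
  unsigned next = states[cur].next_state[e];
  return next == STATE_IGNORE ? cur : next;
}
```

Setting `states[0].next_state[EV_NONPADDING_RECV] = 1` in this sketch plays the same role as the `relay->states[0].next_state[CIRCPAD_EVENT_NONPADDING_RECV] = 1;` assignments in the machine files above.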
90 | 91 | We start with declaring two functions, one for the machine at the client and one 92 | at the relay, in `circuitpadding_machines.h`: 93 | 94 | ```c 95 | void circpad_machine_relay_wf_ape(smartlist_t *machines_sl); 96 | void circpad_machine_client_wf_ape(smartlist_t *machines_sl); 97 | ``` 98 | 99 | The definitions go into `circuitpadding_machines.c`: 100 | 101 | ```c 102 | /**************** Adaptive Padding Early (APE) machine ****************/ 103 | 104 | /** 105 | * Create a relay-side padding machine based on the APE design. 106 | */ 107 | void 108 | circpad_machine_relay_wf_ape(smartlist_t *machines_sl) 109 | { 110 | circpad_machine_spec_t *relay_machine 111 | = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 112 | 113 | relay_machine->name = "relay_wf_ape"; 114 | relay_machine->is_origin_side = 0; // relay-side 115 | 116 | // Pad to/from the middle relay, only when the circuit has streams 117 | relay_machine->target_hopnum = 2; 118 | relay_machine->conditions.min_hops = 2; 119 | relay_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 120 | 121 | // limits to help guard against excessive padding 122 | relay_machine->allowed_padding_count = 1; 123 | relay_machine->max_padding_percent = 1; 124 | 125 | // one state to start with: START (-> END, never takes a slot in states) 126 | circpad_machine_states_init(relay_machine, 1); 127 | relay_machine->states[CIRCPAD_STATE_START]. 128 | next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 129 | CIRCPAD_STATE_END; 130 | 131 | // register the machine 132 | relay_machine->machine_num = smartlist_len(machines_sl); 133 | circpad_register_padding_machine(relay_machine, machines_sl); 134 | 135 | log_info(LD_CIRC, 136 | "Registered relay WF APE padding machine (%u)", 137 | relay_machine->machine_num); 138 | } 139 | 140 | /** 141 | * Create a client-side padding machine based on the APE design. 
142 | */ 143 | void 144 | circpad_machine_client_wf_ape(smartlist_t *machines_sl) 145 | { 146 | circpad_machine_spec_t *client_machine 147 | = tor_malloc_zero(sizeof(circpad_machine_spec_t)); 148 | 149 | client_machine->name = "client_wf_ape"; 150 | client_machine->is_origin_side = 1; // client-side 151 | 152 | /** Pad to/from the middle relay, only when the circuit has streams, and only 153 | * for general purpose circuits (typical for web browsing) 154 | */ 155 | client_machine->target_hopnum = 2; 156 | client_machine->conditions.min_hops = 2; 157 | client_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS; 158 | client_machine->conditions.purpose_mask = 159 | circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL); 160 | 161 | // limits to help guard against excessive padding 162 | client_machine->allowed_padding_count = 1; 163 | client_machine->max_padding_percent = 1; 164 | 165 | // one state to start with: START (-> END, never takes a slot in states) 166 | circpad_machine_states_init(client_machine, 1); 167 | client_machine->states[CIRCPAD_STATE_START]. 168 | next_state[CIRCPAD_EVENT_NONPADDING_SENT] = 169 | CIRCPAD_STATE_END; 170 | 171 | client_machine->machine_num = smartlist_len(machines_sl); 172 | circpad_register_padding_machine(client_machine, machines_sl); 173 | log_info(LD_CIRC, 174 | "Registered client WF APE padding machine (%u)", 175 | client_machine->machine_num); 176 | } 177 | ``` 178 | 179 | We also have to modify `circpad_machines_init()` in `circuitpadding.c` to 180 | register our machines: 181 | 182 | ```c 183 | /* Register machines for the APE WF defense */ 184 | circpad_machine_client_wf_ape(origin_padding_machines); 185 | circpad_machine_relay_wf_ape(relay_padding_machines); 186 | ``` 187 | 188 | We run `make` to get a new `tor` binary and copy it to our local TB. 189 | 190 | ## Run the machine 191 | To be able 192 | to view circuit info events in the console as we launch TB, we add `Log 193 | [circ]info notice stdout` to `torrc` of TB. 
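As an aside, the `allowed_padding_count` and `max_padding_percent` fields set above act as a per-machine padding budget. The following is a simplified standalone sketch of such a check; it is my own approximation of the idea, not tor's actual accounting code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplification -- NOT tor's actual accounting. The idea: a
 * machine may pad freely while under an absolute allowance, and past that
 * only while padding stays below a percentage of all cells sent. */
struct padding_budget {
  uint64_t padding_sent;
  uint64_t nonpadding_sent;
  uint64_t allowed_padding_count; /* absolute cap before the percent applies */
  uint32_t max_padding_percent;   /* e.g., 50 means at most 50% padding */
};

static int padding_allowed(const struct padding_budget *b) {
  if (b->padding_sent < b->allowed_padding_count)
    return 1; /* still within the absolute allowance */
  uint64_t total = b->padding_sent + b->nonpadding_sent;
  if (total == 0)
    return 1;
  /* compare padding_sent/total against the percent cap without floats */
  return (b->padding_sent * 100) < ((uint64_t)b->max_padding_percent * total);
}
```

With the tiny limits used above (`allowed_padding_count = 1`, `max_padding_percent = 1`), such a budget would be exhausted almost immediately, which is fine for a bare-bones machine that never pads.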
194 | 
195 | Running TB to visit example.com, we first find in the log:
196 | 
197 | ```
198 | Aug 30 18:36:43.000 [info] circpad_machine_client_hide_intro_circuits(): Registered client intro point hiding padding machine (0)
199 | Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_intro_circuits(): Registered relay intro circuit hiding padding machine (0)
200 | Aug 30 18:36:43.000 [info] circpad_machine_client_hide_rend_circuits(): Registered client rendezvous circuit hiding padding machine (1)
201 | Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_rend_circuits(): Registered relay rendezvous circuit hiding padding machine (1)
202 | Aug 30 18:36:43.000 [info] circpad_machine_client_wf_ape(): Registered client WF APE padding machine (2)
203 | Aug 30 18:36:43.000 [info] circpad_machine_relay_wf_ape(): Registered relay WF APE padding machine (2)
204 | ```
205 | 
206 | All good, our machine is running. Looking further we find:
207 | 
208 | ```
209 | Aug 30 18:36:55.000 [info] circpad_setup_machine_on_circ(): Registering machine client_wf_ape to origin circ 2 (5)
210 | Aug 30 18:36:55.000 [info] circpad_node_supports_padding(): Checking padding: supported
211 | Aug 30 18:36:55.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 2 (5), command 2
212 | Aug 30 18:36:55.000 [info] circpad_machine_spec_transition(): Circuit 2 circpad machine 0 transitioning from 0 to 65535
213 | Aug 30 18:36:55.000 [info] circpad_machine_spec_transitioned_to_end(): Padding machine in end state on circuit 2 (5)
214 | Aug 30 18:36:55.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 2 (5)
215 | Aug 30 18:36:55.000 [info] circpad_handle_padding_negotiated(): Middle node did not accept our padding request on circuit 2 (5)
216 | ```
217 | We see that our middle supports padding (since we upgraded to tor-0.4.1.5), that
218 | we attempt to negotiate, and that our machine starts on the client, transitions to the
219 | end state, and is freed. 
The last line shows that the middle doesn't have a
220 | padding machine that can run.
221 | 
222 | Next, we follow the same steps as earlier and replace the modified `tor` at our
223 | middle relay. We don't update the logging there, to avoid logging on the info
224 | level on the live network. Looking at the client log again, we see that
225 | negotiation works as before except for the last line: it's missing, so the
226 | machine is running at the middle as well.
227 | 
228 | ## Implementing the APE state machine
229 | 
230 | Porting is fairly straightforward: define the states for all machines, add two
231 | more machines (for the receive portion of WTF-PAD, beyond AP), and pick
232 | reasonable parameters for the distributions (I completely winged it, just as when
233 | originally implementing APE). The [circuit-padding-ape-machine
234 | branch](https://github.com/pylls/tor/tree/circuit-padding-ape-machine) contains
235 | the commits for the full machines with plenty of comments.
236 | 
237 | Some comments on the process:
238 | 
239 | - `tor-0.4.1.5` does not support two machines on the same circuit; the following
240 | fix has to be made: https://trac.torproject.org/projects/tor/ticket/31111 .
241 | The good news is that everything else seems to work after the small change in
242 | the fix.
243 | - APE randomizes its distributions. Currently, this can only be done at the
244 | start of `tor`. This makes sense in the censorship circumvention setting
245 | (`obfs4`), less so for WF defenses: further randomizing each circuit is likely
246 | a PITA for attackers, with few downsides.
247 | - It was annoying to figure out that the lack of systemd support in my compiled
248 | tor caused systemd to interrupt (SIGINT) my tor process at the middle relay
249 | every five minutes. Updated the build steps above to hopefully save others the
250 | pain.
251 | - There appears to be some bug on relays when padding cells are sent too early. 
252 | It can happen with some probability in the APE implementation due to
253 | `circpad_machine_relay_wf_ape_send()`. Will investigate next.
254 | - Moving the registration of machines out of the machine definitions and into
255 | `circpad_machines_init()` makes sense, as suggested in the circuit padding doc
256 | draft.
257 | 
258 | Remember that APE is just a proof-of-concept; we make zero claims about its
259 | ability to withstand WF attacks, in particular those based on deep learning.
260 | 
--------------------------------------------------------------------------------
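The per-machine randomization discussed in the notes above (and done via the `CIRCPAD_RAND_DIST_PARAM*` macros in `interspace-mr.c`) amounts to drawing each distribution parameter uniformly at random from [0, 10]. Here is a standalone sketch of that idea; the PRNG and helper names are mine for self-containment, whereas tor samples via its own `uniform_t` distribution:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal xorshift64 PRNG so the sketch is deterministic for a given seed --
 * illustration only, not tor's sampling code. */
static uint64_t rng_state = 88172645463325252ULL; /* arbitrary nonzero seed */

static uint64_t xorshift64(void) {
  uint64_t x = rng_state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng_state = x;
  return x;
}

/* Uniform double in [0, 1), from the top 53 bits of the PRNG output. */
static double uni_rand(void) {
  return (double)(xorshift64() >> 11) / (double)(1ULL << 53);
}

/* A distribution parameter drawn uniformly at random from [0, 10), as in the
 * CIRCPAD_RAND_DIST_PARAM1/2 macros. */
static double rand_dist_param(void) {
  return uni_rand() * 10.0;
}
```

Sampling once per machine at startup fixes the parameters for the tor process's lifetime, which is exactly the limitation the notes point out for per-circuit randomization.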