├── docker ├── secret │ └── .gitkeep ├── control │ ├── .gitignore │ ├── bashrc │ ├── init.sh │ └── Dockerfile ├── .gitignore ├── bin │ ├── console │ ├── web │ ├── build-docker-compose │ └── up ├── template │ ├── depends.yml │ ├── db.yml │ └── docker-compose.yml ├── docker-compose.dev.yml ├── node │ ├── setup-jepsen.sh │ └── Dockerfile └── README.md ├── charybdefs ├── test │ ├── .gitignore │ └── jepsen │ │ └── charybdefs │ │ └── remote_test.clj ├── .gitignore ├── README.md ├── project.clj └── src │ └── jepsen │ └── charybdefs.clj ├── jepsen ├── src │ └── jepsen │ │ ├── adya.clj │ │ ├── repl.clj │ │ ├── os.clj │ │ ├── report.clj │ │ ├── tests │ │ ├── cycle.clj │ │ ├── cycle │ │ │ ├── wr.clj │ │ │ └── append.clj │ │ ├── linearizable_register.clj │ │ ├── adya.clj │ │ ├── causal_reverse.clj │ │ └── causal.clj │ │ ├── codec.clj │ │ ├── net │ │ └── proto.clj │ │ ├── os │ │ ├── ubuntu.clj │ │ ├── smartos.clj │ │ ├── centos.clj │ │ └── debian.clj │ │ ├── tests.clj │ │ ├── nemesis │ │ └── membership │ │ │ └── state.clj │ │ ├── store │ │ └── FileOffsetOutputStream.java │ │ ├── faketime.clj │ │ ├── control │ │ ├── net.clj │ │ ├── retry.clj │ │ ├── docker.clj │ │ ├── k8s.clj │ │ ├── clj_ssh.clj │ │ ├── scp.clj │ │ └── core.clj │ │ ├── checker │ │ └── clock.clj │ │ ├── client.clj │ │ └── reconnect.clj ├── resources │ ├── log4j.properties │ ├── bump-time.c │ └── strobe-time.c ├── test │ └── jepsen │ │ ├── control │ │ ├── net_test.clj │ │ └── util_test.clj │ │ ├── cli_test.clj │ │ ├── tests │ │ ├── long_fork_test.clj │ │ └── causal_reverse_test.clj │ │ ├── nemesis │ │ ├── time_test.clj │ │ └── combined_test.clj │ │ ├── common_test.clj │ │ ├── db_test.clj │ │ ├── independent_test.clj │ │ ├── checker │ │ └── timeline_test.clj │ │ ├── generator_test.clj │ │ ├── fs_cache_test.clj │ │ ├── lazyfs_test.clj │ │ ├── db │ │ └── watchdog_test.clj │ │ └── store_test.clj ├── .eastwood.clj └── project.clj ├── txn ├── doc │ └── intro.md ├── .gitignore ├── project.clj ├── src │ └── jepsen │ │ ├── 
txn │ │ └── micro_op.clj │ │ └── txn.clj ├── CHANGELOG.md ├── test │ └── jepsen │ │ └── txn_test.clj └── README.md ├── antithesis ├── doc │ └── intro.md ├── .gitignore ├── CHANGELOG.md ├── project.clj ├── src │ └── jepsen │ │ └── antithesis │ │ └── Random.java └── README.md ├── generator ├── doc │ └── intro.md ├── .gitignore ├── CHANGELOG.md ├── test │ └── jepsen │ │ └── generator │ │ ├── translation_table_test.clj │ │ └── context_test.clj ├── project.clj ├── README.md └── src │ └── jepsen │ └── generator │ └── translation_table.clj ├── doc ├── color.md ├── tutorial │ └── index.md └── plan.md ├── .gitignore ├── .travis.yml └── contributing.md /docker/secret/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /docker/control/.gitignore: -------------------------------------------------------------------------------- 1 | jepsen -------------------------------------------------------------------------------- /charybdefs/test/.gitignore: -------------------------------------------------------------------------------- 1 | config.edn 2 | -------------------------------------------------------------------------------- /docker/.gitignore: -------------------------------------------------------------------------------- 1 | secret/* 2 | !.gitkeep 3 | ./docker-compose.yml 4 | -------------------------------------------------------------------------------- /docker/bin/console: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | docker exec -it jepsen-control bash 3 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/adya.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.adya 2 | "Moved to jepsen.tests.adya.") 3 | -------------------------------------------------------------------------------- 
/docker/template/depends.yml: -------------------------------------------------------------------------------- 1 | n%%N%%: 2 | condition: service_healthy 3 | -------------------------------------------------------------------------------- /docker/template/db.yml: -------------------------------------------------------------------------------- 1 | n%%N%%: 2 | << : *default-node 3 | container_name: jepsen-n%%N%% 4 | hostname: n%%N%% 5 | -------------------------------------------------------------------------------- /txn/doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to jepsen.txn 2 | 3 | TODO: write [great documentation](http://jacobian.org/writing/what-to-write/) 4 | -------------------------------------------------------------------------------- /docker/bin/web: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | PORT=$(docker port jepsen-control 8080 | cut -d : -f 2) 4 | xdg-open "http://localhost:$PORT" 5 | -------------------------------------------------------------------------------- /antithesis/doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to antithesis 2 | 3 | TODO: write [great documentation](https://jacobian.org/writing/what-to-write/) 4 | -------------------------------------------------------------------------------- /generator/doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to jepsen.generator 2 | 3 | TODO: write [great documentation](https://jacobian.org/writing/what-to-write/) 4 | -------------------------------------------------------------------------------- /docker/docker-compose.dev.yml: -------------------------------------------------------------------------------- 1 | services: 2 | control: 3 | volumes: 4 | - ${JEPSEN_ROOT}:/jepsen # Mounts $JEPSEN_ROOT on host to /jepsen control 
container 5 | -------------------------------------------------------------------------------- /txn/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | pom.xml 5 | pom.xml.asc 6 | *.jar 7 | *.class 8 | /.lein-* 9 | /.nrepl-port 10 | .hgignore 11 | .hg/ 12 | -------------------------------------------------------------------------------- /charybdefs/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | pom.xml 5 | pom.xml.asc 6 | *.jar 7 | *.class 8 | /.lein-* 9 | /.nrepl-port 10 | .hgignore 11 | .hg/ 12 | -------------------------------------------------------------------------------- /doc/color.md: -------------------------------------------------------------------------------- 1 | Color scheme: 2 | 3 | Light: 4 | 5 | ok #6DB6FE 6 | info #FFAA26 7 | fail #FEB5DA 8 | 9 | Dark: 10 | 11 | ok: #81BFFC 12 | info: #FFA400 13 | fail: #FF1E90 14 | -------------------------------------------------------------------------------- /antithesis/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | profiles.clj 5 | pom.xml 6 | pom.xml.asc 7 | *.jar 8 | *.class 9 | /.lein-* 10 | /.nrepl-port 11 | /.prepl-port 12 | .hgignore 13 | .hg/ 14 | -------------------------------------------------------------------------------- /generator/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | profiles.clj 5 | pom.xml 6 | pom.xml.asc 7 | *.jar 8 | *.class 9 | /.lein-* 10 | /.nrepl-port 11 | /.prepl-port 12 | .hgignore 13 | .hg/ 14 | -------------------------------------------------------------------------------- /charybdefs/README.md: -------------------------------------------------------------------------------- 1 | # charybdefs 2 | 3 | A wrapper around 
[CharybdeFS](https://github.com/scylladb/charybdefs) for use 4 | in a jepsen nemesis. 5 | 6 | ## Usage 7 | 8 | TODO 9 | 10 | ## License 11 | 12 | Distributed under the Eclipse Public License either version 1.0 or (at 13 | your option) any later version. 14 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/repl.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.repl 2 | "Helper functions for mucking around with tests!" 3 | (:require [jepsen [history :as h] 4 | [report :as report] 5 | [store :as store]])) 6 | 7 | (defn latest-test 8 | "Returns the most recently run test" 9 | [] 10 | (store/latest)) 11 | -------------------------------------------------------------------------------- /charybdefs/project.clj: -------------------------------------------------------------------------------- 1 | (defproject jepsen-charybdefs "0.1.0-SNAPSHOT" 2 | :description "charybdefs wrapper for use in jepsen" 3 | :url "https://github.com/jepsen-io/jepsen" 4 | :license {:name "Eclipse Public License" 5 | :url "http://www.eclipse.org/legal/epl-v10.html"} 6 | :dependencies [[org.clojure/clojure "1.8.0"] 7 | [jepsen "0.1.6"] 8 | [yogthos/config "0.8"]]) 9 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/os.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.os 2 | "Controls operating system setup and teardown.") 3 | 4 | (defprotocol OS 5 | (setup! [os test node] "Set up the operating system on this particular 6 | node.") 7 | (teardown! [os test node] "Tear down the operating system on this particular 8 | node.")) 9 | 10 | (def noop 11 | "Does nothing" 12 | (reify OS 13 | (setup! [os test node]) 14 | (teardown!
[os test node]))) 15 | -------------------------------------------------------------------------------- /docker/control/bashrc: -------------------------------------------------------------------------------- 1 | eval $(ssh-agent) &> /dev/null 2 | ssh-add /root/.ssh/id_rsa &> /dev/null 3 | 4 | cat <> /var/log/jepsen-setup.log 5 | # We do a little dance to get our hostname (random hex), IP, then use DNS to 6 | # get a proper container name. 7 | #HOSTNAME=`hostname` 8 | #IP=`getent hosts "${HOSTNAME}" | awk '{ print $1 }'` 9 | #NAME=`dig +short -x "${IP}" | cut -f 1 -d .` 10 | #echo "${NAME}" >> /var/jepsen/shared/nodes 11 | echo `hostname` >> /var/jepsen/shared/nodes 12 | 13 | # We make sure that root's authorized keys are ready 14 | echo "Setting up root's authorized_keys" >> /var/log/jepsen-setup.log 15 | mkdir /root/.ssh 16 | chmod 700 /root/.ssh 17 | cp /run/secrets/authorized_keys /root/.ssh/ 18 | chmod 600 /root/.ssh/authorized_keys 19 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/cli_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.cli-test 2 | (:require [clojure [test :refer :all]] 3 | [jepsen [cli :as cli]])) 4 | 5 | (deftest without-default-for-test 6 | (is (= [["-w" "--workload NAME" "What workload should we run?" 7 | :parse-fn :foo] 8 | ["-a" "--another" "Some other option" 9 | :default false] 10 | [nil "--[no-]flag" "A boolean flag"]] 11 | (cli/without-defaults-for 12 | [:workload :flag] 13 | [["-w" "--workload NAME" "What workload should we run?" 
14 | :default :append 15 | :parse-fn :foo] 16 | ["-a" "--another" "Some other option" 17 | :default false] 18 | [nil "--[no-]flag" "A boolean flag" 19 | :default true]])))) 20 | 21 | 22 | -------------------------------------------------------------------------------- /contributing.md: -------------------------------------------------------------------------------- 1 | # Contributing to Jepsen 2 | 3 | Hi there, and thanks for helping make Jepsen better! I've got just one request: 4 | start your commit messages with the *part* of Jepsen you're changing. For 5 | instance, if I made a change to the MongoDB causal consistency tests: 6 | 7 | > MongoDB causal: fix a bug when analyzing zero-length histories 8 | 9 | Namespaces are cool too! 10 | 11 | > jepsen.os.debian: fix libzip package name for debian stretch 12 | 13 | If you're making a change to the core Jepsen library, as opposed to a specific 14 | database test, you can be more concise: 15 | 16 | > add test for single nemesis events 17 | 18 | Jepsen's a big project with lots of moving parts, and it can be confusing to 19 | read the commit logs. Giving a bit of context makes my life a lot easier. 20 | 21 | Thanks! 22 | -------------------------------------------------------------------------------- /doc/tutorial/index.md: -------------------------------------------------------------------------------- 1 | # Tutorial 2 | 3 | This tutorial will walk you through writing a Jepsen test from scratch. It is 4 | also the basis for a [training class](https://jepsen.io/training) offered by 5 | Jepsen. 6 | 7 | If you aren't familiar with the Clojure language, we recommend you start with 8 | [Clojure for the Brave and True](http://www.braveclojure.com/), [Clojure From 9 | the Ground Up](https://aphyr.com/posts/301-clojure-from-the-ground-up-welcome), 10 | or any guide that works for you. 11 | 12 | 1. [Test Scaffolding](01-scaffolding.md) 13 | 2. [Database Automation](02-db.md) 14 | 3. [Writing a Client](03-client.md) 15 | 4.
[Checking Correctness](04-checker.md) 16 | 5. [Introducing Faults](05-nemesis.md) 17 | 6. [Refining Tests](06-refining.md) 18 | 7. [Tuning with Parameters](07-parameters.md) 19 | 8. [Adding a Set Test](08-set.md) 20 | -------------------------------------------------------------------------------- /txn/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Change Log 2 | All notable changes to this project will be documented in this file. This change log follows the conventions of [keepachangelog.com](http://keepachangelog.com/). 3 | 4 | ## [Unreleased] 5 | ### Changed 6 | - Add a new arity to `make-widget-async` to provide a different widget shape. 7 | 8 | ## [0.1.1] - 2018-03-28 9 | ### Changed 10 | - Documentation on how to make the widgets. 11 | 12 | ### Removed 13 | - `make-widget-sync` - we're all async, all the time. 14 | 15 | ### Fixed 16 | - Fixed widget maker to keep working when daylight savings switches over. 17 | 18 | ## 0.1.0 - 2018-03-28 19 | ### Added 20 | - Files from the new template. 21 | - Widget maker public API - `make-widget-sync`. 
22 | 23 | [Unreleased]: https://github.com/your-name/jepsen.txn/compare/0.1.1...HEAD 24 | [0.1.1]: https://github.com/your-name/jepsen.txn/compare/0.1.0...0.1.1 25 | -------------------------------------------------------------------------------- /txn/test/jepsen/txn_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.txn-test 2 | (:require [clojure.test :refer :all] 3 | [criterium.core :refer [quick-bench bench with-progress-reporting]] 4 | [jepsen.txn :refer :all])) 5 | 6 | (deftest ext-reads-test 7 | (testing "no ext reads" 8 | (is (= {} (ext-reads []))) 9 | (is (= {} (ext-reads [[:w :x 2] [:r :x 2]])))) 10 | 11 | (testing "some reads" 12 | (is (= {:x 2} (ext-reads [[:w :y 1] [:r :x 2] [:w :x 3] [:r :x 3]]))))) 13 | 14 | (deftest ext-writes-test 15 | (testing "no ext writes" 16 | (is (= {} (ext-writes []))) 17 | (is (= {} (ext-writes [[:r :x 1]])))) 18 | 19 | (testing "ext writes" 20 | (is (= {:x 1 :y 2} (ext-writes [[:w :x 1] [:r :y 0] [:w :y 1] [:w :y 2]]))))) 21 | 22 | (deftest ^:perf ext-reads-perf 23 | (with-progress-reporting 24 | (bench (ext-reads [[:w :y 1] [:r :x 2] [:w :x 3] [:r :x 3]])))) 25 | -------------------------------------------------------------------------------- /antithesis/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Change Log 2 | All notable changes to this project will be documented in this file. This change log follows the conventions of [keepachangelog.com](https://keepachangelog.com/). 3 | 4 | ## [Unreleased] 5 | ### Changed 6 | - Add a new arity to `make-widget-async` to provide a different widget shape. 7 | 8 | ## [0.1.1] - 2025-10-22 9 | ### Changed 10 | - Documentation on how to make the widgets. 11 | 12 | ### Removed 13 | - `make-widget-sync` - we're all async, all the time. 14 | 15 | ### Fixed 16 | - Fixed widget maker to keep working when daylight savings switches over. 
17 | 18 | ## 0.1.0 - 2025-10-22 19 | ### Added 20 | - Files from the new template. 21 | - Widget maker public API - `make-widget-sync`. 22 | 23 | [Unreleased]: https://sourcehost.site/your-name/antithesis/compare/0.1.1...HEAD 24 | [0.1.1]: https://sourcehost.site/your-name/antithesis/compare/0.1.0...0.1.1 25 | -------------------------------------------------------------------------------- /generator/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Change Log 2 | All notable changes to this project will be documented in this file. This change log follows the conventions of [keepachangelog.com](https://keepachangelog.com/). 3 | 4 | ## [Unreleased] 5 | ### Changed 6 | - Add a new arity to `make-widget-async` to provide a different widget shape. 7 | 8 | ## [0.1.1] - 2025-09-13 9 | ### Changed 10 | - Documentation on how to make the widgets. 11 | 12 | ### Removed 13 | - `make-widget-sync` - we're all async, all the time. 14 | 15 | ### Fixed 16 | - Fixed widget maker to keep working when daylight savings switches over. 17 | 18 | ## 0.1.0 - 2025-09-13 19 | ### Added 20 | - Files from the new template. 21 | - Widget maker public API - `make-widget-sync`. 
22 | 23 | [Unreleased]: https://sourcehost.site/your-name/jepsen.generator/compare/0.1.1...HEAD 24 | [0.1.1]: https://sourcehost.site/your-name/jepsen.generator/compare/0.1.0...0.1.1 25 | -------------------------------------------------------------------------------- /jepsen/.eastwood.clj: -------------------------------------------------------------------------------- 1 | (disable-warning 2 | {:linter :constant-test 3 | :for-macro 'dom-top.core/assert+ 4 | :if-inside-macroexpansion-of #{'clojure.core/let} 5 | :within-depth nil 6 | :reason "The codegen performed by dom-top.core/assert+ checks to see if the 7 | thrown expression is a map at runtime."}) 8 | 9 | (disable-warning 10 | {:linter :unused-ret-vals 11 | :for-macro 'jepsen.util/letr 12 | :if-inside-macroexpansion-of #{'clojure.test/deftest} 13 | :within-depth nil 14 | :reason "We want this intermediate form to go unused! That's what we're 15 | testing for."}) 16 | 17 | (disable-warning 18 | {:linter :unused-ret-vals 19 | :for-macro 'clojure.pprint/pprint-length-loop 20 | :if-inside-macroexpansion-of #{'clojure.core/defmethod} 21 | :within-depth nil 22 | :reason "It's a goddamn pretty printer, the whole point is side effects, come the fuck on"}) 23 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/codec.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.codec 2 | "Serializes and deserializes objects to/from bytes." 3 | (:require [clojure.edn :as edn] 4 | [byte-streams :as b]) 5 | (:import (java.io ByteArrayInputStream 6 | InputStreamReader 7 | PushbackReader))) 8 | 9 | (defn encode 10 | "Serialize an object to bytes." 11 | [o] 12 | (if (nil? o) 13 | (byte-array 0) 14 | (binding [*print-dup* false] 15 | (-> o pr-str .getBytes)))) 16 | 17 | (defn decode 18 | "Deserialize bytes to an object." 19 | [bytes] 20 | (if (nil? bytes) 21 | nil 22 | (let [bytes ^bytes (b/to-byte-array bytes)] 23 | (if (zero? 
(alength bytes)) 24 | nil 25 | (with-open [s (ByteArrayInputStream. bytes) 26 | i (InputStreamReader. s) 27 | r (PushbackReader. i)] 28 | (binding [*read-eval* false] 29 | (edn/read r))))))) 30 | -------------------------------------------------------------------------------- /generator/test/jepsen/generator/translation_table_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.generator.translation-table-test 2 | (:require [clojure [test :refer :all] 3 | [pprint :refer :all]] 4 | [jepsen.generator.translation-table :refer :all]) 5 | (:import (java.util BitSet))) 6 | 7 | (deftest basic-test 8 | (let [t (translation-table 2 [:nemesis])] 9 | (is (= [0 1 :nemesis] (all-names t))) 10 | (is (= 3 (thread-count t))) 11 | (testing "name->index" 12 | (is (= 0 (name->index t 0))) 13 | (is (= 1 (name->index t 1))) 14 | (is (= 2 (name->index t :nemesis)))) 15 | (testing "index->name" 16 | (is (= 0 (index->name t 0))) 17 | (is (= 1 (index->name t 1))) 18 | (is (= :nemesis (index->name t 2)))) 19 | (testing "bitset slices" 20 | (let [bs (doto (BitSet.) (.set 1) (.set 2))] 21 | (is (= #{1 :nemesis} (set (indices->names t bs)))) 22 | (is (= bs (names->indices t #{1 :nemesis}))))))) 23 | -------------------------------------------------------------------------------- /docker/bin/build-docker-compose: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # Builds a docker-compose file. You'd THINK we could do this with `replicas` 4 | # but nooooooo, down that path lies madness. Instead we're going to do some 5 | # janky templating with sed and awk. I am so, so sorry. 6 | 7 | # Takes a number of nodes to generate a file for, and emits a file 8 | # `docker-compose.yml`. 
9 | 10 | NODE_COUNT=$1 11 | 12 | DEPS="" 13 | DBS="" 14 | 15 | # For each node 16 | for ((n=1;n<=NODE_COUNT;n++)); do 17 | # Build up deps for control 18 | LINE=`cat template/depends.yml | sed s/%%N%%/${n}/g` 19 | DEPS="${DEPS}${LINE}"$'\n' 20 | 21 | # Build up DB service 22 | DB=`cat template/db.yml | sed s/%%N%%/${n}/g` 23 | DBS="${DBS}${DB}"$'\n' 24 | done 25 | 26 | # Build docker-compose file 27 | export DEPS 28 | export DBS 29 | cat template/docker-compose.yml | 30 | awk ' {gsub(/%%DEPS%%/, ENVIRON["DEPS"]); print} ' | 31 | awk ' {gsub(/%%DBS%%/, ENVIRON["DBS"]); print} ' \ 32 | > docker-compose.yml 33 | -------------------------------------------------------------------------------- /docker/control/init.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | : "${SSH_PRIVATE_KEY?SSH_PRIVATE_KEY is empty, please use up.sh}" 4 | : "${SSH_PUBLIC_KEY?SSH_PUBLIC_KEY is empty, please use up.sh}" 5 | 6 | if [ ! -f ~/.ssh/known_hosts ]; then 7 | mkdir -m 700 ~/.ssh 8 | echo $SSH_PRIVATE_KEY | perl -p -e 's/↩/\n/g' > ~/.ssh/id_rsa 9 | chmod 600 ~/.ssh/id_rsa 10 | echo $SSH_PUBLIC_KEY > ~/.ssh/id_rsa.pub 11 | echo > ~/.ssh/known_hosts 12 | # Get nodes list 13 | sort -V /var/jepsen/shared/nodes > ~/nodes 14 | # Scan SSH keys 15 | while read node; do 16 | ssh-keyscan -t rsa $node >> ~/.ssh/known_hosts 17 | ssh-keyscan -t ed25519 $node >> ~/.ssh/known_hosts 18 | done <~/nodes 19 | fi 20 | 21 | # TODO: assert that SSH_PRIVATE_KEY==~/.ssh/id_rsa 22 | 23 | cat < booleans = Arrays.asList(true, false); 11 | 12 | private final com.antithesis.sdk.Random r; 13 | 14 | public Random() { 15 | super(); 16 | this.r = new com.antithesis.sdk.Random(); 17 | } 18 | 19 | public long nextLong() { 20 | return r.getRandom(); 21 | } 22 | 23 | public double nextDouble() { 24 | // Adapted from https://developer.classpath.org/doc/java/util/Random-source.html nextDouble 25 | // We're generating doubles in the range 0..1, so we have only the 53 
bits 26 | // of mantissa to generate 27 | final long mantissa = r.getRandom() >>> (64-53); 28 | return mantissa / (double) (1L << 53); 29 | } 30 | 31 | public boolean nextBoolean() { 32 | return r.randomChoice(booleans); 33 | } 34 | 35 | public T randomChoice(List list) { 36 | return r.randomChoice(list); 37 | } 38 | } 39 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/nemesis/time_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.nemesis.time-test 2 | (:require [clojure [pprint :refer [pprint]] 3 | [test :refer :all]] 4 | [jepsen [core :as jepsen] 5 | [common-test :refer [quiet-logging]] 6 | [generator :as gen] 7 | [tests :as tests]] 8 | [jepsen.nemesis.time :as nt])) 9 | 10 | (use-fixtures :once quiet-logging) 11 | 12 | (deftest ^:integration bump-clock-test 13 | ; This isn't going to work on containers, but I at least want to test that it 14 | ; uploads and compiles the binary. 15 | (let [test (assoc tests/noop-test 16 | :name "bump-clock-test" 17 | :nemesis (nt/clock-nemesis) 18 | :generator (gen/nemesis 19 | (gen/limit 1 nt/bump-gen))) 20 | test' (jepsen/run! test) 21 | h (:history test')] 22 | (is (= 2 (count h))) 23 | ; If you ever run these tests on actual nodes where you CAN change the 24 | ; clock, you'll want an alternative condition here. 25 | (is (re-find #"clock_settime: Operation not permitted" 26 | (:err (:data (:exception (h 1)))))))) 27 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/common_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.common-test 2 | "Support functions for writing tests." 3 | (:require [clojure.tools.logging :refer :all] 4 | [jepsen.store :as store] 5 | [unilog.config :as unilog])) 6 | 7 | (defn quiet-logging 8 | "Quiets down logging" 9 | [f] 10 | (unilog/start-logging! 
11 | {:level "info" 12 | :console false 13 | :appenders [store/console-appender] 14 | :overrides (merge store/default-logging-overrides 15 | {"clj-ssh.ssh" :error 16 | "jepsen.db" :error 17 | "jepsen.core" :error 18 | "jepsen.control.util" :error 19 | "jepsen.independent" :error 20 | "jepsen.generator" :error 21 | "jepsen.lazyfs" :error 22 | "jepsen.os.debian" :error 23 | "jepsen.store" :error 24 | "jepsen.tests.kafka" :error 25 | "jepsen.util" :error 26 | "net.schmizz.sshj" :error 27 | })}) 28 | (f) 29 | (store/stop-logging!)) 30 | -------------------------------------------------------------------------------- /charybdefs/test/jepsen/charybdefs/remote_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.charybdefs.remote-test 2 | (:require [clojure.test :refer :all] 3 | [jepsen.control :as c] 4 | [jepsen.charybdefs :as charybdefs] 5 | [config.core :refer [env]])) 6 | 7 | ;;; To run these tests, create a file config.edn in ../.. (the 'test' directory) 8 | ;;; containing at least: 9 | ;;; {:hostname "foo"} 10 | ;;; This must name a host you can ssh to without a password, and have passwordless 11 | ;;; sudo on. It must run debian or ubuntu (tested with ubuntu 16.04). 12 | ;;; Other ssh options may also be used. 13 | (use-fixtures :each 14 | (fn [f] 15 | (if (not (:hostname env)) 16 | (throw (RuntimeException. "hostname is required in config.edn"))) 17 | (c/with-ssh (select-keys env [:private-key-path :strict-host-key-checking :username]) 18 | (c/on (:hostname env) 19 | (charybdefs/install!) 20 | (f))))) 21 | 22 | (deftest break-fix 23 | (testing "break the disk and then fix it" 24 | (let [filename "/faulty/foo"] 25 | (c/exec :touch filename) 26 | (charybdefs/break-all) 27 | (is (thrown? 
RuntimeException (c/exec :cat filename))) 28 | (charybdefs/clear) 29 | (is (= "" (c/exec :cat filename)))))) 30 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/net/proto.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.net.proto 2 | "Protocols for network manipulation. High-level functions live in 3 | jepsen.net.") 4 | 5 | (defprotocol Net 6 | (drop! [net test src dest] 7 | "Drop traffic from src to dest.") 8 | (heal! [net test] 9 | "Ends all traffic drops and restores the network to fast operation.") 10 | (slow! [net test] 11 | [net test opts] 12 | "Delays network packets with options: 13 | 14 | ```clj 15 | {:mean ; (in ms) 16 | :variance ; (in ms) 17 | :distribution} ; (e.g. :normal) 18 | ```") 19 | (flaky! [net test] 20 | "Introduces randomized packet loss") 21 | (fast! [net test] 22 | "Removes packet loss and delays.") 23 | (shape! [net test nodes behavior] 24 | "Shapes network behavior, 25 | i.e. packet delay, loss, corruption, duplication, reordering, and rate 26 | for the given nodes.")) 27 | 28 | (defprotocol PartitionAll 29 | "This optional protocol provides support for making multiple network changes 30 | in a single call. If you don't support this protocol, we'll use drop! 31 | instead." 32 | (drop-all! [net test grudge] 33 | "Takes a grudge: a map of nodes to collections of nodes they 34 | should drop messages from, and makes the appropriate changes to 35 | the network.")) 36 | -------------------------------------------------------------------------------- /generator/README.md: -------------------------------------------------------------------------------- 1 | # jepsen.generator 2 | 3 | This library provides the compositional generator system at the heart of 4 | [Jepsen](https://jepsen.io). Generators produce a series of operations Jepsen 5 | would like to perform against a system, like "Set key `x` to 3". They also 6 | react to operations as they happen.
For example, if the write fails, the 7 | generator could decide to retry it. 8 | 9 | In addition to the generators themselves, this library provides: 10 | 11 | - `jepsen.random`: Pluggable random values 12 | - `jepsen.generator.test`: Helpers for testing generators 13 | - `jepsen.generator.context`: A high-performance, pure data structure which 14 | keeps track of the state used by generators 15 | - `jepsen.generator.translation-table`: Maps worker threads to integers 16 | 17 | ## License 18 | 19 | Copyright © Jepsen, LLC 20 | 21 | This program and the accompanying materials are made available under the 22 | terms of the Eclipse Public License 2.0 which is available at 23 | https://www.eclipse.org/legal/epl-2.0. 24 | 25 | This Source Code may also be made available under the following Secondary 26 | Licenses when the conditions for such availability set forth in the Eclipse 27 | Public License, v. 2.0 are satisfied: GNU General Public License as published by 28 | the Free Software Foundation, either version 2 of the License, or (at your 29 | option) any later version, with the GNU Classpath Exception which is available 30 | at https://www.gnu.org/software/classpath/license.html. 31 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/os/ubuntu.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.os.ubuntu 2 | "Common tasks for Ubuntu boxes. Tested against Ubuntu 18.04." 3 | (:require [clojure.set :as set] 4 | [clojure.tools.logging :refer [info]] 5 | [jepsen.util :refer [meh]] 6 | [jepsen.os :as os] 7 | [jepsen.control :as c :refer [|]] 8 | [jepsen.control.util :as cu] 9 | [jepsen.net :as net] 10 | [jepsen.os.debian :as debian] 11 | [clojure.string :as str])) 12 | 13 | (deftype Ubuntu [] 14 | os/OS 15 | (setup! [_ test node] 16 | (info node "setting up ubuntu") 17 | 18 | (debian/setup-hostfile!) 19 | 20 | (debian/maybe-update!) 21 | 22 | (c/su 23 | ; Packages! 
24 | (debian/install [:apt-transport-https 25 | :wget 26 | :curl 27 | :vim 28 | :man-db 29 | :faketime 30 | :ntpdate 31 | :unzip 32 | :iptables 33 | :psmisc 34 | :tar 35 | :bzip2 36 | :iputils-ping 37 | :iproute2 38 | :rsyslog 39 | :sudo 40 | :logrotate])) 41 | 42 | (meh (net/heal! (:net test) test))) 43 | 44 | (teardown! [_ test node])) 45 | 46 | (def os "An implementation of the Ubuntu OS." (Ubuntu.)) 47 | -------------------------------------------------------------------------------- /docker/control/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM jgoerzen/debian-base-minimal:bookworm as debian-addons 2 | FROM debian:bookworm-slim 3 | 4 | COPY --from=debian-addons /usr/local/preinit/ /usr/local/preinit/ 5 | COPY --from=debian-addons /usr/local/bin/ /usr/local/bin/ 6 | COPY --from=debian-addons /usr/local/debian-base-setup/ /usr/local/debian-base-setup/ 7 | 8 | RUN run-parts --exit-on-error --verbose /usr/local/debian-base-setup 9 | 10 | ENV container=docker 11 | STOPSIGNAL SIGRTMIN+3 12 | 13 | ENV LEIN_ROOT true 14 | 15 | # 16 | # Jepsen dependencies 17 | # 18 | RUN apt-get -qy update && \ 19 | apt-get -qy install \ 20 | curl dos2unix emacs git gnuplot graphviz htop iputils-ping libjna-java pssh screen vim wget && \ 21 | curl -L https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.8%2B9/OpenJDK21U-jdk_x64_linux_hotspot_21.0.8_9.tar.gz | tar -xz --strip-components=1 -C /usr/local/ 22 | 23 | RUN wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein && \ 24 | mv lein /usr/bin && \ 25 | chmod +x /usr/bin/lein && \ 26 | lein self-install 27 | 28 | # without --dev flag up.sh copies jepsen to these subfolders 29 | # with --dev flag they are empty until mounted 30 | COPY jepsen/jepsen /jepsen/jepsen/ 31 | RUN if [ -f /jepsen/jepsen/project.clj ]; then cd /jepsen/jepsen && lein install; fi 32 | COPY jepsen /jepsen/ 33 | 34 | ADD ./bashrc /root/.bashrc 35 | ADD ./init.sh 
/init.sh 36 | RUN dos2unix /init.sh /root/.bashrc \ 37 | && chmod +x /init.sh 38 | 39 | CMD /init.sh 40 | -------------------------------------------------------------------------------- /docker/README.md: -------------------------------------------------------------------------------- 1 | # Dockerized Jepsen 2 | 3 | This Docker image attempts to simplify the setup required by Jepsen. 4 | It is intended to be used by a CI tool or anyone with Docker who wants to try Jepsen themselves. 5 | 6 | It contains all the Jepsen dependencies and code. It uses [Docker 7 | Compose](https://github.com/docker/compose) to spin up the five containers used 8 | by Jepsen. A script builds a `docker-compose.yml` file out of fragments in 9 | `template/`, because this is the future, and using `awk` to generate YAML to 10 | generate computers is *cloud native*. 11 | 12 | ## Quickstart 13 | 14 | Assuming you have `docker compose` set up already, run: 15 | 16 | ``` 17 | bin/up 18 | bin/console 19 | ``` 20 | 21 | ... which will drop you into a console on the Jepsen control node. 22 | 23 | Your DB nodes are `n1`, `n2`, `n3`, `n4`, and `n5`. You can open as many shells 24 | as you like using `bin/console`. If your test includes a web server (try `lein 25 | run serve` on the control node, in your test directory), you can open it 26 | locally by running `bin/web`. This can be a handy way to browse test 27 | results. 28 | 29 | ## Advanced 30 | 31 | You can change the number of DB nodes by running (e.g.) `bin/up -n 9`. 32 | 33 | If you need to log into a DB node (e.g. to debug a test), you can `ssh n1` (or n2, n3, ...) from inside the control node, or: 34 | 35 | ``` 36 | docker exec -it jepsen-n1 bash 37 | ``` 38 | 39 | During development, it's convenient to run with the `--dev` option, which mounts the `$JEPSEN_ROOT` directory as `/jepsen` on the Jepsen control container. 40 | 41 | Run `./bin/up --help` for more info.
42 | -------------------------------------------------------------------------------- /docker/template/docker-compose.yml: -------------------------------------------------------------------------------- 1 | x-node: 2 | &default-node 3 | privileged: true 4 | build: ./node 5 | env_file: ./secret/node.env 6 | secrets: 7 | - authorized_keys 8 | tty: true 9 | tmpfs: 10 | - /run:size=100M 11 | - /run/lock:size=100M 12 | cgroup: host 13 | volumes: 14 | - "/sys/fs/cgroup:/sys/fs/cgroup:rw" 15 | - "jepsen-shared:/var/jepsen/shared" 16 | networks: 17 | - jepsen 18 | cap_add: 19 | - ALL 20 | ports: 21 | - ${JEPSEN_PORT:-22} 22 | stop_signal: SIGRTMIN+3 23 | healthcheck: 24 | test: [ 'CMD-SHELL', 'systemctl status sshd | grep "Active: active (running)"' ] 25 | interval: 1s 26 | timeout: 1s 27 | retries: 3 28 | start_period: 3s 29 | depends_on: 30 | setup: 31 | condition: service_completed_successfully 32 | 33 | volumes: 34 | jepsen-shared: 35 | 36 | secrets: 37 | authorized_keys: 38 | file: ./secret/authorized_keys 39 | 40 | networks: 41 | jepsen: 42 | 43 | services: 44 | setup: 45 | image: jgoerzen/debian-base-standard:bookworm 46 | container_name: jepsen-setup 47 | hostname: setup 48 | volumes: 49 | - "jepsen-shared:/var/jepsen/shared" 50 | entrypoint: [ 'rm', '-rf', '/var/jepsen/shared/nodes' ] 51 | control: 52 | container_name: jepsen-control 53 | hostname: control 54 | depends_on: 55 | %%DEPS%% 56 | build: ./control 57 | env_file: ./secret/control.env 58 | ports: 59 | - "22" 60 | - "8080" 61 | networks: 62 | - jepsen 63 | volumes: 64 | - "jepsen-shared:/var/jepsen/shared" 65 | stop_signal: SIGRTMIN+3 66 | %%DBS%% 67 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests/cycle/wr.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.cycle.wr 2 | "A test which looks for cycles in write/read transactions. 
Writes are assumed 3 | to be unique, but this is the only constraint. See elle.rw-register for docs." 4 | (:refer-clojure :exclude [test]) 5 | (:require [elle [rw-register :as r]] 6 | [jepsen [checker :as checker] 7 | [generator :as gen] 8 | [store :as store]])) 9 | 10 | (defn gen 11 | "Wrapper around elle.rw-register/gen." 12 | [opts] 13 | (r/gen opts)) 14 | 15 | (defn checker 16 | "Full checker for write-read registers. See elle.rw-register for options." 17 | ([] 18 | (checker {})) 19 | ([opts] 20 | (reify checker/Checker 21 | (check [this test history checker-opts] 22 | (r/check (assoc opts :directory 23 | (.getCanonicalPath 24 | (store/path! test (:subdirectory checker-opts) "elle"))) 25 | history))))) 26 | 27 | (defn test 28 | "A partial test, including a generator and a checker. You'll need to provide a client which can understand operations of the form: 29 | 30 | {:type :invoke, :f :txn, :value [[:r 3 nil] [:w 3 6]]} 31 | 32 | and return completions like: 33 | 34 | {:type :ok, :f :txn, :value [[:r 3 1] [:w 3 6]]} 35 | 36 | where the key 3 identifies some register whose value is initially 1, and 37 | which this transaction sets to 6. 38 | 39 | Options are passed directly to elle.rw-register/check and 40 | elle.rw-register/gen; see their docs for full options."
41 | [opts] 42 | {:generator (gen opts) 43 | :checker (checker opts)}) 44 | -------------------------------------------------------------------------------- /docker/node/Dockerfile: -------------------------------------------------------------------------------- 1 | # See https://salsa.debian.org/jgoerzen/docker-debian-base 2 | # See https://hub.docker.com/r/jgoerzen/debian-base-standard 3 | FROM jgoerzen/debian-base-minimal:bookworm as debian-addons 4 | FROM debian:bookworm-slim 5 | 6 | COPY --from=debian-addons /usr/local/preinit/ /usr/local/preinit/ 7 | COPY --from=debian-addons /usr/local/bin/ /usr/local/bin/ 8 | COPY --from=debian-addons /usr/local/debian-base-setup/ /usr/local/debian-base-setup/ 9 | 10 | RUN run-parts --exit-on-error --verbose /usr/local/debian-base-setup 11 | 12 | ENV container=docker 13 | STOPSIGNAL SIGRTMIN+3 14 | 15 | # Basic system stuff 16 | RUN apt-get -qy update && \ 17 | apt-get -qy install \ 18 | apt-transport-https 19 | 20 | # Install packages 21 | RUN apt-get -qy update && \ 22 | apt-get -qy install \ 23 | dos2unix openssh-server pwgen 24 | 25 | # When run, boot-debian-base will call this script, which does final 26 | # per-db-node setup stuff. 
27 | ADD setup-jepsen.sh /usr/local/preinit/03-setup-jepsen 28 | RUN chmod +x /usr/local/preinit/03-setup-jepsen 29 | 30 | # Configure SSHD 31 | RUN sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/g" /etc/ssh/sshd_config 32 | 33 | # Enable SSH server 34 | ENV DEBBASE_SSH=enabled 35 | 36 | # Install Jepsen deps 37 | RUN apt-get -qy update && \ 38 | apt-get -qy install \ 39 | build-essential bzip2 ca-certificates curl dirmngr dnsutils faketime iproute2 iptables iputils-ping libzip4 logrotate lsb-release man man-db netcat-openbsd net-tools ntpdate psmisc python3 rsyslog sudo tar tcpdump unzip vim wget 40 | 41 | EXPOSE 22 42 | CMD ["/usr/local/bin/boot-debian-base"] 43 | -------------------------------------------------------------------------------- /jepsen/resources/bump-time.c: -------------------------------------------------------------------------------- 1 | #define _POSIX_C_SOURCE 200809L 2 | #include <stdio.h> 3 | #include <stdlib.h> 4 | #include <stdint.h> 5 | #include <time.h> 6 | 7 | int main(int argc, char **argv) { 8 | if (argc < 2) 9 | { 10 | fprintf(stderr, "usage: %s <delta>, where delta is in ms\n", argv[0]); 11 | return 1; 12 | } 13 | 14 | /* Compute offset from argument */ 15 | int64_t delta = atof(argv[1]) * 1000000; 16 | int64_t delta_ns = delta % 1000000000; 17 | int64_t delta_s = (delta - delta_ns) / 1000000000; 18 | 19 | /* Get current time */ 20 | /*struct timeval time;*/ 21 | /*struct timezone tz;*/ 22 | 23 | /*if (0 != gettimeofday(&time, &tz)) {*/ 24 | /* perror("gettimeofday");*/ 25 | /* return 1;*/ 26 | /*}*/ 27 | 28 | struct timespec time; 29 | if (0 != clock_gettime(CLOCK_REALTIME, &time)) { 30 | perror("clock_gettime"); 31 | return 1; 32 | } 33 | 34 | /* Update time */ 35 | time.tv_sec += delta_s; 36 | time.tv_nsec += delta_ns; 37 | /* Normalize tv_nsec into [0, 1000000000) */ 38 | while (time.tv_nsec < 0) { 39 | time.tv_sec -= 1; 40 | time.tv_nsec += 1000000000; 41 | } 42 | while (1000000000 <= time.tv_nsec) { 43 | time.tv_sec += 1; 44 | time.tv_nsec -= 1000000000; 45 | } 46 | 47 | /*
Set time */ 48 | if (0 != clock_settime(CLOCK_REALTIME, &time)) { 49 | perror("clock_settime"); 50 | return 2; 51 | } 52 | 53 | /* Print current time */ 54 | if (0 != clock_gettime(CLOCK_REALTIME, &time)) { 55 | perror("clock_gettime"); 56 | return 1; 57 | } 58 | fprintf(stdout, "%lld.%09ld\n", (long long) time.tv_sec, time.tv_nsec); 59 | return 0; 60 | } 61 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests/cycle/append.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.cycle.append 2 | "Detects cycles in histories where operations are transactions over named 3 | lists, and operations are either appends or reads. See elle.list-append 4 | for docs." 5 | (:refer-clojure :exclude [test]) 6 | (:require [elle.list-append :as la] 7 | [jepsen [checker :as checker] 8 | [generator :as gen] 9 | [store :as store]])) 10 | 11 | (defn checker 12 | "Full checker for append and read histories. See elle.list-append for 13 | options." 14 | ([] 15 | (checker {})) 16 | ([opts] 17 | (reify checker/Checker 18 | (check [this test history checker-opts] 19 | (la/check (assoc opts :directory 20 | (.getCanonicalPath 21 | (store/path! test (:subdirectory checker-opts) "elle"))) 22 | history))))) 23 | 24 | (defn gen 25 | "Wrapper around elle.list-append/gen, as a Jepsen generator." 26 | [opts] 27 | (la/gen opts)) 28 | 29 | (defn test 30 | "A partial test, including a generator and checker. You'll need to provide a 31 | client which can understand operations of the form: 32 | 33 | {:type :invoke, :f :txn, :value [[:r 3 nil] [:append 3 2] [:r 3 nil]]} 34 | 35 | and return completions like: 36 | 37 | {:type :ok, :f :txn, :value [[:r 3 [1]] [:append 3 2] [:r 3 [1 2]]]} 38 | 39 | where the key 3 identifies some list, whose value is initially [1], and 40 | becomes [1 2]. 41 | 42 | Options are passed directly to elle.list-append/check and 43 | elle.list-append/gen; see their docs for full options."
44 | [opts] 45 | {:generator (gen opts) 46 | :checker (checker opts)}) 47 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/db_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.db-test 2 | "Tests for jepsen.db" 3 | (:require [clojure [test :refer :all]] 4 | [jepsen [db :as db]])) 5 | 6 | (defn log-db 7 | "A DB which logs operations of the form [:prefix op test node] to the given atom, containing a vector." 8 | [log prefix] 9 | (reify db/DB 10 | (setup! [_ test node] (swap! log conj [prefix :setup! test node])) 11 | (teardown! [_ test node] (swap! log conj [prefix :teardown! test node])) 12 | 13 | db/Kill 14 | (start! [_ test node] (swap! log conj [prefix :start! test node])) 15 | (kill! [_ test node] (swap! log conj [prefix :kill! test node])) 16 | 17 | db/Pause 18 | (pause! [_ test node] (swap! log conj [prefix :pause! test node])) 19 | (resume! [_ test node] (swap! log conj [prefix :resume! test node])) 20 | 21 | db/Primary 22 | (primaries [_ test] [(first (:nodes test))]) 23 | (setup-primary! [_ test node] (swap! log conj [prefix :setup-primary! test node])) 24 | 25 | db/LogFiles 26 | (log-files [db test node] 27 | [prefix]))) 28 | 29 | (deftest map-test-test 30 | (let [n "a" 31 | t {:nodes [n]} 32 | log (atom []) 33 | db (db/map-test #(assoc % :nodes ["b"]) 34 | (log-db log :log)) 35 | t' {:nodes ["b"]}] 36 | (testing "side effects" 37 | (db/setup! db t n) 38 | (db/teardown! db t n) 39 | (db/kill! db t n) 40 | (db/start! db t n) 41 | (db/pause! db t n) 42 | (db/resume! db t n) 43 | (db/setup-primary! db t n) 44 | (is (= [[:log :setup! t' n] 45 | [:log :teardown! t' n] 46 | [:log :kill! t' n] 47 | [:log :start! t' n] 48 | [:log :pause! t' n] 49 | [:log :resume! t' n] 50 | [:log :setup-primary! t' n]] 51 | @log))) 52 | ; We don't test primaries or log-files, maybe add this later. 
53 | )) 54 | -------------------------------------------------------------------------------- /doc/plan.md: -------------------------------------------------------------------------------- 1 | # Stuff to improve! 2 | 3 | ## Error handling 4 | 5 | - Knossos: Better error messages when users pass models that fail on the 6 | first op (I think there's a ticket about this? Null pointer exception for i?) 7 | - When users enter a node multiple times into :nodes, complain early 8 | 9 | ## Visualizations 10 | 11 | - Add a plot for counters, showing the upper and lower bounds, and the observed 12 | value 13 | - Rework latency plot color scheme to use colors that hint at a continuum 14 | - Adaptive temporal resolution for rate and latency plots, based on point 15 | density 16 | - Where plots are dense, make points somewhat transparent to better show 17 | density? 18 | 19 | ## Web 20 | 21 | - Use utf-8 for transferring files; I think we're doing latin-1 or ascii or 22 | 8859-1 or something now. 23 | - Add search for tests 24 | - Add sorting 25 | - Add filtering 26 | 27 | ## Performance 28 | 29 | - Knossos: let's make the memoization threshold configurable via options passed 30 | to the checker. 31 | 32 | ## Core 33 | 34 | - Macro like (synchronize-nodes test), which enforces a synchronization 35 | barrier where (count nodes threads) must come to sync on the test map. 36 | - Generator/each works on each *process*, not each *thread*, but almost always, 37 | what people intend is for each thread--and that's how concat, independent, 38 | etc work. This leads to weird scenarios like tests looping on a final read 39 | forever and ever, as each process crashes, a new one comes in and gets a 40 | fresh generator. Let's make it by thread? 
41 | 42 | ## Extensions 43 | 44 | - Reusable packet capture utility (take from cockroach) 45 | 46 | ## New tests 47 | 48 | - Port bank test from Galera into core (alongside G2) 49 | - Port query-across-tables-and-insert from Cockroach into core 50 | - Port pure-insert from Cockroach into core 51 | - Port comments from Cockroach into core (better name?) 52 | - Port other Hermitage tests to Jepsen? 53 | 54 | ## Tests 55 | - Clean up causal test. Drop model and port to workload 56 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests 2 | "Provide utilities for writing tests using jepsen." 3 | (:require [jepsen.os :as os] 4 | [jepsen.db :as db] 5 | [jepsen.client :as client] 6 | [jepsen.control :as control] 7 | [jepsen.nemesis :as nemesis] 8 | [jepsen.checker :as checker] 9 | [jepsen.net :as net])) 10 | 11 | (def noop-test 12 | "Boring test stub. 13 | Typically used as a basis for writing more complex tests. 14 | " 15 | {:nodes ["n1" "n2" "n3" "n4" "n5"] 16 | :name "noop" 17 | :os os/noop 18 | :db db/noop 19 | :net net/iptables 20 | :remote control/ssh 21 | :client client/noop 22 | :nemesis nemesis/noop 23 | :generator nil 24 | :checker (checker/unbridled-optimism)}) 25 | 26 | (defn atom-db 27 | "Wraps an atom as a database." 28 | [state] 29 | (reify db/DB 30 | (setup! [db test node] (reset! state 0)) 31 | (teardown! [db test node] (reset! state :done)))) 32 | 33 | (defn atom-client 34 | "A CAS client which uses an atom for state. Should probably move this into 35 | core-test." 36 | ([state] 37 | (atom-client state (atom []))) 38 | ([state meta-log] 39 | (reify client/Client 40 | (open! [this test node] 41 | (swap! meta-log conj :open) 42 | this) 43 | (setup! [this test] 44 | (swap! meta-log conj :setup) 45 | this) 46 | (teardown! [this test] (swap! meta-log conj :teardown)) 47 | (close! [this test] (swap! 
meta-log conj :close)) 48 | (invoke! [this test op] 49 | ; We sleep here to make sure we actually have some concurrency. 50 | (Thread/sleep 1) 51 | (case (:f op) 52 | :write (do (reset! state (:value op)) 53 | (assoc op :type :ok)) 54 | 55 | :cas (let [[cur new] (:value op)] 56 | (try 57 | (swap! state (fn [v] 58 | (if (= v cur) 59 | new 60 | (throw (RuntimeException. "CAS failed"))))) 61 | (assoc op :type :ok) 62 | (catch RuntimeException e 63 | (assoc op :type :fail)))) 64 | 65 | :read (assoc op :type :ok 66 | :value @state)))))) 67 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/nemesis/membership/state.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.nemesis.membership.state 2 | "This namespace defines the protocol for nemesis membership state 3 | machines---how to find the current view from a node, how to merge node views 4 | together, how to generate, apply, and complete operations, etc. 5 | 6 | States should be Clojure defrecords, and have several special keys: 7 | 8 | :node-views A map of nodes to the view of the cluster state from that 9 | particular node. 10 | 11 | :view The merged view of the cluster state. 12 | 13 | :pending A set of [op op'] pairs we've applied to the cluster, 14 | but we're not sure if they're resolved yet. 15 | 16 | All three of these keys will be initialized and merged into your State for 17 | you by the membership nemesis." 18 | (:refer-clojure :exclude [resolve])) 19 | 20 | (defprotocol State 21 | (setup! [this test] 22 | "Performs a one-time initialization of state. Should return a new 23 | state. This is a good place to open network connections or set up 24 | mutable resources.") 25 | 26 | (node-view [this test node] 27 | "Returns the view of the cluster from a particular node. 
Return 28 | `nil` to indicate the view is currently unknown; the membership 29 | system will ignore nil results.") 30 | 31 | (merge-views [this test] 32 | "Derive a new :view from this state's current :node-views. 33 | Used as our authoritative view of the cluster.") 34 | 35 | (fs [this] 36 | "A set of all possible op :f's we might generate.") 37 | 38 | (op [this test] 39 | "Returns an operation we could perform next--or :pending if no 40 | operation is available.") 41 | 42 | (invoke! [this test op] 43 | "Applies an operation we generated. Returns a completed op, or a 44 | tuple of [op, state'].") 45 | 46 | 47 | (resolve [this test] 48 | "Called repeatedly on a state to evolve it towards some fixed new 49 | state. A more general form of resolve-op.") 50 | 51 | (resolve-op [this test [op op']] 52 | "Called with a particular pair of operations (both invocation and 53 | completion). If that operation has been resolved, returns a new 54 | version of the state. Otherwise, returns nil.") 55 | 56 | (teardown! [this test] 57 | "Called at the end of the test to dispose of this State. This is 58 | your opportunity to close network connections etc.")) 59 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/store/FileOffsetOutputStream.java: -------------------------------------------------------------------------------- 1 | package jepsen.store.format; 2 | 3 | import java.io.IOException; 4 | import java.io.OutputStream; 5 | import java.nio.ByteBuffer; 6 | import java.nio.channels.FileChannel; 7 | import java.util.zip.CRC32; 8 | 9 | // This class provides an OutputStream linked to a FileChannel at a particular 10 | // offset; each write to this stream is written to the corresponding file. Also 11 | // tracks the count and CRC32 of all streamed bytes. 
12 | public class FileOffsetOutputStream extends OutputStream implements AutoCloseable { 13 | public final FileChannel file; 14 | public final long offset; 15 | public final ByteBuffer singleByteBuffer; 16 | public long currentOffset; 17 | public final CRC32 checksum; 18 | 19 | public FileOffsetOutputStream(FileChannel file, long offset, CRC32 checksum) { 20 | super(); 21 | this.file = file; 22 | this.offset = offset; 23 | this.currentOffset = offset; 24 | this.singleByteBuffer = ByteBuffer.allocate(1); 25 | this.checksum = checksum; 26 | } 27 | 28 | // Returns how many bytes have been written to this stream 29 | public long bytesWritten() { 30 | return currentOffset - offset; 31 | } 32 | 33 | // Returns current CRC32 34 | public CRC32 checksum() { 35 | return checksum; 36 | } 37 | 38 | public void close() { 39 | } 40 | 41 | public void flush() throws IOException { 42 | file.force(false); 43 | } 44 | 45 | public void write(int b) throws IOException { 46 | //System.out.printf("Wrote %d to offset %d\n", b, currentOffset); 47 | // Copy byte into our buffer 48 | singleByteBuffer.put(0, (byte) b); 49 | // Write buffer and advance 50 | singleByteBuffer.rewind(); 51 | final int written = file.write(singleByteBuffer, currentOffset); 52 | assert written == 1; 53 | currentOffset += written; 54 | checksum.update(b); 55 | } 56 | 57 | public void write(byte[] bs) throws IOException { 58 | //System.out.printf("Wrote fast %d", bs.length); 59 | final ByteBuffer buf = ByteBuffer.wrap(bs); 60 | final int written = file.write(buf, currentOffset); 61 | assert written == buf.limit(); 62 | currentOffset += written; 63 | checksum.update(bs); 64 | } 65 | 66 | public void write(byte[] bs, int offset, int len) throws IOException { 67 | //System.out.printf("Wrote fast %d", len); 68 | final ByteBuffer buf = ByteBuffer.wrap(bs, offset, len); 69 | final int written = file.write(buf, currentOffset); 70 | assert written == len; // buf.limit() is offset + len here, not len 71 | currentOffset += written; 72 | checksum.update(bs,
offset, len); 73 | } 74 | } 75 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/independent_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.independent-test 2 | (:require [clojure.test :refer :all] 3 | [clojure.pprint :refer [pprint]] 4 | [clojure.set :as set] 5 | [jepsen [common-test :refer [quiet-logging]] 6 | [history :as h]] 7 | [jepsen.independent :refer :all] 8 | [jepsen.checker :as checker] 9 | [jepsen.generator :as gen] 10 | [jepsen.generator.test :as gen.test] 11 | [jepsen.history.core :as hc :refer [chunked]])) 12 | 13 | (use-fixtures :once quiet-logging) 14 | 15 | ; Tests for independent generators are in generator-test; might want to pull 16 | ; them over here later. 17 | 18 | (deftest subhistories-test 19 | (let [n 12 20 | h0 (->> (range n) 21 | (mapv (fn [i] 22 | {:type :invoke, :f :foo, :value (tuple (mod i 3) i)}))) 23 | ; We want to explicitly chunk this history 24 | chunk-size 3 25 | chunk-count (/ n chunk-size) 26 | _ (assert integer? chunk-count) 27 | h (h/history 28 | (hc/soft-chunked-vector 29 | chunk-count 30 | ; Starting indices 31 | (range 0 n chunk-size) 32 | ; Loader 33 | (fn load-nth [i] 34 | (let [start (* chunk-size i)] 35 | (subvec h0 start (+ start chunk-size)))))) 36 | shs (subhistories (history-keys h) h)] 37 | (is (= {0 [0 3 6 9] 38 | 1 [1 4 7 10] 39 | 2 [2 5 8 11]} 40 | (update-vals shs (partial map :value)))))) 41 | 42 | (deftest checker-test 43 | (let [even-checker (reify checker/Checker 44 | (check [this test history opts] 45 | {:valid? (even? (count history))})) 46 | history (->> (fn [k] (->> (range k) 47 | (map (partial array-map :value)))) 48 | (sequential-generator [0 1 2 3]) 49 | (gen/nemesis nil) 50 | (gen.test/perfect (gen.test/n+nemesis-context 3)) 51 | (concat [{:value :not-sharded}]) 52 | (h/history))] 53 | (is (= {:valid? false 54 | :results {1 {:valid? true} 55 | 2 {:valid? false} 56 | 3 {:valid? 
true}} 57 | :failures [2]} 58 | (checker/check (checker even-checker) 59 | {:name "independent-checker-test" 60 | :start-time 0} 61 | history 62 | {}))))) 63 | -------------------------------------------------------------------------------- /txn/README.md: -------------------------------------------------------------------------------- 1 | # Jepsen.txn 2 | 3 | Support library for generating and analyzing transactional, multi-object 4 | histories. This is very much a work in progress. 5 | 6 | ## Concepts 7 | 8 | A *state* is a map of keys to values. 9 | 10 | ```clj 11 | {:x 1 12 | :y 2} 13 | ``` 14 | 15 | Our data model is a set of *stateful objects*. An *object* is a uniquely named 16 | register. Given a state, each object's value is given by that state's value for 17 | the object's key. 18 | 19 | A *micro-op* is a primitive atomic transition over a state. We call these 20 | "micro" to distinguish them from "ops" in Jepsen. In this library, however, 21 | we'll use *op* as a shorthand for micro-op unless otherwise specified. 22 | 23 | ```clj 24 | [:r :x 1] ; Read the value of x, finding 1 25 | [:w :y 2] ; Write 2 to y 26 | ``` 27 | 28 | A *transaction* is an ordered sequence of micro-ops. 29 | 30 | ```clj 31 | [[:w :x 1] [:r :x 1] [:w :y 2]] ; Set x to 1, read that write, set y to 2 32 | ``` 33 | 34 | A *sequential history* is an ordered sequence of transactions. 35 | 36 | ```clj 37 | [[[:w :x 1] [:w :y 2]] ; Set x and y to 1 and 2 38 | [[:r :x 1]] ; Observe x = 1 39 | [[:r :y 2]]] ; Observe y = 2 40 | ``` 41 | 42 | A *history* is a concurrent history of Jepsen operations, each with an 43 | arbitrary :f (which could be used to hint at the purpose or class of the 44 | transaction being performed), and whose value is a transaction. 
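
To make the definitions of state, micro-op, transaction, and sequential history concrete, here is a minimal, hypothetical Java sketch. It executes a sequential history one transaction at a time against a state map and fills in each read with the value it observes. It is not part of Jepsen.txn, which is a Clojure library. The names `SequentialHistoryExample` and `MicroOp` are illustrative only. The sketch assumes string keys, `Long` values, and that a read of a never-written key observes `null`. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Jepsen.txn.
public class SequentialHistoryExample {
    // A micro-op such as [:r :x nil] or [:w :y 2], modeled as (f, key, value).
    record MicroOp(String f, String key, Long value) {}

    // Applies one micro-op to the (mutable) state and returns the completed
    // micro-op: reads get the current value filled in, writes are unchanged.
    static MicroOp apply(Map<String, Long> state, MicroOp op) {
        switch (op.f()) {
            case "r":
                return new MicroOp("r", op.key(), state.get(op.key()));
            case "w":
                state.put(op.key(), op.value());
                return op;
            default:
                throw new IllegalArgumentException("unknown micro-op: " + op.f());
        }
    }

    // Runs each transaction, and each micro-op within it, strictly in order
    // against a copy of the initial state; returns the completed transactions.
    static List<List<MicroOp>> run(Map<String, Long> initial,
                                   List<List<MicroOp>> history) {
        Map<String, Long> state = new HashMap<>(initial);
        List<List<MicroOp>> completed = new ArrayList<>();
        for (List<MicroOp> txn : history) {
            List<MicroOp> done = new ArrayList<>();
            for (MicroOp op : txn) {
                done.add(apply(state, op));
            }
            completed.add(done);
        }
        return completed;
    }

    public static void main(String[] args) {
        // Roughly [[[:w :x 1] [:w :y 2]] [[:r :x nil]] [[:r :y nil]]]
        List<List<MicroOp>> history = List.of(
            List.of(new MicroOp("w", "x", 1L), new MicroOp("w", "y", 2L)),
            List.of(new MicroOp("r", "x", null)),
            List.of(new MicroOp("r", "y", null)));
        List<List<MicroOp>> result = run(Map.of(), history);
        System.out.println(result.get(1).get(0).value()); // prints 1
        System.out.println(result.get(2).get(0).value()); // prints 2
    }
}
```

The reads start out as `nil`/`null` and come back as `1` and `2`, as in the sequential history example above. Because transactions run strictly one after another, any history produced this way is serializable by construction. Producing histories for weaker models would require interleaving or buffering transactions instead.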
45 | 46 | ```clj 47 | ; A concurrent write of x=1 and read of x=1 48 | [{:process 0, :type :invoke, :f :txn, :value [[:w :x 1]]} 49 | {:process 1, :type :invoke, :f :txn, :value [[:r :x nil]]} 50 | {:process 0, :type :ok, :f :txn, :value [[:w :x 1]]} 51 | {:process 1, :type :ok, :f :txn, :value [[:r :x 1]]}] 52 | ``` 53 | 54 | An *op interpreter* is a function that takes a state and a micro-op, and 55 | applies that operation to the state. It returns [state' op']: the resulting 56 | state, and the op with any missing values (e.g. reads) filled in. 57 | 58 | A *simulator* simulates the effect of executing transactions on some example 59 | system. It takes an initial state and a sequence of operations, and produces a 60 | history by applying those operations to the system. It may simulate 61 | single-threaded or multithreaded execution, so long as each process's effects 62 | are single-threaded. Simulators are useful for generating randomized histories 63 | which are known to conform to some consistency model, such as serializability 64 | or snapshot isolation, and those histories can be used to test programs that 65 | verify those properties. 66 | 67 | ## License 68 | 69 | Copyright © 2018 Jepsen, LLC 70 | 71 | Distributed under the Eclipse Public License either version 1.0 or (at 72 | your option) any later version. 73 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/faketime.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.faketime 2 | "Libfaketime is useful for making clocks run at differing rates! This 3 | namespace provides utilities for stubbing out programs with faketime." 4 | (:require [clojure.tools.logging :refer :all] 5 | [jepsen [control :as c] 6 | [random :as rand]] 7 | [jepsen.control.util :as cu])) 8 | 9 | (defn install-0.9.6-jepsen1!
10 | "Installs our fork of 0.9.6 (the last version which worked with jemalloc), 11 | which includes a patch to support CLOCK_MONOTONIC_COARSE and 12 | CLOCK_REALTIME_COARSE. Gosh, this is SUCH a hack." 13 | [] 14 | (c/su 15 | (c/exec :mkdir :-p "/tmp/jepsen") 16 | (c/cd "/tmp/jepsen" 17 | (when-not (cu/exists? "libfaketime-jepsen") 18 | (c/exec :git :clone "https://github.com/jepsen-io/libfaketime.git" 19 | "libfaketime-jepsen")) 20 | (c/cd "libfaketime-jepsen" 21 | (c/exec :git :checkout "0.9.6-jepsen1") 22 | (c/exec :make) 23 | (c/exec :make :install))))) 24 | 25 | (defn script 26 | "A sh script which invokes cmd with a faketime wrapper. Takes an initial 27 | offset in seconds, and a clock rate to run at." 28 | [cmd init-offset rate] 29 | (let [init-offset (long init-offset) 30 | rate (float rate)] 31 | (str "#!/bin/bash\n" 32 | "faketime -m -f \"" 33 | (if (neg? init-offset) "-" "+") (Math/abs init-offset) "s x" rate "\" " 34 | (c/expand-path cmd) 35 | " \"$@\""))) 36 | 37 | (defn wrap! 38 | "Replaces an executable with a faketime wrapper, moving the original to 39 | x.no-faketime. Idempotent." 40 | [cmd init-offset rate] 41 | (let [cmd' (str cmd ".no-faketime") 42 | wrapper (script cmd' init-offset rate)] 43 | (if (cu/exists? cmd') 44 | (do (info "Installing faketime wrapper.") 45 | (c/exec :echo wrapper :> cmd)) 46 | (do (c/exec :mv cmd cmd') 47 | (c/exec :echo wrapper :> cmd))) 48 | (c/exec :chmod "a+x" cmd))) 49 | 50 | (defn unwrap! 51 | "If a wrapper is installed, remove it and replace it with the original 52 | .no-faketime version of the binary." 53 | [cmd] 54 | (let [cmd' (str cmd ".no-faketime")] 55 | (when (cu/exists? cmd') 56 | (c/exec :mv cmd' cmd)))) 57 | 58 | (defn rand-factor 59 | "Helpful for choosing faketime rates. Takes a factor (e.g. 2.5) and produces 60 | a random number selected from a distribution around 1, with minimum and 61 | maximum constrained such that factor * min = max.
Intuitively, the fastest 62 | clock can be no more than `factor` times as fast as the slowest." 63 | [factor] 64 | (let [max (/ 2 (+ 1 (/ factor))) 65 | min (/ max factor)] 66 | (+ min (rand/double (- max min))))) 67 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests/linearizable_register.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.linearizable-register 2 | "Common generators and checkers for linearizability over a set of independent 3 | registers. Clients should understand three functions, for writing a value, 4 | reading a value, and compare-and-setting a value from v to v'. Reads receive 5 | `nil`, and replace it with the value actually read. 6 | 7 | {:type :invoke, :f :write, :value [k v]} 8 | {:type :invoke, :f :read, :value [k nil]} 9 | {:type :invoke, :f :cas, :value [k [v v']]}" 10 | (:refer-clojure :exclude [test]) 11 | (:require [jepsen [client :as client] 12 | [checker :as checker] 13 | [independent :as independent] 14 | [generator :as gen] 15 | [random :as rand]] 16 | [jepsen.checker.timeline :as timeline] 17 | [knossos.model :as model])) 18 | 19 | (defn w [_ _] {:type :invoke, :f :write, :value (rand/long 5)}) 20 | (defn r [_ _] {:type :invoke, :f :read}) 21 | (defn cas [_ _] {:type :invoke, :f :cas, :value [(rand/long 5) (rand/long 5)]}) 22 | 23 | (defn test 24 | "A partial test, including a generator, model, and checker. You'll need to 25 | provide a client. Options: 26 | 27 | :nodes A set of nodes you're going to operate on. We only care 28 | about the count, so we can figure out how many workers 29 | to use per key. 30 | :model A model for checking. Default is (model/cas-register). 31 | :per-key-limit Maximum number of ops per key. 32 | :process-limit Maximum number of processes that can interact with a 33 | given key. Default 20."
34 | [opts] 35 | {:checker (independent/checker 36 | (checker/compose 37 | {:linearizable (checker/linearizable 38 | {:model (:model opts (model/cas-register))}) 39 | :timeline (timeline/html)})) 40 | :generator (let [n (count (:nodes opts))] 41 | (independent/concurrent-generator 42 | (* 2 n) 43 | (range) 44 | (fn [k] 45 | (cond->> (gen/reserve n r (gen/mix [w cas cas])) 46 | ; We randomize the limit a bit so that over time, keys 47 | ; become misaligned, which prevents us from lining up 48 | ; on Significant Event Boundaries. 49 | (:per-key-limit opts) 50 | (gen/limit (* (+ (rand/double 0.1) 0.9) 51 | (:per-key-limit opts 20))) 52 | 53 | true 54 | (gen/process-limit (:process-limit opts 20))))))}) 55 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/net.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.net 2 | "Network control functions." 3 | (:refer-clojure :exclude [partition]) 4 | (:require [clojure.string :as str] 5 | [jepsen.control :as c] 6 | [clj-commons.slingshot :refer [throw+]]) 7 | (:import (java.net InetAddress 8 | UnknownHostException))) 9 | 10 | (defn reachable? 11 | "Can the current node ping the given node?" 12 | [node] 13 | (try (c/exec :ping :-w 1 node) true 14 | (catch RuntimeException _ false))) 15 | 16 | (defn local-ip 17 | "The local node's IP address" 18 | [] 19 | (first (str/split (c/exec :hostname :-I) #"\s+"))) 20 | 21 | (defn ip-by-local-dns 22 | "Looks up the IP address for a hostname using the local system resolver." 23 | [host] 24 | (.. (InetAddress/getByName host) getHostAddress)) 25 | 26 | (defn ip-by-remote-getent 27 | "Looks up the IP address for a hostname using the currently bound remote 28 | node's getent. This is a useful fallback when the control node lacks 29 | DNS/hosts entries for a host, but the DB nodes have them."
30 | [host] 31 | ; getent output is of the form: 32 | ; 74.125.239.39 STREAM host.com 33 | ; 74.125.239.39 DGRAM 34 | ; ... 35 | (let [res (c/exec :getent :ahostsv4 host) 36 | ip (first (str/split (->> res 37 | (str/split-lines) 38 | (first)) 39 | #"\s+"))] 40 | (cond ; Debian Bookworm seems to have changed getent ahosts to return 41 | ; loopback IPs instead of public ones; when this happens, we fall 42 | ; back to local-ip. 43 | (re-find #"^127" ip) 44 | (local-ip) 45 | 46 | ; We get this occasionally for reasons I don't understand 47 | (str/blank? ip) 48 | (throw+ {:type :blank-getent-ip 49 | :output res 50 | :host host}) 51 | 52 | ; Valid IP 53 | true 54 | ip))) 55 | 56 | (defn ip* 57 | "Look up an IP for a hostname. Unmemoized." 58 | [host] 59 | (try (ip-by-local-dns host) 60 | (catch UnknownHostException e 61 | (ip-by-remote-getent host)))) 62 | 63 | (def ip 64 | "Look up an IP for a hostname. Memoized." 65 | (memoize ip*)) 66 | 67 | (defn control-ip 68 | "Assuming you have a DB node bound in jepsen.control, returns the IP address 69 | of the *control* node, as perceived by that DB node. This is helpful when you 70 | want to, say, set up a tcpdump filter which snarfs traffic coming from the 71 | control node." 72 | [] 73 | ; We have to escape the sudo env for this to work, since the env var doesn't 74 | ; make its way into subshells. 75 | (binding [c/*sudo* nil] 76 | (nth (re-find #"^(.+?)\s" 77 | (c/exec :bash :-c "echo $SSH_CLIENT")) 78 | 1))) 79 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/retry.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.retry 2 | "SSH client libraries appear to be nearly universally flaky. Maybe race 3 | conditions, maybe underlying network instability, maybe we're just doing it 4 | wrong. For whatever reason, they tend to throw errors constantly.
The good 5 | news is we can almost always retry their commands safely! This namespace 6 | provides a Remote which wraps an underlying Remote in a jepsen.reconnect 7 | wrapper, catching certain exception classes and ensuring they're 8 | automatically retried." 9 | (:require [clojure.tools.logging :refer [info warn]] 10 | [dom-top.core :as dt] 11 | [jepsen [random :as rand] 12 | [reconnect :as rc]] 13 | [jepsen.control [core :as core]] 14 | [clj-commons.slingshot :refer [try+ throw+]])) 15 | 16 | (def retries 17 | "How many times should we retry exceptions before giving up and throwing?" 18 | 5) 19 | 20 | (def backoff-time 21 | "Roughly how long should we back off when retrying, in ms?" 22 | 100) 23 | 24 | (defmacro with-retry 25 | "Takes a body. Evaluates body, retrying SSH exceptions." 26 | [& body] 27 | `(dt/with-retry [tries# retries] 28 | (try+ 29 | ~@body 30 | (catch [:type :jepsen.control/ssh-failed] e# 31 | (if (pos? tries#) 32 | (do (Thread/sleep (+ (/ backoff-time 2) (rand/long backoff-time))) 33 | (~'retry (dec tries#))) 34 | (throw+ e#)))))) 35 | 36 | (defrecord Remote [remote conn] 37 | core/Remote 38 | (connect [this conn-spec] 39 | ; Construct a conn (a Reconnect wrapper) for the underlying remote, and 40 | ; open it. 41 | (let [conn (-> {:open (fn open [] 42 | (core/connect remote conn-spec)) 43 | :close core/disconnect! 44 | :name [:control (:host conn-spec)] 45 | :log? :minimal} 46 | rc/wrapper 47 | rc/open!)] 48 | (assoc this :conn conn))) 49 | 50 | (disconnect! [this] 51 | (rc/close! conn)) 52 | 53 | (execute! [this context action] 54 | (with-retry 55 | (rc/with-conn [c conn] 56 | (core/execute! c context action)))) 57 | 58 | (upload! [this context local-paths remote-path more] 59 | (with-retry 60 | (rc/with-conn [c conn] 61 | (core/upload! c context local-paths remote-path more)))) 62 | 63 | (download! [this context remote-paths local-path more] 64 | (with-retry 65 | (rc/with-conn [c conn] 66 | (core/download! 
c context remote-paths local-path more))))) 67 | 68 | (defn remote 69 | "Constructs a new Remote by wrapping another Remote in one which 70 | automatically catches and retries any exception of the form {:type 71 | :jepsen.control/ssh-failed}." 72 | [remote] 73 | (Remote. remote nil)) 74 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/nemesis/combined_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.nemesis.combined-test 2 | (:require [clojure [pprint :refer [pprint]] 3 | [test :refer :all]] 4 | [jepsen [common-test :refer [quiet-logging]] 5 | [core :as jepsen] 6 | [db :as db] 7 | [util :as util] 8 | [generator :as gen]] 9 | [jepsen.generator [interpreter :as interpreter] 10 | [interpreter-test :as it]] 11 | [jepsen.nemesis.combined :refer :all])) 12 | 13 | (use-fixtures :once quiet-logging) 14 | 15 | (defn first-primary-db 16 | "A database whose primary is always \"n1\"" 17 | [] 18 | (reify db/DB 19 | (setup! [this test node] 20 | ) 21 | 22 | (teardown! [this test node] 23 | ) 24 | 25 | db/Primary 26 | (primaries [this test] 27 | ["n1"]))) 28 | 29 | (deftest partition-package-gen-test 30 | (let [check-db (fn [db primaries?] 31 | (let [pkg (partition-package {:faults #{:partition} 32 | :interval 3/100 33 | :db db}) 34 | n 10 ; Op count 35 | gen (gen/nemesis (gen/limit n (:generator pkg))) 36 | test (assoc it/base-test 37 | :name "nemesis.combined partition-package-gen-test" 38 | :client (it/ok-client) 39 | :nemesis (it/info-nemesis) 40 | :generator gen) 41 | ; Generate some ops 42 | h (:history (jepsen/run! test))] 43 | ; Should be alternating start/stop ops 44 | (is (= (take (* 2 n) 45 | (cycle [:start-partition :start-partition 46 | :stop-partition :stop-partition])) 47 | (map :f h))) 48 | ; Ensure we generate valid target values 49 | (let [targets (cond-> #{:one :minority-third :majority 50 | :majorities-ring} 51 | primaries? 
(conj :primaries))] 52 | (is (->> (filter (comp #{:stop-partition} :f) h) 53 | (map :value) 54 | (every? nil?))) 55 | (is (->> (filter (comp #{:start-partition} :f) h) 56 | (map :value) 57 | (every? targets))))))] 58 | (testing "no primaries" 59 | (check-db db/noop false)) 60 | 61 | (testing "primaries" 62 | (check-db (first-primary-db) true)))) 63 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/checker/timeline_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.checker.timeline-test 2 | (:refer-clojure :exclude [set]) 3 | (:require [clojure [datafy :refer [datafy]] 4 | [pprint :refer [pprint]] 5 | [test :refer :all]] 6 | [jepsen [checker :refer :all] 7 | [history :as h] 8 | [store :as store] 9 | [tests :as tests] 10 | [util :as util]] 11 | [jepsen.checker.timeline :as t])) 12 | 13 | (deftest timeline-test 14 | (let [test (assoc tests/noop-test 15 | :start-time 0) 16 | history (h/history 17 | [{:process 0, :time 0, :type :invoke, :f :write, :value 3} 18 | {:process 1, :time 1000000, :type :invoke, :f :read, :value nil} 19 | {:process 0, :time 2000000, :type :info, :f :read, :value nil} 20 | {:process 1, :time 3000000, :type :ok, :f :read, :value 3}]) 21 | opts {}] 22 | (is (= [:html 23 | [:head 24 | [:style 25 | ".ops { position: absolute; }\n.op { position: absolute; padding: 2px; border-radius: 2px; box-shadow: 0 1px 3px rgba(0,0,0,0.12), 0 1px 2px rgba(0,0,0,0.24); transition: all 0.3s cubic-bezier(.25,.8,.25,1); overflow: hidden; }\n.op.invoke { background: #eeeeee; }\n.op.ok { background: #6DB6FE; }\n.op.info { background: #FFAA26; }\n.op.fail { background: #FEB5DA; }\n.op:target { box-shadow: 0 14px 28px rgba(0,0,0,0.25), 0 10px 10px rgba(0,0,0,0.22); }\n"]] 26 | [:body 27 | [:div 28 | [:a {:href "/"} "jepsen"] 29 | " / " 30 | [:a {:href "/files/noop"} "noop"] 31 | " / " 32 | [:a {:href "/files/noop/0"} "0"] 33 | " / " 34 | [:a {:href 
"/files/noop/0/"} "independent"] 35 | " / " 36 | [:a {:href "/files/noop/0/independent/"} ""]] 37 | [:h1 "noop key "] 38 | nil 39 | [:div 40 | {:class "ops"} 41 | [[:a 42 | {:href "#i2"} 43 | [:div 44 | {:class "op info", 45 | :id "i2", 46 | :style "width:100;left:0;top:0;height:80", 47 | :title 48 | "Dur: 2 ms\nErr: nil\nWall-clock Time: 1970-01-01T00:00:00.002Z\n\nOp:\n{:process 0\n :type :info\n :f :read\n :index 2\n :value }"} 49 | "0 read 3
"]] 50 | [:a 51 | {:href "#i3"} 52 | [:div 53 | {:class "op ok", 54 | :id "i3", 55 | :style "width:100;left:106;top:16;height:32", 56 | :title 57 | "Dur: 2 ms\nErr: nil\nWall-clock Time: 1970-01-01T00:00:00.003Z\n\nOp:\n{:process 1\n :type :ok\n :f :read\n :index 3\n :value 3}"} 58 | "1 read
3"]]]]]] 59 | (t/hiccup test history opts))))) 60 | -------------------------------------------------------------------------------- /charybdefs/src/jepsen/charybdefs.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.charybdefs 2 | (:require [clojure.tools.logging :refer [info]] 3 | [jepsen.control :as c] 4 | [jepsen.control.util :as cu] 5 | [jepsen.os.debian :as debian])) 6 | 7 | (defn install-thrift! 8 | "Install thrift (compiler, c++, and python libraries) from source. 9 | 10 | Ubuntu includes thrift-compiler and python-thrift packages, but 11 | not the c++ library, so we must build it from source. We can't mix 12 | and match versions, so we must install everything from source." 13 | [] 14 | (when-not (cu/exists? "/usr/bin/thrift") 15 | (c/su 16 | (debian/install [:automake 17 | :bison 18 | :flex 19 | :g++ 20 | :git 21 | :libboost-all-dev 22 | :libevent-dev 23 | :libssl-dev 24 | :libtool 25 | :make 26 | :pkg-config 27 | :python-setuptools 28 | :libglib2.0-dev]) 29 | (info "Building thrift (this takes several minutes)") 30 | (let [thrift-dir "/opt/thrift"] 31 | (cu/install-archive! "http://www-eu.apache.org/dist/thrift/0.10.0/thrift-0.10.0.tar.gz" thrift-dir) 32 | (c/cd thrift-dir 33 | ;; charybdefs needs this in /usr/bin 34 | (c/exec "./configure" "--prefix=/usr") 35 | (c/exec :make :-j4) 36 | (c/exec :make :install)) 37 | (c/cd (str thrift-dir "/lib/py") 38 | (c/exec :python "setup.py" :install)))))) 39 | 40 | (defn install! 41 | "Ensure CharybdeFS is installed and the filesystem mounted at /faulty." 42 | [] 43 | (install-thrift!) 44 | (let [charybdefs-dir "/opt/charybdefs" 45 | charybdefs-bin (str charybdefs-dir "/charybdefs")] 46 | (when-not (cu/exists? 
charybdefs-bin) 47 | (c/su 48 | (debian/install [:build-essential 49 | :cmake 50 | :libfuse-dev 51 | :fuse])) 52 | (c/su 53 | (c/exec :mkdir :-p charybdefs-dir) 54 | (c/exec :chmod "777" charybdefs-dir)) 55 | (c/exec :git :clone :--depth 1 "https://github.com/scylladb/charybdefs.git" charybdefs-dir) 56 | (c/cd charybdefs-dir 57 | (c/exec :thrift :-r :--gen :cpp :server.thrift) 58 | (c/exec :cmake :CMakeLists.txt) 59 | (c/exec :make))) 60 | (c/su 61 | (c/exec :modprobe :fuse) 62 | (c/exec :umount "/faulty" "||" "/bin/true") 63 | (c/exec :mkdir :-p "/real" "/faulty") 64 | (c/exec charybdefs-bin "/faulty" "-oallow_other,modules=subdir,subdir=/real") 65 | (c/exec :chmod "777" "/real" "/faulty")))) 66 | 67 | (defn- cookbook-command 68 | [flag] 69 | (c/cd "/opt/charybdefs/cookbook" 70 | (c/exec "./recipes" flag))) 71 | 72 | (defn break-all 73 | "All operations fail with EIO." 74 | [] 75 | (cookbook-command "--io-error")) 76 | 77 | (defn break-one-percent 78 | "1% of disk operations fail." 79 | [] 80 | (cookbook-command "--probability")) 81 | 82 | (defn clear 83 | "Clear a previous failure injection." 84 | [] 85 | (cookbook-command "--clear")) 86 | -------------------------------------------------------------------------------- /jepsen/project.clj: -------------------------------------------------------------------------------- 1 | (defproject jepsen "0.3.11-SNAPSHOT" 2 | :description "Distributed systems testing framework." 
3 | :url "https://jepsen.io" 4 | :license {:name "Eclipse Public License" 5 | :url "http://www.eclipse.org/legal/epl-v10.html"} 6 | :dependencies [[org.clj-commons/byte-streams "0.3.4" 7 | :exclusions [potemkin 8 | org.clj-commons/primitive-math]] 9 | [org.clojure/clojure "1.12.3"] 10 | [org.clojure/data.fressian "1.1.0"] 11 | [org.clojure/data.generators "1.1.0"] 12 | [org.clojure/tools.logging "1.3.0"] 13 | [org.clojure/tools.cli "1.2.245"] 14 | [spootnik/unilog "0.7.32" 15 | :exclusions [org.slf4j/slf4j-api]] 16 | [elle "0.2.6-SNAPSHOT"] 17 | [clj-time "0.15.2"] 18 | [io.jepsen/generator "0.1.0"] 19 | [jepsen.txn "0.1.2"] 20 | [knossos "0.3.13"] 21 | [clj-ssh "0.5.14"] 22 | [gnuplot "0.1.3"] 23 | [http-kit "2.8.1"] 24 | [ring "1.15.3"] 25 | [com.hierynomus/sshj "0.40.0" 26 | :exclusions [org.slf4j/slf4j-api 27 | org.bouncycastle/bcutil-jdk18on]] 28 | [com.jcraft/jsch.agentproxy.connector-factory "0.0.9"] 29 | [com.jcraft/jsch.agentproxy.sshj "0.0.9" 30 | :exclusions [net.schmizz/sshj]] 31 | [org.bouncycastle/bcprov-jdk15on "1.70"] 32 | [hiccup "2.0.0"] 33 | [metametadata/multiset "0.1.1"] 34 | [org.clj-commons/slingshot "0.13.0"] 35 | [org.clojure/data.codec "0.2.0"]] 36 | :java-source-paths ["src"] 37 | :javac-options ["--release" "11"] 38 | :main jepsen.cli 39 | :plugins [[lein-localrepo "0.5.4"] 40 | [lein-codox "0.10.8"] 41 | [jonase/eastwood "0.3.10"]] 42 | :jvm-opts ["-Xmx32g" 43 | "-Djava.awt.headless=true" 44 | "-server"] 45 | :test-selectors {:default (fn [m] 46 | (not (or (:perf m) 47 | (:logging m) 48 | (:slow m)))) 49 | :quick (fn [m] 50 | (not (or (:perf m) 51 | (:integration m) 52 | (:logging m) 53 | (:slow m)))) 54 | :focus :focus 55 | :perf :perf 56 | :logging :logging 57 | :integration :integration} 58 | :codox {:output-path "doc/" 59 | :source-uri "https://github.com/jepsen-io/jepsen/blob/v{version}/jepsen/{filepath}#L{line}" 60 | :metadata {:doc/format :markdown}} 61 | :profiles {:uberjar {:aot :all} 62 | :dev {; experimenting with faster 
startup 63 | ;:aot [jepsen.core] 64 | :dependencies [[criterium "0.4.6"] 65 | [org.clojure/test.check "1.1.2"] 66 | [com.gfredericks/test.chuck "0.2.15"]] 67 | :jvm-opts ["-Xmx32g" 68 | "-server" 69 | "-XX:-OmitStackTraceInFastThrow"]}}) 70 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/generator_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.generator-test 2 | (:require [jepsen.generator [context :as ctx] 3 | [test :as gen.test]] 4 | [jepsen [generator :as gen] 5 | [history :as h] 6 | [independent :as independent] 7 | [util :as util]] 8 | [clojure [pprint :refer [pprint]] 9 | [test :refer :all]] 10 | [clj-commons.slingshot :refer [try+ throw+]]) 11 | (:import (io.lacuna.bifurcan IMap 12 | Map 13 | Set))) 14 | 15 | ;; Independent tests 16 | 17 | (deftest independent-sequential-test 18 | (is (= [[0 0 [:x 0]] 19 | [0 1 [:x 1]] 20 | [10 1 [:x 2]] 21 | [10 0 [:y 0]] 22 | [20 0 [:y 1]] 23 | [20 1 [:y 2]]] 24 | (->> (independent/sequential-generator 25 | [:x :y] 26 | (fn [k] 27 | (->> (range) 28 | (map (partial hash-map :type :invoke, :value)) 29 | (gen/limit 3)))) 30 | gen/clients 31 | gen.test/perfect 32 | (map (juxt :time :process :value)))))) 33 | 34 | (deftest independent-concurrent-test 35 | ; All 3 groups can concurrently execute the first 2 values from k0, k1, k2 36 | (is (= [[0 4 [:k2 :v0]] 37 | [0 5 [:k2 :v1]] 38 | [0 0 [:k0 :v0]] 39 | [0 3 [:k1 :v0]] 40 | [0 2 [:k1 :v1]] 41 | [0 1 [:k0 :v1]] 42 | 43 | ; Worker 1 in group 0 finishes k0 44 | [10 1 [:k0 :v2]] 45 | ; Worker 2 in group 1 finishes k1 46 | [10 2 [:k1 :v2]] 47 | ; Worker 3 in group 1 starts k3, since k1 is done 48 | [10 3 [:k3 :v0]] 49 | ; And worker 0 in group 0 starts k4, since k0 is done 50 | [10 0 [:k4 :v0]] 51 | ; And worker 5 in group 2 finishes k2 52 | [10 5 [:k2 :v2]] 53 | 54 | ; All keys have now started execution. 
Group 1 (workers 2 and 3) is 55 | ; free, but can't start a new key since there are none left pending. 56 | ; Worker 0 in group 0 can continue k4 57 | [20 0 [:k4 :v1]] 58 | ; Workers 2 and 3 in group 1 finish off k3 59 | [20 3 [:k3 :v1]] 60 | [20 2 [:k3 :v2]] 61 | ; Finally, process 1 in group 0 finishes k4 62 | [20 1 [:k4 :v2]]] 63 | (->> (independent/concurrent-generator 64 | 2 ; 2 threads per group 65 | [:k0 :k1 :k2 :k3 :k4] ; 5 keys 66 | (fn [k] 67 | (->> [:v0 :v1 :v2] ; Three values per key 68 | (map (partial hash-map :type :invoke, :value))))) 69 | (gen.test/perfect (gen.test/n+nemesis-context 6)) ; 3 groups of 2 threads each 70 | (map (juxt :time :process :value)))))) 71 | 72 | (deftest independent-deadlock-case 73 | (is (= [[0 0 :meow [0 nil]] 74 | [0 1 :meow [0 nil]] 75 | [10 1 :meow [1 nil]] 76 | [10 0 :meow [1 nil]] 77 | [20 1 :meow [2 nil]]] 78 | (->> (independent/concurrent-generator 79 | 2 80 | (range) 81 | (fn [k] (gen/each-thread {:f :meow}))) 82 | (gen/limit 5) 83 | gen/clients 84 | gen.test/perfect 85 | (map (juxt :time :process :f :value)))))) 86 | -------------------------------------------------------------------------------- /txn/src/jepsen/txn.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.txn 2 | "Manipulates transactions. Transactions are represented as a sequence of 3 | micro-operations (mops for short)." 4 | (:require [dom-top.core :refer [loopr]])) 5 | 6 | (defn reduce-mops 7 | "Takes a history of operations, where each operation op has a :value which is 8 | a transaction made up of [f k v] micro-ops. Runs a reduction over every 9 | micro-op, where the reduction function is of the form (f state op [f k v]). 10 | Saves you having to do endless nested reduces." 
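;; Example (hypothetical one-op history): count every micro-op. The reducing
;; function receives the running state, the enclosing op, and each [f k v] mop.
;;   (reduce-mops (fn [n op mop] (inc n)) 0 [{:value [[:r :x 1] [:w :y 2]]}])
;;   ;=> 2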
11 | [f init-state history] 12 | (reduce (fn op [state op] 13 | (reduce (fn mop [state mop] 14 | (f state op mop)) 15 | state 16 | (:value op))) 17 | init-state 18 | history)) 19 | 20 | (defn op-mops 21 | "A lazy sequence of all [op mop] pairs from a history." 22 | [history] 23 | (mapcat (fn [op] (map (fn [mop] [op mop]) (:value op))) history)) 24 | 25 | (defn reads 26 | "Given a transaction, returns a map of keys to sets of all values that 27 | transaction read." 28 | [txn] 29 | (loopr [reads (transient {})] 30 | [[f k v] txn] 31 | (if (= :r f) 32 | (let [vs (get reads k #{})] 33 | (recur (assoc! reads k (conj vs v)))) 34 | (recur reads)) 35 | (persistent! reads))) 36 | 37 | (defn writes 38 | "Given a transaction, returns a map of keys to sets of all values that 39 | transaction wrote." 40 | [txn] 41 | (loopr [writes (transient {})] 42 | [[f k v] txn] 43 | (if (= :w f) 44 | (let [vs (get writes k #{})] 45 | (recur (assoc! writes k (conj vs v)))) 46 | (recur writes)) 47 | (persistent! writes))) 48 | 49 | (defn ext-reads 50 | "Given a transaction, returns a map of keys to values for its external reads: 51 | values that transaction observed which it did not write itself." 52 | [txn] 53 | (loop [ext (transient {}) 54 | ignore? (transient #{}) 55 | txn txn] 56 | (if (seq txn) 57 | (let [[f k v] (first txn)] 58 | (recur (if (or (not= :r f) 59 | (ignore? k)) 60 | ext 61 | (assoc! ext k v)) 62 | (conj! ignore? k) 63 | (next txn))) 64 | (persistent! ext)))) 65 | 66 | (defn ext-writes 67 | "Given a transaction, returns the map of keys to values for its external 68 | writes: final values written by the txn." 69 | [txn] 70 | (loop [ext (transient {}) 71 | txn txn] 72 | (if (seq txn) 73 | (let [[f k v] (first txn)] 74 | (recur (if (= :r f) 75 | ext 76 | (assoc! ext k v)) 77 | (next txn))) 78 | (persistent! ext)))) 79 | 80 | (defn int-write-mops 81 | "Returns a map of keys to vectors of all non-final write mops to that key."
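;; Example (hypothetical txn): the last write to :x is external, so only the
;; earlier one is reported. :y has a single (final) write and is omitted.
;;   (int-write-mops [[:w :x 1] [:r :x 1] [:w :x 2] [:w :y 3]])
;;   ;=> {:x [[:w :x 1]]}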
82 | [txn] 83 | (loop [int (transient {}) 84 | txn txn] 85 | (if (seq txn) 86 | (let [[f k v :as mop] (first txn)] 87 | (recur (if (= :r f) 88 | int 89 | (let [writes (get int k [])] 90 | (assoc! int k (conj writes mop)))) 91 | (next txn))) 92 | ; All done; trim final writes. 93 | (->> int 94 | persistent! 95 | (keep (fn [[k vs]] 96 | (when (< 1 (count vs)) 97 | [k (subvec vs 0 (dec (count vs)))]))) 98 | (into {}))))) 99 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/docker.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.docker 2 | "The recommended way is to use SSH to set up and tear down databases. 3 | However, it's sometimes convenient to set up and tear down the databases 4 | using `docker exec` and `docker cp` instead, which is what this namespace 5 | helps you do. 6 | 7 | Use at your own risk; this is an unsupported way of running Jepsen." 8 | (:require [clojure.string :as str] 9 | [clojure.java.shell :refer [sh]] 10 | [clj-commons.slingshot :refer [throw+]] 11 | [jepsen.control.core :as core] 12 | [jepsen.control :as c])) 13 | 14 | (defn resolve-container-id 15 | "Takes a host, e.g. `localhost:30404`, and resolves the Docker container id 16 | exposing that port. Due to a bug in Docker 17 | (https://github.com/moby/moby/pull/40442) this is more difficult than it 18 | should be." 19 | [host] 20 | (if-let [[_ port] (re-matches #"[^:]+:(\d+)" host)] 21 | (let [ps (:out (sh "docker" "ps")) 22 | cid (-> (sh "awk" (str "/" port "/ { print $1 }") :in ps) 23 | :out 24 | str/trim-newline)] 25 | (if (re-matches #"[a-z0-9]{12}" cid) 26 | cid 27 | (throw+ {:type ::invalid-container-id, :container-id cid}))) 28 | (throw+ {:type ::invalid-host, :host host}))) 29 | 30 | (defn exec 31 | "Execute a shell command on a docker container."
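;; Example (hypothetical container id):
;;   (exec "0123456789ab" {:cmd "ls /tmp"})
;; runs `docker exec 0123456789ab sh -c "ls /tmp"` on the host. It returns
;; clojure.java.shell/sh's {:exit .., :out .., :err ..} map. An :in entry in
;; opts is passed through to sh as the command's stdin.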
32 | [container-id {:keys [cmd] :as opts}] 33 | (apply sh 34 | "docker" "exec" (c/escape container-id) 35 | "sh" "-c" cmd 36 | (if-let [in (:in opts)] 37 | [:in in] 38 | []))) 39 | 40 | (defn- path->container 41 | [container-id path] 42 | (str container-id ":" path)) 43 | 44 | (defn- unwrap-result 45 | "Throws when shell returned with nonzero exit status." 46 | [exc-type {:keys [exit] :as result}] 47 | (if (zero? exit) 48 | result 49 | (throw+ 50 | (assoc result :type exc-type) 51 | nil ; cause 52 | "Command exited with non-zero status %d:\nSTDOUT:\n%s\n\nSTDERR:\n%s" 53 | exit 54 | (:out result) 55 | (:err result)))) 56 | 57 | (defn cp-to 58 | "Copies files from the host to a container filesystem." 59 | [container-id local-paths remote-path] 60 | (doseq [local-path (flatten [local-paths])] 61 | (->> (sh 62 | "docker" "cp" 63 | (c/escape local-path) 64 | (c/escape (path->container container-id remote-path))) 65 | (unwrap-result ::copy-failed)))) 66 | 67 | (defn cp-from 68 | "Copies files from a container filesystem to the host." 69 | [container-id remote-paths local-path] 70 | (doseq [remote-path (flatten [remote-paths])] 71 | (->> (sh 72 | "docker" "cp" 73 | (c/escape (path->container container-id remote-path)) 74 | (c/escape local-path)) 75 | (unwrap-result ::copy-failed)))) 76 | 77 | (defrecord DockerRemote [container-id] 78 | core/Remote 79 | (connect [this conn-spec] 80 | (assoc this :container-id (resolve-container-id (:host conn-spec)))) 81 | (disconnect! [this] 82 | (dissoc this :container-id)) 83 | (execute! [this ctx action] 84 | (exec container-id action)) 85 | (upload! [this ctx local-paths remote-path _opts] 86 | (cp-to container-id local-paths remote-path)) 87 | (download! [this ctx remote-paths local-path _opts] 88 | (cp-from container-id remote-paths local-path))) 89 | 90 | (def docker 91 | "A remote that does things via `docker exec` and `docker cp`." 
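;; Usage sketch (hypothetical host). It mirrors how an SSH remote is installed
;; with c/with-remote. The node name must be host:port, where port is
;; published by the target container (see resolve-container-id):
;;   (c/with-remote docker
;;     (c/with-ssh {}
;;       (c/on "localhost:30404"
;;         (c/exec :hostname))))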
92 | (->DockerRemote nil)) 93 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/checker/clock.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.checker.clock 2 | "Helps analyze clock skew over time." 3 | (:require [clojure.core.reducers :as r] 4 | [clojure.java.io :as io] 5 | [clojure.tools.logging :refer [info warn]] 6 | [clojure.pprint :refer [pprint]] 7 | [clojure.string :as str] 8 | [jepsen [history :as h] 9 | [util :as util] 10 | [store :as store]] 11 | [jepsen.checker.perf :as perf] 12 | [gnuplot.core :as g])) 13 | 14 | (defn history->datasets 15 | "Takes a history and produces a map of nodes to sequences of [t offset] 16 | pairs, representing the changing clock offsets for that node over time." 17 | [history] 18 | (let [final-time (util/nanos->secs (:time (peek history)))] 19 | (->> history 20 | (h/filter :clock-offsets) 21 | (reduce (fn [series op] 22 | (let [t (util/nanos->secs (:time op))] 23 | (reduce (fn [series [node offset]] 24 | (let [s (get series node (transient [])) 25 | s (conj! s [t offset])] 26 | (assoc! series node s))) 27 | series 28 | (:clock-offsets op)))) 29 | (transient {})) 30 | persistent! 31 | (util/map-vals (fn seal [points] 32 | (-> points 33 | (conj! (assoc (nth points (dec (count points))) 34 | 0 final-time)) 35 | (persistent!))))))) 36 | 37 | (defn short-node-names 38 | "Takes a collection of node names, and maps them to shorter names by removing 39 | common trailing strings (e.g. common domains)." 40 | [nodes] 41 | (->> nodes 42 | (map #(str/split % #"\.")) 43 | (map reverse) 44 | util/drop-common-proper-prefix 45 | (map reverse) 46 | (map (partial str/join ".")))) 47 | 48 | (defn plot! 49 | "Plots clock offsets over time. Looks for any op with a :clock-offsets 50 | field, which contains a (possibly incomplete) map of nodes to offsets, in 51 | seconds."
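;; Example of an op this looks for (the :f and all values are hypothetical).
;; Only :time (in nanoseconds) and :clock-offsets (node -> seconds) are read:
;;   {:type :info, :f :check-offsets, :time 5000000000,
;;    :clock-offsets {"n1" 0.002, "n2" -0.5}}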
52 | [test history opts] 53 | (when (seq history) 54 | ; If the history is empty, don't render anything. 55 | (let [datasets (history->datasets history) 56 | nodes (util/polysort (keys datasets)) 57 | node-names (short-node-names nodes) 58 | output-path (.getCanonicalPath (store/path! test (:subdirectory opts) 59 | "clock-skew.png")) 60 | plot {:preamble (concat (perf/preamble output-path) 61 | [[:set :title (str (:name test) 62 | " clock skew")] 63 | [:set :ylabel "Skew (s)"]]) 64 | :series (map (fn [node node-name] 65 | {:title node-name 66 | :with :steps 67 | :data (get datasets node)}) 68 | nodes 69 | node-names)}] 70 | (when (perf/has-data? plot) 71 | (-> plot 72 | (perf/without-empty-series) 73 | (perf/with-range) 74 | (perf/with-nemeses history (:nemeses (:plot test))) 75 | (perf/plot!))))) 76 | {:valid? true}) 77 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests/adya.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.adya 2 | "Generators and checkers for tests of Adya's proscribed behaviors for 3 | weakly-consistent systems. See http://pmg.csail.mit.edu/papers/adya-phd.pdf" 4 | (:require [jepsen [client :as client] 5 | [checker :as checker] 6 | [generator :as gen] 7 | [independent :as independent]] 8 | [clojure.core.reducers :as r] 9 | [clojure.set :as set])) 10 | 11 | (defn g2-gen 12 | "With concurrent, unique keys, emits pairs of :insert ops of the form [key 13 | [a-id b-id]], where one txn has a-id and the other has b-id. a-id and b-id 14 | are globally unique. Only two insert ops are generated for any given key. 15 | Keys and ids are positive integers. 
16 | 17 | G2 clients use two tables: 18 | 19 | create table a ( 20 | id int primary key, 21 | key int, 22 | value int 23 | ); 24 | create table b ( 25 | id int primary key, 26 | key int, 27 | value int 28 | ); 29 | 30 | G2 clients take operations like {:f :insert :value [key [a-id nil]]}, and in 31 | a single transaction, perform a read of tables a and b like so: 32 | 33 | select * from a where key = ? and value % 3 = 0 34 | select * from b where key = ? and value % 3 = 0 35 | 36 | and fail if either query returns more than zero rows. If both tables are 37 | empty, the client should insert a row like 38 | 39 | {:key key :id a-id :value 30} 40 | 41 | into table a, if a-id is present. If b-id is present, insert into table b 42 | instead. Iff the insert succeeds, return :type :ok with the operation value 43 | unchanged. 44 | 45 | We're looking to detect violations based on *predicates*; databases may 46 | prevent anti-dependency cycles with individual primary keys, but selects 47 | based on predicates might observe stale data. Clients should feel free to 48 | choose predicates and values in creative ways." 49 | [] 50 | (let [ids (atom 0)] 51 | (independent/concurrent-generator 52 | 2 53 | (range) 54 | (fn [k] 55 | [(gen/once (fn [_ _] 56 | {:type :invoke :f :insert :value [nil (swap! ids inc)]})) 57 | (gen/once (fn [_ _] 58 | {:type :invoke :f :insert :value [(swap! ids inc) nil]}))])))) 59 | 60 | (defn g2-checker 61 | "Verifies that at most one :insert completes successfully for any given key." 62 | [] 63 | (reify checker/Checker 64 | (check [this test history opts] 65 | ; There should be at most one successful insert for any given key 66 | (let [keys (reduce (fn [m op] 67 | (if (= :insert (:f op)) 68 | (let [k (key (:value op))] 69 | (if (= :ok (:type op)) 70 | (update m k (fnil inc 0)) 71 | (update m k #(or % 0)))) 72 | m)) 73 | {} 74 | history) 75 | insert-count (->> keys 76 | (filter (fn [[k cnt]] (pos? 
cnt))) 77 | count) 78 | illegal (->> keys 79 | (keep (fn [[k cnt :as pair]] 80 | (when (< 1 cnt) pair))) 81 | (into (sorted-map)))] 82 | {:valid? (empty? illegal) 83 | :key-count (count keys) 84 | :legal-count (- insert-count (count illegal)) 85 | :illegal-count (count illegal) 86 | :illegal illegal})))) 87 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/tests/causal_reverse_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.causal-reverse-test 2 | (:require [jepsen.tests.causal-reverse :refer :all] 3 | [clojure.test :refer :all] 4 | [jepsen [checker :as checker] 5 | [history :as h]])) 6 | 7 | (defn invoke [process f value] {:type :invoke, :process process, :f f, :value value}) 8 | (defn ok [process f value] {:type :ok, :process process, :f f, :value value}) 9 | 10 | (deftest causal-reverse-test 11 | (testing "Can validate sequential histories" 12 | (let [c (checker) 13 | valid (h/history 14 | [(invoke 0 :write 1) 15 | (ok 0 :write 1) 16 | (invoke 0 :write 2) 17 | (ok 0 :write 2) 18 | (invoke 0 :read nil) 19 | (ok 0 :read [1 2])]) 20 | one-without-two (h/history 21 | [(invoke 0 :write 1) 22 | (ok 0 :write 1) 23 | (invoke 0 :write 2) 24 | (ok 0 :write 2) 25 | (invoke 0 :read nil) 26 | (ok 0 :read [1])]) 27 | two-without-one (h/history 28 | [(invoke 0 :write 1) 29 | (ok 0 :write 1) 30 | (invoke 0 :write 2) 31 | (ok 0 :write 2) 32 | (invoke 0 :read nil) 33 | (ok 0 :read [2])]) 34 | bigger (h/history 35 | [(invoke 0 :write 1) 36 | (ok 0 :write 1) 37 | (invoke 0 :write 2) 38 | (ok 0 :write 2) 39 | (invoke 0 :write 3) 40 | (ok 0 :write 3) 41 | (invoke 0 :write 4) 42 | (ok 0 :write 4) 43 | (invoke 0 :write 5) 44 | (ok 0 :write 5) 45 | (invoke 0 :read nil) 46 | (ok 0 :read [1 2 3 4 5])])] 47 | (is (:valid? (checker/check c nil valid nil))) 48 | (is (:valid? (checker/check c nil one-without-two nil))) 49 | (is (not (:valid?
(checker/check c nil two-without-one nil)))) 50 | (is (:valid? (checker/check c nil bigger nil))))) 51 | 52 | (testing "Can validate concurrent histories" 53 | (let [c (checker) 54 | concurrent1 (h/history 55 | [(invoke 0 :write 2) 56 | (invoke 0 :write 1) 57 | (ok 0 :write 1) 58 | (invoke 0 :read nil) 59 | (ok 0 :write 2) 60 | (ok 0 :read [1 2])]) 61 | concurrent2 (h/history 62 | [(invoke 0 :write 1) 63 | (invoke 0 :write 2) 64 | (ok 0 :write 1) 65 | (invoke 0 :read nil) 66 | (ok 0 :write 2) 67 | (ok 0 :read [2 1])])] 68 | (is (:valid? (checker/check (checker) nil concurrent1 nil))) 69 | (is (:valid? (checker/check (checker) nil concurrent2 nil))))) 70 | 71 | ;; TODO Expand the checker to catch this sequential insert violation. 72 | #_(testing "Can detect reverse causal anomaly" 73 | (let [c (checker) 74 | reverse-causal-read (h/history [(invoke 0 :write 1) 75 | (ok 0 :write 1) 76 | (invoke 0 :write 2) 77 | (ok 0 :write 2) 78 | (invoke 0 :read nil) 79 | (ok 0 :read [2 1])])] 80 | (is (not (:valid? (checker/check c nil reverse-causal-read nil))))))) 81 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/fs_cache_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.fs-cache-test 2 | (:require [clojure [test :refer :all]] 3 | [clojure.java.io :as io] 4 | [clojure.tools.logging :refer [info warn]] 5 | [jepsen [control :as c] 6 | [common-test :refer [quiet-logging]] 7 | [fs-cache :as cache]] 8 | [jepsen.control [sshj :as sshj] 9 | [util :as cu]])) 10 | 11 | (use-fixtures :once quiet-logging) 12 | (use-fixtures :each (fn [t] 13 | (cache/clear!) 
14 | (t) 15 | (cache/clear!))) 16 | 17 | (deftest fs-path-test 18 | (is (= ["dk_dog" 19 | "ds_meow\\/woof\\\\evil" 20 | "dl_3" 21 | "db_false" 22 | "db_true" 23 | "dm_11" 24 | "fn_12"] 25 | (cache/fs-path [:cat/dog 26 | "meow/woof\\evil" 27 | 3 28 | false 29 | true 30 | 11M 31 | 12N])))) 32 | 33 | (deftest file-test 34 | (testing "empty" 35 | (is (thrown? IllegalArgumentException (cache/file [])))) 36 | 37 | (testing "single path" 38 | (is (= (io/file "/tmp/jepsen/cache/fk_foo") 39 | (cache/file [:foo])))) 40 | 41 | (testing "multiple paths" 42 | (is (= (io/file "/tmp/jepsen/cache/dk_foo/fs_bar") 43 | (cache/file [:foo "bar"]))))) 44 | 45 | (deftest file-roundtrip-test 46 | (let [f (io/file "/tmp/jepsen/cache-test") 47 | contents "hello there" 48 | path [:foo]] 49 | (spit f contents) 50 | 51 | (testing "not cached" 52 | (is (not (cache/cached? path))) 53 | (is (nil? (cache/load-file path)))) 54 | 55 | (testing "cached" 56 | (cache/save-file! f path) 57 | (is (cache/cached? path))) 58 | 59 | (testing "read as file" 60 | (is (= contents (slurp (cache/load-file path))))) 61 | 62 | (testing "read as string" 63 | (is (= contents (cache/load-string path)))))) 64 | 65 | (deftest string-test 66 | (let [contents "foo\nbar" 67 | path [1 2 3]] 68 | (testing "not cached" 69 | (is (not (cache/cached? path))) 70 | (is (nil? (cache/load-string path)))) 71 | 72 | (testing "cached" 73 | (cache/save-string! contents path) 74 | (is (cache/cached? path)) 75 | (is (= contents (cache/load-string path)))))) 76 | 77 | (deftest edn-test 78 | (let [contents {:foo [1 2N "three" 'four]} 79 | path ["weirdly" :specific true]] 80 | (testing "not cached" 81 | (is (not (cache/cached? path))) 82 | (is (nil? (cache/load-edn path)))) 83 | 84 | (testing "cached" 85 | (cache/save-edn! contents path) 86 | (is (cache/cached?
path)) 87 | (is (= contents (cache/load-edn path)))))) 88 | 89 | (defmacro on-n1 90 | "Runs body with an SSH connection to n1." 91 | [& body] 92 | ; Right now sshj is faster but not yet the default, so we 93 | ; set it explicitly. 94 | `(c/with-remote (sshj/remote) 95 | (c/with-ssh {} 96 | (c/on "n1" 97 | ~@body)))) 98 | 99 | (deftest ^:integration remote-test 100 | (on-n1 101 | (let [contents "foo\nbar" 102 | local-path [:local] 103 | remote-path "/tmp/jepsen/remote"] 104 | (c/exec :rm :-rf "/tmp/jepsen") 105 | 106 | (testing "upload" 107 | (cache/save-string! contents local-path) 108 | (cache/deploy-remote! local-path remote-path) 109 | (is (= contents (c/exec :cat remote-path)))) 110 | 111 | (testing "download" 112 | (cache/clear! local-path) 113 | (is (not (cache/cached? local-path))) 114 | (cache/save-remote! remote-path local-path) 115 | (is (cache/cached? local-path)) 116 | (is (= contents (cache/load-string local-path))))))) 117 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/k8s.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.k8s 2 | "The recommended way to set up and tear down databases is via SSH. 3 | However, it is sometimes convenient to be able to set up and tear down 4 | the databases using `kubectl` instead, which is what this namespace 5 | helps you do. Use at your own risk: this is an unsupported way 6 | of running Jepsen." 7 | (:require [clojure.java.shell :refer [sh]] 8 | [clj-commons.slingshot :refer [throw+]] 9 | [jepsen.control.core :as core] 10 | [jepsen.control :as c] 11 | [clojure.string :refer [split-lines trim]] 12 | [clojure.tools.logging :refer [info]])) 13 | 14 | (defn exec 15 | "Execute a shell command on a pod."
16 | [context namespace pod-name {:keys [cmd] :as opts}] 17 | (apply sh 18 | "kubectl" 19 | "exec" 20 | (c/escape pod-name) 21 | context 22 | namespace 23 | "--" 24 | "sh" 25 | "-c" 26 | cmd 27 | (if-let [in (:in opts)] 28 | [:in in] 29 | []))) 30 | 31 | (defn- path->pod 32 | [pod-name path] 33 | (str pod-name ":" path)) 34 | 35 | (defn- unwrap-result 36 | "Throws when shell returned with nonzero exit status." 37 | [exc-type {:keys [exit] :as result}] 38 | (if (zero? exit) 39 | result 40 | (throw+ 41 | (assoc result :type exc-type) 42 | nil ; cause 43 | "Command exited with non-zero status %d:\nSTDOUT:\n%s\n\nSTDERR:\n%s" 44 | exit 45 | (:out result) 46 | (:err result)))) 47 | 48 | (defn cp-to 49 | "Copies files from the host to a pod filesystem." 50 | [context namespace pod-name local-paths remote-path] 51 | (doseq [local-path (flatten [local-paths])] 52 | (->> (sh 53 | "kubectl" 54 | "cp" 55 | (c/escape local-path) 56 | (c/escape (path->pod pod-name remote-path)) 57 | context 58 | namespace) 59 | (unwrap-result ::copy-failed)))) 60 | 61 | (defn cp-from 62 | "Copies files from a pod filesystem to the host." 63 | [context namespace pod-name remote-paths local-path] 64 | (doseq [remote-path (flatten [remote-paths])] 65 | (->> (sh 66 | "kubectl" 67 | "cp" 68 | (c/escape (path->pod pod-name remote-path)) 69 | (c/escape local-path) 70 | context 71 | namespace) 72 | (unwrap-result ::copy-failed)))) 73 | 74 | (defn- or-parameter 75 | "A helper function that encodes a parameter if present" 76 | [p, v] 77 | (if v (str "--" p "=" (c/escape v)) "")) 78 | 79 | (defrecord K8sRemote [context namespace] 80 | core/Remote 81 | (connect [this conn-spec] 82 | (assoc this 83 | :context (or-parameter "context" context) 84 | :namespace (or-parameter "namespace" namespace) 85 | :pod-name (:host conn-spec))) 86 | (disconnect! [this] 87 | (dissoc this :context :namespace :pod-name)) 88 | (execute! [this ctx action] 89 | (exec context namespace (:pod-name this) action)) 90 | (upload! 
[this ctx local-paths remote-path _opts] 91 | (cp-to context namespace (:pod-name this) local-paths remote-path)) 92 | (download! [this ctx remote-paths local-path _opts] 93 | (cp-from context namespace (:pod-name this) remote-paths local-path))) 94 | 95 | (defn k8s 96 | "Returns a remote that does things via `kubectl exec` and `kubectl cp`, in the default context and namespace." 97 | [] 98 | (->K8sRemote nil nil)) 99 | 100 | (defn list-pods 101 | "A helper function to list all pods in a given context/namespace" 102 | [context namespace] 103 | (let [context (or-parameter "context" context) 104 | namespace (or-parameter "namespace" namespace) 105 | res (sh 106 | "sh" 107 | "-c" 108 | (str "kubectl get pods " context " " namespace 109 | " | tail -n +2 " 110 | " | awk '{print $1}'"))] 111 | (lazy-seq (split-lines (trim (:out res)))))) 112 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/lazyfs_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.lazyfs-test 2 | "Tests for the lazyfs write-losing filesystem" 3 | (:require [clojure [data :refer [diff]] 4 | [pprint :refer [pprint]] 5 | [string :as str] 6 | [test :refer :all]] 7 | [clojure.java.io :as io] 8 | [clojure.test.check [clojure-test :refer :all] 9 | [generators :as g] 10 | [properties :as prop] 11 | [results :refer [Result]]] 12 | [clojure.tools.logging :refer [info warn]] 13 | [dom-top.core :refer [loopr]] 14 | [jepsen [checker :as checker] 15 | [client :as client] 16 | [common-test :refer [quiet-logging]] 17 | [control :as c] 18 | [core :as jepsen] 19 | [db :as db] 20 | [generator :as gen] 21 | [lazyfs :as lazyfs] 22 | [nemesis :as nem] 23 | [os :as os] 24 | [tests :as tests] 25 | [util :as util :refer [pprint-str 26 | timeout]]] 27 | [jepsen.control [core :as cc] 28 | [util :as cu] 29 | [sshj :as sshj]] 30 | [jepsen.os.debian :as debian] 31 | [clj-commons.slingshot :refer [try+ throw+]])) 32 | 33 | ;
(use-fixtures :once quiet-logging) 34 | 35 | (defrecord FileSetClient [dir file node] 36 | client/Client 37 | (open! [this test node] 38 | (assoc this 39 | :file (str dir "/set") 40 | :node node)) 41 | 42 | (setup! [this test] 43 | this) 44 | 45 | (invoke! [this test op] 46 | (-> (c/on-nodes test [node] 47 | (fn [_ _] 48 | (case (:f op) 49 | :add (do (c/exec :echo (str (:value op) " ") :>> file) 50 | (assoc op :type :ok)) 51 | :read (let [vals (-> (c/exec :cat file) 52 | (str/split #"\s+") 53 | (->> (remove #{""}) 54 | (mapv parse-long)))] 55 | (assoc op :type :ok, :value vals))))) 56 | (get node))) 57 | 58 | (teardown! [this test]) 59 | 60 | (close! [this test])) 61 | 62 | (defn file-set-client 63 | "Writes a set to a single file on one node, in the given directory." 64 | [dir] 65 | (map->FileSetClient {:dir dir})) 66 | 67 | (deftest ^:integration file-set-test 68 | (let [dir "/tmp/jepsen/file-set-test" 69 | lazyfs (lazyfs/lazyfs dir) 70 | test (assoc tests/noop-test 71 | :name "lazyfs file set" 72 | :os debian/os 73 | :db (lazyfs/db lazyfs) 74 | :client (file-set-client dir) 75 | :nemesis (lazyfs/nemesis lazyfs) 76 | :generator (gen/phases 77 | (->> (range) 78 | (map (fn [x] {:f :add, :value x})) 79 | (gen/delay 1/100) 80 | (gen/nemesis 81 | (->> {:type :info 82 | :f :lose-unfsynced-writes 83 | :value ["n1"]} 84 | repeat 85 | (gen/delay 1/2))) 86 | (gen/time-limit 5)) 87 | (gen/clients {:f :read})) 88 | :checker (checker/set) 89 | :nodes ["n1"]) 90 | test (jepsen/run! test)] 91 | ;(pprint (:history test)) 92 | ;(pprint (:results test)) 93 | (is (false? (:valid? (:results test)))) 94 | (is (pos? (:ok-count (:results test)))) 95 | (is (pos? 
(:lost-count (:results test)))))) 96 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/clj_ssh.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.clj-ssh 2 | "A CLJ-SSH powered implementation of the Remote protocol." 3 | (:require [clojure.tools.logging :refer [info warn]] 4 | [clj-ssh.ssh :as ssh] 5 | [jepsen.control [core :as core] 6 | [retry :as retry] 7 | [scp :as scp]] 8 | [clj-commons.slingshot :refer [try+ throw+]]) 9 | (:import (java.util.concurrent Semaphore))) 10 | 11 | (def clj-ssh-agent 12 | "Acquiring an SSH agent is expensive and involves a global lock; we save the 13 | agent and re-use it to speed things up." 14 | (delay (ssh/ssh-agent {}))) 15 | 16 | (defn clj-ssh-session 17 | "Opens a raw session to the given connection spec" 18 | [conn-spec] 19 | (let [agent @clj-ssh-agent 20 | _ (when-let [key-path (:private-key-path conn-spec)] 21 | (ssh/add-identity agent {:private-key-path key-path}))] 22 | (doto (ssh/session agent 23 | (:host conn-spec) 24 | (select-keys conn-spec 25 | [:username 26 | :password 27 | :port 28 | :strict-host-key-checking])) 29 | (ssh/connect)))) 30 | 31 | (defmacro with-errors 32 | "Takes a conn spec, a context map, and a body. Evals body, remapping clj-ssh 33 | exceptions to :type :jepsen.control/ssh-failed." 34 | [conn context & body] 35 | `(try 36 | ~@body 37 | (catch com.jcraft.jsch.JSchException e# 38 | (if (or (= "session is down" (.getMessage e#)) 39 | (= "Packet corrupt" (.getMessage e#))) 40 | (throw+ (merge ~conn ~context {:type :jepsen.control/ssh-failed})) 41 | (throw e#))))) 42 | 43 | ; TODO: pull out dummy logic into its own remote 44 | (defrecord Remote [concurrency-limit 45 | conn-spec 46 | session 47 | ^Semaphore semaphore] 48 | core/Remote 49 | (connect [this conn-spec] 50 | (assert (map? conn-spec) 51 | (str "Expected a map for conn-spec, not a hostname as a string. 
Received: " 52 | (pr-str conn-spec))) 53 | (assoc this 54 | :conn-spec conn-spec 55 | :session (if (:dummy conn-spec) 56 | {:dummy true} 57 | (try+ 58 | (clj-ssh-session conn-spec) 59 | (catch com.jcraft.jsch.JSchException _ 60 | (throw+ (merge conn-spec 61 | {:type :jepsen.control/session-error 62 | :message "Error opening SSH session. Verify username, password, and node hostnames are correct."}))))) 63 | :semaphore (Semaphore. concurrency-limit true))) 64 | 65 | (disconnect! [_] 66 | (when-not (:dummy session) (ssh/disconnect session))) 67 | 68 | (execute! [_ ctx action] 69 | (with-errors conn-spec ctx 70 | (when-not (:dummy session) 71 | (.acquire semaphore) 72 | (try 73 | (ssh/ssh session action) 74 | (finally 75 | (.release semaphore)))))) 76 | 77 | (upload! [_ ctx local-paths remote-path _opts] 78 | (with-errors conn-spec ctx 79 | (when-not (:dummy session) 80 | (ssh/scp-to session local-paths remote-path)))) 81 | 82 | (download! [_ ctx remote-paths local-path _opts] 83 | (with-errors conn-spec ctx 84 | (when-not (:dummy session) 85 | (ssh/scp-from session remote-paths local-path))))) 86 | 87 | (def concurrency-limit 88 | "OpenSSH has a standard limit of 10 concurrent channels per connection. 89 | However, commands run in quick succession with 10 concurrent channels *also* seem to 90 | blow out the channel limit--perhaps there's an asynchronous channel teardown 91 | process. We set the limit a bit lower here. This is experimentally determined 92 | for clj-ssh by running jepsen.control-test's integration test... " 93 | 8) 94 | 95 | (defn remote 96 | "A remote that does things via clj-ssh." 97 | [] 98 | (-> (Remote. concurrency-limit nil nil nil) 99 | ; We *can* use our own SCP, but shelling out is faster.
100 | scp/remote 101 | retry/remote)) 102 | -------------------------------------------------------------------------------- /antithesis/README.md: -------------------------------------------------------------------------------- 1 | # jepsen.antithesis 2 | 3 | This library supports running Jepsen tests inside Antithesis environments. It 4 | provides entropy, lifecycle hooks, and assertions. 5 | 6 | ## Installation 7 | 8 | [![Clojars Project](https://img.shields.io/clojars/v/io.jepsen/antithesis.svg)](https://clojars.org/io.jepsen/antithesis) 9 | 10 | From Clojars, as usual. Note that the Antithesis SDK pulls in an ancient 11 | version of Jackson and *needs it*, so in your `project.clj`, you'll likely want 12 | to stop other dependencies from pulling in their own Jackson: 13 | 14 | ```clj 15 | :dependencies [... 16 | [io.jepsen/antithesis "0.1.0"] 17 | [jepsen "0.3.10" 18 | :exclusions [com.fasterxml.jackson.core/jackson-databind 19 | com.fasterxml.jackson.core/jackson-annotations 20 | com.fasterxml.jackson.core/jackson-core]]] 21 | 22 | ``` 23 | 24 | ## Usage 25 | 26 | The main namespace is [`jepsen.antithesis`](src/jepsen/antithesis.clj). There 27 | are several things you can do to integrate your test into Antithesis. 28 | 29 | ### Randomness 30 | 31 | First, wrap the entire program in `(antithesis/with-rng ...)`. This does 32 | nothing in ordinary environments, but in Antithesis, it replaces the 33 | `jepsen.random` RNG with the Antithesis SDK's entropy source. 34 | 35 | ### Wrapping Tests 36 | 37 | Second, wrap the entire test map with `(antithesis/test test)`. In an 38 | Antithesis run, this disables the OS, DB, and SSH connections. 39 | 40 | ### Clients 41 | 42 | Wrap your client in `(antithesis/client your-client)`. The wrapped client informs 43 | Antithesis that the setup is complete, and makes assertions about each 44 | invocation and completion.
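To make the wrapping idea concrete, here is a minimal, self-contained sketch of the pattern, assuming a simplified two-method `Client` protocol defined locally. The protocol, `wrap-client`, and its `setup-complete!`/`check-op!` callbacks are hypothetical stand-ins for illustration only; this is not how `jepsen.antithesis` is actually implemented, and the real `jepsen.client/Client` protocol has more methods.

```clj
(ns example.wrap-client
  "Hypothetical sketch: a client wrapper that signals setup completion and
  observes every invocation and completion. Not the jepsen.antithesis code.")

;; Simplified, hypothetical stand-in for jepsen.client/Client.
(defprotocol Client
  (setup! [client test])
  (invoke! [client test op]))

(defn wrap-client
  "Wraps client. Calls (setup-complete!) after the inner setup! returns, and
  calls (check-op! op) on each invocation and again on its completion."
  [client setup-complete! check-op!]
  (reify Client
    (setup! [_ test]
      (let [result (setup! client test)]
        (setup-complete!)
        result))
    (invoke! [_ test op]
      (check-op! op)
      (let [op' (invoke! client test op)]
        (check-op! op')
        op'))))

;; Usage: record what the wrapper observes.
(def log (atom []))

(def inner
  (reify Client
    (setup! [this _test] this)
    (invoke! [_ _test op] (assoc op :type :ok))))

(def wrapped
  (wrap-client inner
               #(swap! log conj :setup-complete)
               #(swap! log conj (:type %))))

(setup! wrapped {})
(invoke! wrapped {} {:type :invoke, :f :read})
;; @log is now [:setup-complete :invoke :ok]
```

Because the wrapper is a decorator, the inner client needs no changes; the same idea would extend to the remaining client lifecycle methods.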
45 | 46 | ### Checker 47 | 48 | You can either make assertions (see below) by hand inside your checkers, or you 49 | can wrap an existing checker in `(antithesis/checker "some name" checker)`. 50 | This asserts that the checker's results are always `:valid? true`. You can also 51 | use `antithesis/checker+` to traverse a tree of checkers, wrapping each one 52 | with assertions. 53 | 54 | ### Generator 55 | 56 | Instead of a time limit, you can limit your generator with something like: 57 | 58 | ```clj 59 | (if (antithesis/antithesis?) 60 | (antithesis/early-termination-generator 61 | {:interval 100} 62 | my-gen) 63 | (gen/time-limit ... my-gen)) 64 | ``` 65 | 66 | The early-termination generator flips a coin every 100 operations (the `:interval`) to decide 67 | whether to continue. This allows Antithesis to perform some long runs and some 68 | short ones. I'm not totally sure whether this is a good idea yet, but it does 69 | seem to get us to much shorter reproducible histories. 70 | 71 | ### Lifecycle 72 | 73 | If you'd like to manage the lifecycle manually, you can call `setup-complete!` 74 | once the test is ready to begin--for instance, at the end of `Client/setup!`. 75 | Call `event!` to signal that something interesting has happened. 76 | 77 | ### Assertions 78 | 79 | Assertions begin with `assert-`, and take an expression, a message, and data 80 | to include if the assertion fails. For instance: 81 | 82 | ```clj 83 | (assert-always! (not (db-corrupted?)) 84 | "DB corrupted" {:db "foo"}) 85 | ``` 86 | 87 | Ideally, you want to do these *during* the test run, so Antithesis can fail 88 | fast. Many checks can only be done with the full history, by the checker; for 89 | these, assert test validity in the checker itself: 90 | 91 | ```clj 92 | (defrecord MyChecker [] checker/Checker 93 | (check [_ test history opts] 94 | ... 95 | (a/assert-always (true? valid?) "checker valid" a) 96 | {:valid? valid?
...})) 97 | ``` 98 | 99 | ## License 100 | 101 | Copyright © Jepsen, LLC 102 | 103 | This program and the accompanying materials are made available under the 104 | terms of the Eclipse Public License 2.0 which is available at 105 | https://www.eclipse.org/legal/epl-2.0. 106 | 107 | This Source Code may also be made available under the following Secondary 108 | Licenses when the conditions for such availability set forth in the Eclipse 109 | Public License, v. 2.0 are satisfied: GNU General Public License as published by 110 | the Free Software Foundation, either version 2 of the License, or (at your 111 | option) any later version, with the GNU Classpath Exception which is available 112 | at https://www.gnu.org/software/classpath/license.html. 113 | -------------------------------------------------------------------------------- /generator/src/jepsen/generator/translation_table.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.generator.translation-table 2 | "We burn a lot of time in hashcode and map manipulation for thread names, 3 | which are mostly integers 0...n, but sometimes non-integer names like 4 | :nemesis. It's nice to be able to represent thread state internally as purely 5 | integers. To do this, we compute a one-time translation table which lets us 6 | map those names to integers and vice-versa." 7 | (:require [clojure.core.protocols :refer [Datafiable]] 8 | [clojure [datafy :refer [datafy]]] 9 | [dom-top.core :refer [loopr]] 10 | [potemkin :refer [def-map-type definterface+]]) 11 | (:import (io.lacuna.bifurcan ISet 12 | IMap 13 | Map 14 | Set) 15 | (java.util BitSet))) 16 | 17 | (deftype TranslationTable 18 | [; Number of numeric threads 19 | ^int int-thread-count 20 | ; Array of all threads which *aren't* integers; e.g. 
:nemesis 21 | ^objects named-threads 22 | ; Map of named threads to their indices 23 | ^IMap named-thread->index] 24 | 25 | Datafiable 26 | (datafy [this] 27 | {:int-thread-count int-thread-count 28 | :named-threads (vec named-threads) 29 | :named-thread->index (datafy named-thread->index)})) 30 | 31 | (defn translation-table 32 | "Takes a number of integer threads and a collection of named threads, and 33 | computes a translation table." 34 | [int-thread-count named-threads] 35 | (let [named-threads-array (object-array (count named-threads))] 36 | (loopr [^IMap named-thread->index (.linear Map/EMPTY) 37 | i 0] 38 | [thread named-threads] 39 | (do (aset named-threads-array i thread) 40 | (recur (.put named-thread->index thread (int i)) 41 | (inc i))) 42 | (TranslationTable. int-thread-count 43 | named-threads-array 44 | (.forked named-thread->index))))) 45 | 46 | (defn all-names 47 | "A sequence of all names in the translation table, in the exact order of 48 | thread indices. Index 0's name comes first, then 1, and so on." 49 | [^TranslationTable translation-table] 50 | (concat (range (.int-thread-count translation-table)) 51 | (.named-threads translation-table))) 52 | 53 | (defn thread-count 54 | "How many threads in a translation table in all?" 55 | ^long [^TranslationTable translation-table] 56 | (let [^objects named-threads (.named-threads translation-table)] 57 | (+ (.int-thread-count translation-table) 58 | (alength named-threads)))) 59 | 60 | (defn name->index 61 | "Turns a thread name (e.g. 0, 5, or :nemesis) into a primitive int." 62 | ^long [^TranslationTable translation-table thread-name] 63 | (if (integer? 
thread-name) 64 | thread-name 65 | (let [^IMap m (.named-thread->index translation-table) 66 | ; We're not doing bounds checks, but we DO want an unknown name to 67 | ; blow up obviously: Long/MIN_VALUE yields a wildly negative index 68 | i (.get m thread-name Long/MIN_VALUE)] 69 | (+ (.int-thread-count translation-table) i)))) 70 | 71 | (defn index->name 72 | "Turns a thread index (an int) into a thread name (e.g. 0, 5, or :nemesis)." 73 | [^TranslationTable translation-table ^long thread-index] 74 | (let [itc (.int-thread-count translation-table)] 75 | (if (< thread-index itc) 76 | thread-index 77 | (aget ^objects (.named-threads translation-table) (- thread-index itc))))) 78 | 79 | (defn ^ISet indices->names 80 | "Takes a translation table and a BitSet of thread indices. Constructs a 81 | Bifurcan ISet out of those threads." 82 | [translation-table ^BitSet indices] 83 | (loop [i 0 84 | ^ISet names (.linear Set/EMPTY)] 85 | (let [i' (.nextSetBit indices i)] 86 | (if (= i' -1) 87 | (.forked names) 88 | (recur (inc i') 89 | (.add names (index->name translation-table i'))))))) 90 | 91 | (defn ^BitSet names->indices 92 | "Takes a translation table and a collection of thread names. Constructs a 93 | BitSet of those thread indices." 94 | [translation-table names] 95 | (let [bs (BitSet.
(count names))] 96 | (loopr [] 97 | [name names] 98 | (do (.set bs (name->index translation-table name)) 99 | (recur))) 100 | bs)) 101 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/db/watchdog_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.db.watchdog-test 2 | (:require [clojure [pprint :refer [pprint]] 3 | [test :refer :all]] 4 | [jepsen [client :as client] 5 | [common-test :refer [quiet-logging]] 6 | [control :as c] 7 | [core :as jepsen] 8 | [db :as db] 9 | [generator :as gen] 10 | [history :as h] 11 | [nemesis :as n] 12 | [tests :refer [noop-test]]] 13 | [jepsen.control [util :as cu]] 14 | [jepsen.db.watchdog :as w] 15 | [clj-commons.slingshot :refer [try+ throw+]])) 16 | 17 | (use-fixtures :once quiet-logging) 18 | 19 | ; This DB likes to crash all the time 20 | (defrecord CrashDB [] 21 | db/DB 22 | (setup! [this test node] 23 | (db/start! this test node)) 24 | 25 | (teardown! [this test node] 26 | (db/kill! this test node)) 27 | 28 | db/Kill 29 | (kill! [_ test node] 30 | (cu/grepkill! "sleep")) 31 | 32 | (start! [_ test node] 33 | (cu/start-daemon! {:chdir "/tmp/jepsen" 34 | :pidfile "/tmp/jepsen/watchdog-test.pid" 35 | :logfile "/tmp/jepsen/watchdog-test.log"} 36 | "/usr/bin/sleep" 3)) 37 | 38 | db/LogFiles 39 | (log-files [_ _ _]) 40 | 41 | db/Primary 42 | (primaries [_ _]) 43 | (setup-primary! [_ _ _])) 44 | 45 | (defn running? 46 | "Is the DB running?" 47 | [test node] 48 | (try+ (c/exec :pgrep "sleep") 49 | true 50 | (catch [:type :jepsen.control/nonzero-exit] _ 51 | false))) 52 | 53 | ; Just looks to see if it's running or not 54 | (defrecord Client [] 55 | client/Client 56 | (open! [this test node] 57 | this) 58 | 59 | (setup! [_ _]) 60 | 61 | (invoke! [_ test op] 62 | (let [r (c/on-nodes test running?) 63 | ; We're going to assume every node runs on the same schedule, at least at our granularity. 
64 | _ (assert (apply = (vals r))) 65 | r (val (first r))] 66 | (assoc op :type :ok, :value r))) 67 | 68 | (teardown! [_ _]) 69 | 70 | (close! [_ _])) 71 | 72 | ; No way around this being slow, we actually have to run stuff on real nodes 73 | ; and wait for it to die. 74 | (deftest ^:slow ^:integration watchdog-test 75 | (let [; We poll every second to see if things are running 76 | gen (->> (gen/repeat {:f :running}) 77 | (gen/delay 1) 78 | ; We wait five seconds, kill everyone, wait five more, and restart. 79 | (gen/nemesis 80 | [(gen/sleep 5) 81 | {:type :info, :f :start} 82 | (gen/sleep 5) 83 | {:type :info, :f :stop}]) 84 | (gen/time-limit 15)) 85 | ; If you remove the w/db wrapper, the test will fail 86 | db (w/db {:running? running?} 87 | (CrashDB.)) 88 | test (assoc noop-test 89 | :nodes ["n1"] 90 | :name "watchdog-test" 91 | :db db 92 | :client (Client.) 93 | :nemesis (n/node-start-stopper identity 94 | (partial db/kill! db) 95 | (partial db/start! db)) 96 | :generator gen) 97 | test' (jepsen/run! test) 98 | ; Project down history 99 | h (->> (:history test') 100 | (h/remove h/invoke?) 101 | (h/map :value) 102 | (remove nil?)) 103 | ; Divide the polls into three zones: the fresh cluster, once killed, 104 | ; and once restarted. 105 | [fresh _ dead _ restarted] (partition-by map? h)] 106 | 107 | ;(pprint fresh) 108 | ;(pprint dead) 109 | ;(pprint restarted) 110 | 111 | (testing "fresh" 112 | ; Initially, we should be running. 113 | (is (first fresh)) 114 | ; But even though we die, we should be restarted and running at the end. 115 | (is (last fresh))) 116 | 117 | (testing "dead" 118 | ; We should never run during this part. We might race during the first op though. 119 | (is (every? false? (next dead)))) 120 | 121 | (testing "restarted" 122 | ; Same deal, we should be running at the start and end. 
123 | (is (first restarted)) 124 | (is (last restarted))) 125 | )) 126 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/store_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.store-test 2 | (:refer-clojure :exclude [load test]) 3 | (:use clojure.test) 4 | (:require [clojure.data.fressian :as fress] 5 | [clojure.string :as str] 6 | [fipp.edn :refer [pprint]] 7 | [jepsen [common-test :refer [quiet-logging]] 8 | [core :as core] 9 | [core-test :as core-test] 10 | [generator :as gen] 11 | [history :as history :refer [op]] 12 | [store :refer :all] 13 | [tests :refer [noop-test]]] 14 | [jepsen.store [format :as store.format] 15 | [fressian :as store.fressian]] 16 | [multiset.core :as multiset]) 17 | (:import (org.fressian.handlers WriteHandler ReadHandler))) 18 | 19 | (use-fixtures :once quiet-logging) 20 | 21 | (defrecord Kitten [fuzz mew]) 22 | 23 | (def base-test (assoc noop-test 24 | :pure-generators true 25 | :name "store-test" 26 | :generator (->> [{:f :trivial}] 27 | gen/clients) 28 | :record (Kitten.
"fluffy" "smol") 29 | :multiset (into (multiset/multiset) 30 | [1 1 2 3 5 8]) 31 | :nil nil 32 | :boolean false 33 | :long 1 34 | :double 1.5 35 | :rational 5/7 36 | :bignum 123M 37 | :string "foo" 38 | :atom ["blah"] 39 | :vec [1 2 3] 40 | :seq (map inc [1 2 3]) 41 | :cons (cons 1 (cons 2 nil)) 42 | :set #{1 2 3} 43 | :map {:a 1 :b 2} 44 | :ops [(op {:time 3, :index 4, :process :nemesis, :f 45 | :foo, :value [:hi :there]})] 46 | :sorted-map (sorted-map 1 :x 2 :y) 47 | :plot {:nemeses 48 | #{{:name "pause pd", 49 | :color "#C5A0E9", 50 | :start #{:pause-pd}, 51 | :stop #{:resume-pd}}}})) 52 | 53 | (defn fr 54 | "Fressian roundtrip" 55 | [x] 56 | (let [b (fress/write x :handlers write-handlers) 57 | ;_ (hexdump/print-dump (.array b)) 58 | x' (with-open [in (fress/to-input-stream b) 59 | r (store.fressian/reader in)] 60 | (fress/read-object r))] 61 | x')) 62 | 63 | (deftest fressian-test 64 | (are [x] (= x (fr x)) 65 | #{1 2 3} 66 | [#{5 6} 67 | #{:foo}])) 68 | 69 | (deftest fressian-vector-test 70 | ; Make sure we decode these as vecs, not arraylists. 71 | (is (vector? (fr []))) 72 | (is (vector? (fr [1]))) 73 | (is (vector? (fr [:x :y]))) 74 | (is (vector? (:foo (fr {:foo [:x :y]}))))) 75 | 76 | (deftest ^:integration roundtrip-test 77 | (let [name (:name base-test) 78 | _ (delete! name) 79 | t (-> base-test 80 | core/run!) 81 | ; At this juncture we've run the test, and the history should be 82 | ; written. 83 | t' (load t) 84 | _ (is (= (:history t) (:history t'))) 85 | _ (is (instance? jepsen.history.Op (first (:history t)))) 86 | _ (is (instance? jepsen.history.Op (first (:history t')))) 87 | 88 | ; Now we're going to rewrite the results, adding a kitten 89 | [t serialized-t] 90 | (with-handle [t t] 91 | (let [t (-> t 92 | (assoc-in [:results :kitten] (Kitten. "hi" "there")) 93 | save-2!) 
94 | serialized-t (dissoc t :db :os :net :client :checker :nemesis 95 | :generator :model :remote :store)] 96 | [t serialized-t])) 97 | ts (tests name) 98 | [time t'] (first ts)] 99 | (is (= 1 (count ts))) 100 | (is (string? time)) 101 | 102 | (testing "generic test load" 103 | (is (= serialized-t @t'))) 104 | (testing "test.jepsen" 105 | (is (= serialized-t (load-jepsen-file (jepsen-file t))))) 106 | (testing "load-results" 107 | (is (= (:results t) (load-results name time)))) 108 | (testing "results.edn" 109 | (is (= (:results t) (load-results-edn t)))))) 110 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/os/smartos.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.os.smartos 2 | "Common tasks for SmartOS boxes." 3 | (:require [clojure.set :as set] 4 | [clojure.tools.logging :refer [info]] 5 | [jepsen.util :refer [meh]] 6 | [jepsen.os :as os] 7 | [jepsen.control :as c] 8 | [jepsen.control.util :as cu] 9 | [jepsen.net :as net] 10 | [clojure.string :as str])) 11 | 12 | (defn setup-hostfile! 13 | "Makes sure the hostfile has a loopback entry for the local hostname" 14 | [] 15 | (let [name (c/exec :hostname) 16 | hosts (c/exec :cat "/etc/hosts") 17 | hosts' (->> hosts 18 | str/split-lines 19 | (map (fn [line] 20 | (if (and (re-find #"^127\.0\.0\.1\t" line) 21 | (not (re-find (re-pattern name) line))) 22 | (str line " " name) 23 | line))) 24 | (str/join "\n"))] 25 | (c/su (c/exec :echo hosts' :> "/etc/hosts")))) 26 | 27 | (defn time-since-last-update 28 | "When did we last run a pkgin update, in seconds ago" 29 | [] 30 | (- (Long/parseLong (c/exec :date "+%s")) 31 | (Long/parseLong (c/exec :stat :-c "%Y" "/var/db/pkgin/sql.log")))) 32 | 33 | (defn update! 34 | "Pkgin update." 35 | [] 36 | (c/su (c/exec :pkgin :update))) 37 | 38 | (defn maybe-update! 39 | "Pkgin update if we haven't done so recently." 
40 | [] 41 | (try (when (< 86400 (time-since-last-update)) 42 | (update!)) 43 | (catch Exception e 44 | (update!)))) 45 | 46 | (defn installed 47 | "Given a list of pkgin packages (strings, symbols, keywords, etc), returns 48 | the set of packages which are installed, as strings." 49 | [pkgs] 50 | (let [pkgs (->> pkgs (map name) set)] 51 | (->> (c/exec :pkgin :-p :list) 52 | str/split-lines 53 | (map (comp first #(str/split %1 #";"))) 54 | (map (comp second (partial re-find #"(.*)-[^\-]+"))) 55 | set 56 | (#(filter %1 pkgs)) 57 | set))) 58 | 59 | (defn uninstall! 60 | "Removes a package or packages." 61 | [pkg-or-pkgs] 62 | (let [pkgs (if (coll? pkg-or-pkgs) pkg-or-pkgs (list pkg-or-pkgs)) 63 | pkgs (installed pkgs)] 64 | (c/su (apply c/exec :pkgin :-y :remove pkgs)))) 65 | 66 | (defn installed? 67 | "Are the given packages, or singular package, installed on the current 68 | system?" 69 | [pkg-or-pkgs] 70 | (let [pkgs (if (coll? pkg-or-pkgs) pkg-or-pkgs (list pkg-or-pkgs))] 71 | (every? (installed pkgs) (map name pkgs)))) 72 | 73 | (defn installed-version 74 | "Given a package name, determines the installed version of that package, or 75 | nil if it is not installed." 76 | [pkg] 77 | (some->> (c/exec :pkgin :-p :list) 78 | str/split-lines 79 | (map (comp first #(str/split %1 #";"))) 80 | (map (partial re-find #"(.*)-[^\-]+")) 81 | (filter #(= (second %) (name pkg))) 82 | first 83 | first 84 | (re-find #".*-([^\-]+)") 85 | second)) 86 | 87 | (defn install 88 | "Ensure the given packages are installed. Can take a flat collection of 89 | packages, passed as symbols, strings, or keywords, or, alternatively, a map 90 | of packages to version strings." 91 | [pkgs] 92 | (if (map? 
pkgs) 93 | ; Install specific versions 94 | (dorun 95 | (for [[pkg version] pkgs] 96 | (when (not= version (installed-version pkg)) 97 | (info "Installing" pkg version) 98 | (c/exec :pkgin :-y :install 99 | (str (name pkg) "-" version))))) 100 | 101 | ; Install any version 102 | (let [pkgs (set (map name pkgs)) 103 | missing (set/difference pkgs (installed pkgs))] 104 | (when-not (empty? missing) 105 | (c/su 106 | (info "Installing" missing) 107 | (apply c/exec :pkgin :-y :install missing)))))) 108 | 109 | (def os 110 | (reify os/OS 111 | (setup! [_ test node] 112 | (info node "setting up smartos") 113 | 114 | (setup-hostfile!) 115 | 116 | (maybe-update!) 117 | 118 | (c/su 119 | ; Packages! 120 | (install [:wget 121 | :curl 122 | :vim 123 | :unzip 124 | :rsyslog 125 | :logrotate])) 126 | 127 | (c/su 128 | (c/exec :svcadm :enable :-r :ipfilter)) 129 | 130 | (meh (net/heal! (:net test) test))) 131 | 132 | (teardown! [_ test node]))) 133 | -------------------------------------------------------------------------------- /generator/test/jepsen/generator/context_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.generator.context-test 2 | (:require [clojure [datafy :refer [datafy]] 3 | [pprint :refer [pprint]] 4 | [test :refer :all]] 5 | [jepsen.generator.context :refer :all]) 6 | (:import (io.lacuna.bifurcan IMap 7 | Map 8 | Set))) 9 | 10 | (deftest context-test 11 | (let [c (context {:concurrency 2}) 12 | _ (testing "basics" 13 | (is (= 0 (:time c))) 14 | (is (= 3 (all-thread-count c))) 15 | (is (= 3 (free-thread-count c))) 16 | (is (= (Set/from [:nemesis 0 1]) (free-threads c))) 17 | (is (= #{:nemesis 0 1} (set (free-processes c)))) 18 | (is (= :nemesis (thread->process c :nemesis))) 19 | (is (= :nemesis (process->thread c :nemesis))) 20 | (is (= 1 (thread->process c 1))) 21 | (is (= 1 (process->thread c 1))) 22 | (is (= 0 (some-free-process c)))) 23 | 24 | c1 (busy-thread c 5 0) 25 | _ (testing "busy" 26 | (is (= 
5 (:time c1))) 27 | (is (= 3 (all-thread-count c1))) 28 | (is (= 2 (free-thread-count c1))) 29 | (is (= (Set/from [:nemesis 1]) (free-threads c1))) 30 | (is (= #{:nemesis 1} (set (free-processes c1)))) 31 | (is (= 1 (some-free-process c1)))) 32 | 33 | c2 (free-thread c1 10 0) 34 | _ (testing "free" 35 | (is (= (assoc c :time 10) c2)) 36 | ; But the free process has advanced, for fairness! 37 | (is (= 1 (some-free-process c2)))) 38 | 39 | _ (testing "thread-filter" 40 | (testing "all-but" 41 | (let [c3 ((make-thread-filter (all-but :nemesis) c) c)] 42 | (is (= (Set/from [0 1]) (free-threads c3))) 43 | (is (= (Set/from [0 1]) (all-threads c3))) 44 | (is (= #{0 1} (set (free-processes c3)))) 45 | (is (= #{0 1} (set (all-processes c3)))) 46 | (is (= 0 (some-free-process c3))))) 47 | 48 | (testing "set" 49 | (let [c3 ((make-thread-filter #{1 :nemesis} c) c)] 50 | (is (= (Set/from [1 :nemesis]) (free-threads c3))) 51 | (is (= (Set/from [1 :nemesis]) (all-threads c3))) 52 | (is (= #{1 :nemesis} (set (free-processes c3)))) 53 | (is (= #{1 :nemesis} (set (all-processes c3)))) 54 | (is (= 1 (some-free-process c3))))) 55 | 56 | (testing "fn" 57 | (let [c3 ((make-thread-filter integer? 
c) c)] 58 | (is (= (Set/from [0 1]) (free-threads c3))) 59 | (is (= (Set/from [0 1]) (all-threads c3))) 60 | (is (= #{0 1} (set (free-processes c3)))) 61 | (is (= #{0 1} (set (all-processes c3)))) 62 | (is (= 0 (some-free-process c3)))))) 63 | ])) 64 | 65 | (deftest with-next-process-test 66 | ; As we crash threads their processes should advance 67 | (let [c (context {:concurrency 2}) 68 | c1 (with-next-process c 0) 69 | _ (is (= 2 (thread->process c1 0))) 70 | _ (is (= 0 (process->thread c1 2))) 71 | c2 (with-next-process c1 0) 72 | _ (is (= 4 (thread->process c2 0))) 73 | _ (is (= 0 (process->thread c2 4)))])) 74 | 75 | (deftest some-free-process-test 76 | (let [c (context {:concurrency 2})] 77 | (testing "all free" 78 | (is (= 0 (some-free-process c)))) 79 | 80 | (testing "some busy" 81 | (is (= 1 (-> c (busy-thread 0 0) some-free-process))) 82 | (is (not= 1 (-> c (busy-thread 0 1) some-free-process)))) 83 | 84 | ; We want to make sure that if we use and free a process, and all *later* 85 | ; processes are busy, we realize the first process is still free. 86 | (testing "driven forward" 87 | (let [c' (-> c 88 | (busy-thread 0 1) 89 | (busy-thread 0 :nemesis))] 90 | (is (= 0 (some-free-process c'))))) 91 | 92 | ; We want to distribute requests evenly across threads to prevent 93 | ; starvation. 
94 | (testing "fair" 95 | (let [c1 (-> c (busy-thread 0 0) (free-thread 0 0)) 96 | _ (is (= 1 (some-free-process c1))) 97 | c2 (-> c1 (busy-thread 0 1) (free-thread 0 1)) 98 | _ (is (= :nemesis (some-free-process c2))) 99 | c3 (-> c2 (busy-thread 0 :nemesis) (free-thread 0 :nemesis)) 100 | _ (is (= 0 (some-free-process c3)))])))) 101 | 102 | (deftest assoc-test 103 | (let [c (context {:concurrency 2}) 104 | c' (assoc c :special 123)] 105 | (is (= 123 (:special c'))) 106 | (is (= (class c) (class c'))))) 107 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests/causal_reverse.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.causal-reverse 2 | "Checks for a strict serializability anomaly in which T1 < T2, but T2 is 3 | visible without T1. 4 | 5 | We perform concurrent blind inserts across n keys, and meanwhile, perform 6 | reads of n keys in a transaction. To verify, we replay the history, 7 | tracking the writes which were known to have completed before the invocation 8 | of any write w_i. If w_i is visible, and some w_j < w_i is *not* visible, 9 | we've found a violation of strict serializability. 10 | 11 | Splits keys up onto different tables to make sure they fall in different 12 | shard ranges" 13 | (:require [jepsen [checker :as checker] 14 | [generator :as gen] 15 | [history :as h] 16 | [independent :as independent]] 17 | [clojure.core.reducers :as r] 18 | [clojure.set :as set] 19 | [clojure.tools.logging :refer :all])) 20 | 21 | (defn graph 22 | "Takes a history and returns a first-order write precedence graph." 23 | [history] 24 | (loop [completed (sorted-set) 25 | expected {} 26 | [op & more :as history] history] 27 | (cond 28 | ;; Done 29 | (not (seq history)) 30 | expected 31 | 32 | ;; We know this value is definitely written 33 | (= :write (:f op)) 34 | (cond 35 | ;; Write is beginning; record precedence 36 | (h/invoke? 
op) 37 | (recur completed 38 | (assoc expected (:value op) completed) 39 | more) 40 | 41 | ;; Write is completing; we can now expect to see 42 | ;; it 43 | (h/ok? op) 44 | (recur (conj completed (:value op)) 45 | expected more) 46 | 47 | ;; Failed or indeterminate write; ignore and continue 48 | true (recur completed expected more)) 49 | true (recur completed expected more)))) 50 | 51 | (defn errors 52 | "Takes a history and an expected graph of write precedence, returning ops that 53 | violate the expected write order." 54 | [history expected] 55 | (let [h (->> history 56 | (h/filter (h/has-f? :read)) 57 | (h/filter h/ok?)) 58 | f (fn [errors op] 59 | (let [seen (:value op) 60 | our-expected (->> seen 61 | (map expected) 62 | (reduce set/union)) 63 | missing (set/difference our-expected 64 | seen)] 65 | (if (empty? missing) 66 | errors 67 | (conj errors 68 | (-> op 69 | (dissoc :value) 70 | (assoc :missing missing) 71 | (assoc :expected-count 72 | (count our-expected)))))))] 73 | (reduce f [] h))) 74 | 75 | (defn checker 76 | "Takes a history of writes and reads. Verifies that subsequent writes do not 77 | appear without prior acknowledged writes." 78 | [] 79 | (reify checker/Checker 80 | (check [this test history opts] 81 | (let [expected (graph history) 82 | errors (errors history expected)] 83 | {:valid? (empty? errors) 84 | :errors errors})))) 85 | 86 | (defn r [] {:type :invoke, :f :read, :value nil}) 87 | (defn w [k] {:type :invoke, :f :write, :value k}) 88 | 89 | (defn workload 90 | "A package of a generator and checker. Options: 91 | 92 | :nodes A set of nodes you're going to operate on. We only care 93 | about the count, so we can figure out how many workers 94 | to use per key. 95 | :per-key-limit Maximum number of ops per key. Default 500." 
96 | [opts] 97 | {:checker (checker/compose 98 | {:perf (checker/perf) 99 | :sequential (independent/checker (checker))}) 100 | :generator (let [n (count (:nodes opts)) 101 | reads {:f :read} 102 | writes (map (fn [x] {:f :write, :value x}) (range))] 103 | (independent/concurrent-generator 104 | n 105 | (range) 106 | (fn [k] 107 | ; TODO: I'm not entirely sure this is the same--I 108 | ; thiiiink the original generator didn't actually mean to 109 | ; re-use the same write generator for each distinct key, 110 | ; but if it *did* and relied on that behavior, this will 111 | ; break. 112 | (->> (gen/mix [reads writes]) 113 | (gen/stagger 1/100) 114 | (gen/limit (:per-key-limit opts 500))))))}) 115 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/tests/causal.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.tests.causal 2 | (:refer-clojure :exclude [test]) 3 | (:require [jepsen [checker :as checker] 4 | [generator :as gen] 5 | [history :as h] 6 | [independent :as independent]] 7 | [clojure.tools.logging :refer [info warn]] 8 | [clojure.pprint :refer [pprint]])) 9 | 10 | (defprotocol Model 11 | (step [model op])) 12 | 13 | (defrecord Inconsistent [msg] 14 | Model 15 | (step [this op] this) 16 | 17 | Object 18 | (toString [this] msg)) 19 | 20 | (defn inconsistent 21 | "Represents an invalid termination of a model; e.g. that an operation could 22 | not have taken place." 23 | [msg] 24 | (Inconsistent. msg)) 25 | 26 | (defn inconsistent? 27 | "Is a model inconsistent?" 28 | [model] 29 | (instance? 
Inconsistent model)) 30 | 31 | (defrecord CausalRegister [value counter last-pos] 32 | Model 33 | (step [r op] 34 | (let [c (inc counter) 35 | v' (:value op) 36 | pos (:position op) 37 | link (:link op)] 38 | (if-not (or (= link :init) 39 | (= link last-pos)) 40 | (inconsistent (str "Cannot link " link 41 | " to last-seen position " last-pos)) 42 | (condp = (:f op) 43 | :write (cond 44 | ;; Write aligns with next counter, OK 45 | (= v' c) 46 | (CausalRegister. v' c pos) 47 | 48 | ;; Attempting to write an unknown value 49 | (not= v' c) 50 | (inconsistent (str "expected value " c 51 | " attempting to write " 52 | v' " instead"))) 53 | 54 | :read-init (cond 55 | ;; Read a non-0 value from a freshly initialized register 56 | (and (= 0 counter) 57 | (not= 0 v')) 58 | (inconsistent (str "expected init value 0, read " v')) 59 | 60 | ;; Read the expected value of the register, 61 | ;; update the last known position 62 | (or (nil? v') 63 | (= value v')) 64 | (CausalRegister. value counter pos) 65 | 66 | ;; Read a value that we haven't written 67 | true (inconsistent (str "can't read " v' 68 | " from register " value))) 69 | 70 | :read (cond 71 | ;; Read the expected value of the register, 72 | ;; update the last known position 73 | (or (nil? v') 74 | (= value v')) 75 | (CausalRegister. value counter pos) 76 | 77 | ;; Read a value that we haven't written 78 | true (inconsistent (str "can't read " v' 79 | " from register " value))))))) 80 | Object 81 | (toString [this] (pr-str value))) 82 | 83 | (defn causal-register [] 84 | (CausalRegister. 0 0 nil)) 85 | 86 | (defn check 87 | "A series of causally consistent (CC) ops are a causal order (CO). We issue a 88 | CO of 5 read (r) and write (w) operations (r w r w r) against a register 89 | (key). All operations in this CO must appear to execute in the order provided 90 | by the issuing site (process). 
We also look for anomalies, such as unexpected 91 | values" 92 | [model] 93 | (reify checker/Checker 94 | (check [this test history opts] 95 | (let [completed (h/oks history)] 96 | (loop [s model 97 | history completed] 98 | (if (empty? history) 99 | ;; We've checked every operation in the history 100 | {:valid? true 101 | :model s} 102 | 103 | ;; checking checking checking... 104 | (let [op (first history) 105 | s' (step s op)] 106 | (if (inconsistent? s') 107 | {:valid? false 108 | :error (:msg s')} 109 | (recur s' (rest history)))))))))) 110 | 111 | ;; Generators 112 | (defn r [_ _] {:type :invoke, :f :read}) 113 | (defn ri [_ _] {:type :invoke, :f :read-init}) 114 | (defn cw1 [_ _] {:type :invoke, :f :write, :value 1}) 115 | (defn cw2 [_ _] {:type :invoke, :f :write, :value 2}) 116 | 117 | (defn test 118 | [opts] 119 | {:checker (independent/checker (check (causal-register))) 120 | :generator (->> (independent/concurrent-generator 121 | 1 122 | (range) 123 | (fn [k] [ri cw1 r cw2 r])) 124 | (gen/stagger 1) 125 | (gen/nemesis 126 | (cycle [(gen/sleep 10) 127 | {:type :info, :f :start} 128 | (gen/sleep 10) 129 | {:type :info, :f :stop}])) 130 | (gen/time-limit (:time-limit opts)))}) 131 | -------------------------------------------------------------------------------- /jepsen/resources/strobe-time.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | const int64_t NANOS_PER_SEC = 1000000000; 8 | 9 | # define TIMEVAL_TO_TIMESPEC(tv, ts) { \ 10 | (ts)->tv_sec = (tv)->tv_sec; \ 11 | (ts)->tv_nsec = (tv)->tv_usec * 1000; \ 12 | } 13 | # define TIMESPEC_TO_TIMEVAL(tv, ts) { \ 14 | (tv)->tv_sec = (ts)->tv_sec; \ 15 | (tv)->tv_usec = (ts)->tv_nsec / 1000; \ 16 | } 17 | 18 | /* Convert nanoseconds to a timespec */ 19 | struct timespec nanos_to_timespec(int64_t nanos) { 20 | struct timespec t; 21 | int64_t dnanos = nanos % NANOS_PER_SEC; 22 | int64_t dseconds = (nanos - 
dnanos) / NANOS_PER_SEC; 23 | t.tv_nsec = dnanos; 24 | t.tv_sec = dseconds; 25 | return t; 26 | } 27 | 28 | /* Obtain monotonic clock as a timespec */ 29 | struct timespec monotonic_now() { 30 | struct timespec now; 31 | clock_gettime(CLOCK_MONOTONIC, &now); 32 | return now; 33 | } 34 | 35 | /* Obtain wall clock as a timespec */ 36 | struct timespec wall_now() { 37 | struct timespec now_ts; 38 | if (0 != clock_gettime(CLOCK_REALTIME, &now_ts)) { 39 | perror("clock_gettime"); 40 | exit(1); 41 | } 42 | return now_ts; 43 | } 44 | 45 | /* Set wall clock */ 46 | void set_wall_clock(struct timespec ts) { 47 | /* printf("Setting clock: %d %d\n", ts.tv_sec, ts.tv_nsec); */ 48 | if (0 != clock_settime(CLOCK_REALTIME, &ts)) { 49 | perror("clock_settime"); 50 | exit(2); 51 | } 52 | } 53 | 54 | /* Rebalances sec/nsec so that 0 <= tv_nsec < NANOS_PER_SEC. Mutates t. */ 55 | void balance_timespec_m(struct timespec *t) { 56 | while (t->tv_nsec < 0) { 57 | t->tv_sec -= 1; 58 | t->tv_nsec += NANOS_PER_SEC; 59 | } 60 | while (NANOS_PER_SEC <= t->tv_nsec) { 61 | t->tv_sec += 1; 62 | t->tv_nsec -= NANOS_PER_SEC; 63 | } 64 | } 65 | 66 | /* Add two timespecs, returning their sum */ 67 | struct timespec add_timespec(struct timespec a, struct timespec b) { 68 | struct timespec result; 69 | result.tv_sec = a.tv_sec + b.tv_sec; 70 | result.tv_nsec = a.tv_nsec + b.tv_nsec; 71 | balance_timespec_m(&result); 72 | return result; 73 | } 74 | 75 | /* Subtract one timespec from another, returning their difference. 
*/ 76 | struct timespec sub_timespec(struct timespec a, struct timespec b) { 77 | struct timespec result; 78 | result.tv_sec = a.tv_sec - b.tv_sec; 79 | result.tv_nsec = a.tv_nsec - b.tv_nsec; 80 | balance_timespec_m(&result); 81 | return result; 82 | } 83 | 84 | /* Comparator over timespecs: returns 1 if a < b, -1 if b < a, and 0 if equal. Note that this is the reverse of the usual sign convention. */ 85 | int8_t cmp_timespec(struct timespec a, struct timespec b) { 86 | if (a.tv_sec < b.tv_sec) { 87 | return 1; 88 | } else if (b.tv_sec < a.tv_sec) { 89 | return -1; 90 | } else { 91 | if (a.tv_nsec < b.tv_nsec) { 92 | return 1; 93 | } else if (b.tv_nsec < a.tv_nsec) { 94 | return -1; 95 | } else { 96 | return 0; 97 | } 98 | } 99 | } 100 | 101 | int main(int argc, char **argv) { 102 | if (argc < 4) { 103 | fprintf(stderr, "usage: %s <delta> <period> <duration>\n", argv[0]); 104 | fprintf(stderr, "Delta and period are in ms, duration is in seconds. " 105 | "Every period ms, adjusts the clock forward by delta ms, or, " 106 | "alternatively, back by delta ms. Does this for duration seconds, " 107 | "then exits. Useful for confusing the heck out of systems that " 108 | "assume clocks are monotonic and linear.\n"); 109 | return 1; 110 | } 111 | 112 | /* Parse args */ 113 | struct timespec delta = nanos_to_timespec(atof(argv[1]) * 1000000); 114 | struct timespec period = nanos_to_timespec(atof(argv[2]) * 1000000); 115 | struct timespec duration = nanos_to_timespec(atof(argv[3]) * 1000000000); 116 | 117 | /* How far ahead of the monotonic clock is wall time? */ 118 | struct timespec normal_offset = sub_timespec(wall_now(), monotonic_now()); 119 | struct timespec weird_offset = add_timespec(normal_offset, delta); 120 | 121 | /* And somewhere to store nanosleep remainders */ 122 | struct timespec rem; 123 | 124 | /* When (in monotonic time) should we stop changing the clock? */ 125 | struct timespec end = add_timespec(monotonic_now(), duration); 126 | 127 | /* Are we in weird time mode or not? 
*/ 128 | int8_t weird = 0; 129 | 130 | /* Number of adjustments */ 131 | int64_t count = 0; 132 | 133 | /* Strobe the clock until duration's up! */ 134 | while (0 < cmp_timespec(monotonic_now(), end)) { 135 | set_wall_clock(add_timespec(monotonic_now(), 136 | (weird ? normal_offset : weird_offset))); 137 | // printf("Time now: %d %d\n", wall_now().tv_sec, wall_now().tv_nsec); 138 | weird = !weird; 139 | count += 1; 140 | 141 | if (0 != nanosleep(&period, &rem)) { 142 | perror("nanosleep"); 143 | exit(3); 144 | } 145 | } 146 | 147 | /* Reset clock and print number of changes */ 148 | set_wall_clock(add_timespec(monotonic_now(), normal_offset)); 149 | printf("%lld\n", (long long) count); 150 | return 0; 151 | } 152 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/os/centos.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.os.centos 2 | "Common tasks for CentOS boxes." 3 | (:require [clojure.set :as set] 4 | [clojure.tools.logging :refer [info]] 5 | [jepsen.util :as u] 6 | [jepsen.os :as os] 7 | [jepsen.control :as c] 8 | [jepsen.control.util :as cu] 9 | [jepsen.net :as net] 10 | [clojure.string :as str])) 11 | 12 | (defn setup-hostfile! 13 | "Makes sure the hostfile has a loopback entry for the local hostname." 14 | [] 15 | (let [name (c/exec :hostname) 16 | hosts (c/exec :cat "/etc/hosts") 17 | hosts' (->> hosts 18 | str/split-lines 19 | (map (fn [line] 20 | (if (and (re-find #"^127\.0\.0\.1" line) 21 | (not (re-find (re-pattern name) line))) 22 | (str line " " name) 23 | line))) 24 | (str/join "\n"))] 25 | (c/su (c/exec :echo hosts' :> "/etc/hosts")))) 26 | 27 | (defn time-since-last-update 28 | "How many seconds ago did we last run a yum update?" 29 | [] 30 | (- (Long/parseLong (c/exec :date "+%s")) 31 | (Long/parseLong (c/exec :stat :-c "%Y" "/var/log/yum.log")))) 32 | 33 | (defn update! 34 | "Yum update." 
35 | [] 36 | (c/su (c/exec :yum :-y :update))) 37 | 38 | (defn maybe-update! 39 | "Yum update if we haven't done so recently." 40 | [] 41 | (try (when (< 86400 (time-since-last-update)) 42 | (update!)) 43 | (catch Exception e 44 | (update!)))) 45 | 46 | (defn installed 47 | "Given a list of centos packages (strings, symbols, keywords, etc), returns 48 | the set of packages which are installed, as strings." 49 | [pkgs] 50 | (let [pkgs (->> pkgs (map name) set)] 51 | (->> (c/exec :yum :list :installed) 52 | str/split-lines 53 | (map (comp first #(str/split %1 #"\s+"))) 54 | (map (comp second (partial re-find #"(.*)\.[^\-]+"))) 55 | set 56 | ((partial clojure.set/intersection pkgs)) 57 | u/spy))) 58 | 59 | (defn uninstall! 60 | "Removes a package or packages." 61 | [pkg-or-pkgs] 62 | (let [pkgs (if (coll? pkg-or-pkgs) pkg-or-pkgs (list pkg-or-pkgs)) 63 | pkgs (installed pkgs)] 64 | (info "Uninstalling" pkgs) 65 | (c/su (apply c/exec :yum :-y :remove pkgs)))) 66 | 67 | (defn installed? 68 | "Are the given packages, or singular package, installed on the current 69 | system?" 70 | [pkg-or-pkgs] 71 | (let [pkgs (if (coll? pkg-or-pkgs) pkg-or-pkgs (list pkg-or-pkgs))] 72 | (every? (installed pkgs) (map name pkgs)))) 73 | 74 | (defn installed-version 75 | "Given a package name, determines the installed version of that package, or 76 | nil if it is not installed." 77 | [pkg] 78 | (some->> (c/exec :yum :list :installed) 79 | str/split-lines 80 | (map (comp first #(str/split %1 #";"))) 81 | (map (partial re-find #"(.*).[^\-]+")) 82 | (filter #(= (second %) (name pkg))) 83 | first 84 | first 85 | (re-find #".*-([^\-]+)") 86 | second)) 87 | 88 | (defn install 89 | "Ensure the given packages are installed. Can take a flat collection of 90 | packages, passed as symbols, strings, or keywords, or, alternatively, a map 91 | of packages to version strings." 92 | [pkgs] 93 | (if (map? 
pkgs) 94 | ; Install specific versions 95 | (dorun 96 | (for [[pkg version] pkgs] 97 | (when (not= version (installed-version pkg)) 98 | (info "Installing" pkg version) 99 | (c/exec :yum :-y :install 100 | (str (name pkg) "-" version))))) 101 | 102 | ; Install any version 103 | (let [pkgs (set (map name pkgs)) 104 | missing (set/difference pkgs (installed pkgs))] 105 | (when-not (empty? missing) 106 | (c/su 107 | (info "Installing" missing) 108 | (apply c/exec :yum :-y :install missing)))))) 109 | 110 | (defn install-start-stop-daemon! 111 | "Installs start-stop-daemon on centos" 112 | [] 113 | (info "Installing start-stop-daemon") 114 | (c/su 115 | (c/exec :wget "http://ftp.de.debian.org/debian/pool/main/d/dpkg/dpkg_1.19.8.tar.xz") 116 | (c/exec :tar :-xf :dpkg_1.19.8.tar.xz) 117 | (c/exec "bash" "-c" "cd dpkg-1.19.8 && ./configure") 118 | (c/exec "bash" "-c" "cd dpkg-1.19.8 && make") 119 | (c/exec "bash" "-c" "cp /dpkg-1.19.8/utils/start-stop-daemon /usr/bin/start-stop-daemon") 120 | (c/exec "bash" "-c" "rm -f dpkg_1.19.8.tar.xz"))) 121 | 122 | (defn installed-start-stop-daemon? 123 | "Is start-stop-daemon Installed?" 124 | [] 125 | (->> (c/exec :ls "/usr/bin") 126 | str/split-lines 127 | (some #(if (re-find #"start-stop-daemon" %) true)))) 128 | 129 | (deftype CentOS [] 130 | os/OS 131 | (setup! [_ test node] 132 | (info node "setting up centos") 133 | 134 | (setup-hostfile!) 135 | 136 | (maybe-update!) 137 | 138 | (c/su 139 | ; Packages! 140 | (install [:wget 141 | :gcc 142 | :gcc-c++ 143 | :curl 144 | :vim-common 145 | :unzip 146 | :rsyslog 147 | :iptables 148 | :ncurses-devel 149 | :iproute 150 | :logrotate])) 151 | 152 | (if (not= true (installed-start-stop-daemon?)) (install-start-stop-daemon!) (info "start-stop-daemon already installed")) 153 | 154 | (u/meh (net/heal! (:net test) test))) 155 | 156 | (teardown! [_ test node])) 157 | 158 | (def os "Support for CentOS." 
(CentOS.)) 159 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/client.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.client 2 | "Applies operations to a database." 3 | (:require [clojure.tools.logging :refer :all] 4 | [clojure.reflect :refer [reflect]] 5 | [jepsen.util :as util] 6 | [dom-top.core :refer [with-retry]] 7 | [clj-commons.slingshot :refer [try+ throw+]])) 8 | 9 | (defprotocol Client 10 | ; TODO: this should be open, not open! 11 | ; 12 | ; TODO: it would also be really nice to have this be (open client test node 13 | ; process)--we keep wanting to make decisions based on the process at client 14 | ; open time. 15 | (open! [client test node] 16 | "Set up the client to work with a particular node. Returns a client 17 | which is ready to accept operations via invoke! Open *should not* 18 | affect the logical state of the test; it should not, for instance, 19 | modify tables or insert records.") 20 | (close! [client test] 21 | "Close the client connection when work is completed or an invocation 22 | crashes the client. Close should not affect the logical state of the 23 | test.") 24 | (setup! [client test] 25 | "Called to set up database state for testing.") 26 | (invoke! [client test operation] 27 | "Apply an operation to the client, returning an operation to be 28 | appended to the history. For multi-stage operations, the client may 29 | reach into the test and conj onto the history atom directly.") 30 | (teardown! [client test] 31 | "Tear down database state when work is complete.")) 32 | 33 | (defprotocol Reusable 34 | (reusable? [client test] 35 | "If true, this client can be re-used with a fresh process after a 36 | call to `invoke` throws or returns an `info` operation. If false 37 | (or if this protocol is not implemented), crashed clients will be 38 | closed and new ones opened to replace them.")) 39 | 40 | (defn is-reusable? 
41 | "Wrapper around reusable?; returns false when not implemented." 42 | [client test] 43 | ; satisfies? Reusable is somehow true for records which DEFINITELY don't 44 | ; implement it and I don't know how this is possible, so we're falling back 45 | ; to IllegalArgException 46 | (try (reusable? client test) 47 | (catch IllegalArgumentException e 48 | false))) 49 | 50 | (def noop 51 | "Does nothing." 52 | (reify Client 53 | (setup! [this test]) 54 | (teardown! [this test]) 55 | (invoke! [this test op] (assoc op :type :ok)) 56 | (open! [this test node] this) 57 | (close! [this test]))) 58 | 59 | (defn closable? 60 | "Returns true if the given client implements method `close!`." 61 | [client] 62 | (->> client 63 | reflect 64 | :members 65 | (map :name) 66 | (some #{'close_BANG_}))) 67 | 68 | (defrecord Validate [client] 69 | Client 70 | (open! [this test node] 71 | (let [res (open! client test node)] 72 | (when-not (satisfies? Client res) 73 | (throw+ {:type ::open-returned-non-client 74 | :got res} 75 | nil 76 | "expected open! to return a Client, but got %s instead" 77 | (pr-str res))) 78 | (Validate. res))) 79 | 80 | (close! [this test] 81 | (close! client test)) 82 | 83 | (setup! [this test] 84 | (Validate. (setup! client test))) 85 | 86 | (invoke! [this test op] 87 | (let [op' (invoke! client test op)] 88 | (let [problems 89 | (cond-> [] 90 | (not (map? op')) 91 | (conj "should be a map") 92 | 93 | (not (#{:ok :info :fail} (:type op'))) 94 | (conj ":type should be :ok, :info, or :fail") 95 | 96 | (not= (:process op) (:process op')) 97 | (conj ":process should be the same") 98 | 99 | (not= (:f op) (:f op')) 100 | (conj ":f should be the same"))] 101 | (when (seq problems) 102 | (throw+ {:type ::invalid-completion 103 | :op op 104 | :op' op' 105 | :problems problems}))) 106 | op')) 107 | 108 | (teardown! [this test] 109 | (teardown! client test)) 110 | 111 | Reusable 112 | (reusable? [this test] 113 | (reusable? 
client test))) 114 | 115 | (defn validate 116 | "Wraps a client, validating that its return types are what you'd expect." 117 | [client] 118 | (Validate. client)) 119 | 120 | (defrecord Timeout [timeout-fn client] 121 | Client 122 | (open! [this test node] 123 | (Timeout. timeout-fn (open! client test node))) 124 | 125 | (setup! [this test] 126 | (Timeout. timeout-fn (setup! client test))) 127 | 128 | (invoke! [this test op] 129 | (let [ms (timeout-fn op)] 130 | (util/timeout ms (assoc op :type :info, :error ::timeout) 131 | (invoke! client test op)))) 132 | 133 | (teardown! [this test] 134 | (teardown! client test)) 135 | 136 | (close! [this test] 137 | (close! client test)) 138 | 139 | Reusable 140 | (reusable? [this test] 141 | (reusable? client test))) 142 | 143 | (defn timeout 144 | "Sometimes a client library's own timeouts don't work reliably. This takes 145 | either a timeout as a number of ms, or a function (f op) => timeout-in-ms, 146 | and a client. Wraps that client in a new one which automatically times out 147 | operations that take longer than the given timeout. Timed out operations have 148 | :error :jepsen.client/timeout." 149 | [timeout-or-fn client] 150 | (if (number? timeout-or-fn) 151 | (Timeout. (constantly timeout-or-fn) client) 152 | (Timeout. timeout-or-fn client))) 153 | 154 | (defmacro with-client 155 | "Analogous to with-open. Takes a binding of the form [client-sym 156 | client-expr], and a body. Binds client-sym to client-expr (presumably, 157 | client-expr opens a new client), evaluates body with client-sym bound, and 158 | ensures client is closed before returning." 159 | [[client-sym client-expr] & body] 160 | `(let [~client-sym ~client-expr] 161 | (try 162 | ~@body 163 | (finally 164 | (close! 
~client-sym test))))) 165 | 166 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/scp.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.scp 2 | "Built-in JDK SSH libraries can be orders of magnitude slower than plain old 3 | SCP for copying even medium-sized files of a few GB. This provides a faster 4 | implementation of a Remote which shells out to SCP." 5 | (:require [clojure.string :as str] 6 | [clojure.tools.logging :refer [info warn]] 7 | [jepsen [random :as rand] 8 | [util :as util]] 9 | [jepsen.control.core :as core] 10 | [clj-commons.slingshot :refer [try+ throw+]])) 11 | 12 | (def tmp-dir 13 | "The remote directory we temporarily store files in while transferring up and 14 | down." 15 | "/tmp/jepsen/scp") 16 | 17 | (defn exec! 18 | "A super basic exec implementation for our own purposes. At some point we 19 | might want to pull some? all? of control/exec all the way down into 20 | control.remote, and get rid of this." 21 | [remote ctx cmd-args] 22 | (->> cmd-args 23 | (map core/escape) 24 | (str/join " ") 25 | (hash-map :cmd) 26 | (core/wrap-sudo ctx) 27 | (core/execute! remote ctx) 28 | core/throw-on-nonzero-exit)) 29 | 30 | (defmacro with-tmp-dir 31 | "Evaluates body. If a nonzero exit status occurs, forces the tmp dir to 32 | exist, and re-evals body. We do this to avoid the overhead of checking for 33 | existence every time someone wants to upload/download a file." 34 | [remote ctx & body] 35 | `(try+ ~@body 36 | (catch (#{:jepsen.control/nonzero-exit 37 | :jepsen.util/nonzero-exit} 38 | (:type ~'%)) e# 39 | (exec! ~remote ~ctx [:mkdir :-p tmp-dir]) 40 | (exec! 
~remote ~ctx [:chmod "a+rwx" tmp-dir]) 41 | ~@body))) 42 | 43 | (defn tmp-file 44 | "Returns a randomly generated tmpfile for use during uploads/downloads" 45 | [] 46 | (str tmp-dir "/" (rand/long Integer/MAX_VALUE))) 47 | 48 | (defmacro with-tmp-file 49 | "Evaluates body with tmp-file-sym bound to the remote path of a temporary 50 | file. Cleans up file at exit." 51 | [remote ctx [tmp-file-sym] & body] 52 | `(let [~tmp-file-sym (tmp-file) 53 | ; We're going to want to do our tmpfile management as root in case 54 | ; /tmp/jepsen already exists and we don't own it. Blegh. 55 | ctx# (assoc ~ctx :sudo "root")] 56 | (try (with-tmp-dir ~remote ctx# ~@body) 57 | (finally 58 | (exec! ~remote ctx# [:rm :-f ~tmp-file-sym]))))) 59 | 60 | (defn scp! 61 | "Runs an SCP command by shelling out. Takes a conn-spec (used for port, key, 62 | etc), a seq of sources, and a single destination, all as strings." 63 | [conn-spec sources dest] 64 | (apply util/sh "scp" "-rpC" 65 | "-P" (str (:port conn-spec)) 66 | (concat (when-let [k (:private-key-path conn-spec)] 67 | ["-i" k]) 68 | (if-not (:strict-host-key-checking conn-spec) 69 | ["-o StrictHostKeyChecking=no"]) 70 | sources 71 | [dest])) 72 | nil) 73 | 74 | (defn remote-path 75 | "Returns the string representation of a remote path using a conn spec; e.g. 76 | admin@n1:/foo/bar" 77 | [{:keys [username host]} path] 78 | (assert host "No node given for remote-path!") 79 | (str (when username 80 | (str username "@")) 81 | host ":" path)) 82 | 83 | (defrecord Remote [cmd-remote conn-spec] 84 | core/Remote 85 | (connect [this conn-spec] 86 | (-> this 87 | (assoc :conn-spec conn-spec) 88 | (update :cmd-remote core/connect conn-spec))) 89 | 90 | (disconnect! [this] 91 | (update this :cmd-remote core/disconnect!)) 92 | 93 | (execute! [this ctx action] 94 | (core/execute! cmd-remote ctx action)) 95 | 96 | (upload! [this ctx srcs dest _] 97 | (let [sudo (:sudo ctx)] 98 | (if (or (nil? 
sudo) (= sudo (:username conn-spec))) 99 | ; We can upload directly using our connection credentials. 100 | (scp! conn-spec 101 | (util/coll srcs) 102 | (remote-path conn-spec dest)) 103 | 104 | ; We need to become a different user for this. Upload each source to a 105 | ; tmpfile and rename. 106 | (with-tmp-file cmd-remote ctx [tmp] 107 | (doseq [src (util/coll srcs)] 108 | ; Upload to tmpfile 109 | (core/upload! this {} src tmp nil) 110 | ; Chown and move to dest, as root 111 | (exec! cmd-remote {:sudo "root"} [:chown sudo tmp]) 112 | (exec! cmd-remote {:sudo "root"} [:mv tmp dest])))))) 113 | 114 | (download! [this ctx srcs dest _] 115 | (let [sudo (:sudo ctx)] 116 | (if (or (nil? sudo) (= sudo (:username conn-spec))) 117 | ; We can download directly using our conn credentials. 118 | (scp! conn-spec 119 | (->> (util/coll srcs) 120 | (map (partial remote-path conn-spec))) 121 | dest) 122 | ; We need to copy each file to a tmpfile we CAN read before downloading 123 | ; it. 124 | (doseq [src (util/coll srcs)] 125 | (with-tmp-file cmd-remote ctx [tmp] 126 | ; See if we can read this source as the current user, even if 127 | ; it's not our own file 128 | (if (try+ (exec! cmd-remote {} [:head :-n 1 src]) true 129 | (catch [:exit 1] _ false)) 130 | ; We can directly download this file. 131 | (core/download! this {} src dest nil) 132 | ; Nope; gotta copy. Try a hardlink? 133 | (do (try+ (exec! cmd-remote {:sudo "root"} [:ln :-L src tmp]) 134 | (catch [:exit 1] _ 135 | ; Fine, maybe a different fs. Try a full copy. 136 | (exec! cmd-remote {:sudo "root"} 137 | [:cp src tmp]))) 138 | ; Make the tmpfile readable to us 139 | (exec! cmd-remote {:sudo "root"} [:chown sudo tmp]) 140 | ; Download it 141 | (core/download! this {} tmp dest nil))))))))) 142 | 143 | (defn remote 144 | "Takes a remote which can execute commands, and wraps it in a remote which 145 | overrides upload & download to use SCP." 146 | [cmd-remote] 147 | (Remote. 
cmd-remote nil)) 148 | -------------------------------------------------------------------------------- /jepsen/test/jepsen/control/util_test.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.util-test 2 | (:require [clojure.tools.logging :refer [info warn]] 3 | [clojure [string :as str] 4 | [test :refer :all]] 5 | [clojure.java.io :as io] 6 | [jepsen [common-test :refer [quiet-logging]] 7 | [control :as c]] 8 | [jepsen.control [util :as util] 9 | [sshj :as sshj]] 10 | [clj-commons.slingshot :refer [try+ throw+]])) 11 | 12 | (use-fixtures :once quiet-logging) 13 | (use-fixtures :once (fn with-ssh [t] 14 | (c/with-ssh {} 15 | (c/on "n1" 16 | (t))))) 17 | 18 | (defn assert-file-exists 19 | "Asserts that a file exists at a given destination" 20 | [dest file] 21 | (is (util/exists? (io/file dest file)))) 22 | 23 | (defn assert-file-cached 24 | "Asserts that a file from a url was downloaded and cached in the wget-cache-dir" 25 | [url] 26 | (assert-file-exists util/wget-cache-dir (util/encode url))) 27 | 28 | (deftest ^:integration daemon-test 29 | (let [logfile "/tmp/jepsen-daemon-test.log" 30 | pidfile "/tmp/jepsen-daemon-test.pid"] 31 | (c/exec :rm :-f logfile pidfile) 32 | (util/start-daemon! 
{:env {:DOG "bark" 33 | :CAT "meow mix"} 34 | :chdir "/tmp" 35 | :logfile logfile 36 | :pidfile pidfile} 37 | "/usr/bin/perl" 38 | :-e 39 | "$|++; print \"$ENV{'CAT'}\\n\"; sleep 10;") 40 | (Thread/sleep 100) 41 | (let [pid (str/trim (c/exec :cat pidfile)) 42 | log (c/exec :cat logfile) 43 | lines (str/split log #"\n")] 44 | (testing "pidfile exists" 45 | (is (re-find #"\d+" pid))) 46 | (testing "daemon running" 47 | (is (try+ (c/exec :kill :-0 pid) 48 | true 49 | (catch [:exit 1] _ false)))) 50 | 51 | (testing "log starts with Jepsen debug line" 52 | (is (re-find #"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} Jepsen starting DOG=bark CAT=\"meow mix\" /usr/bin/perl -e \"" 53 | (first lines)))) 54 | (testing "env vars threaded through to daemon" 55 | (is (= "meow mix" (nth lines 1)))) 56 | 57 | (testing "shutdown" 58 | (util/stop-daemon! "/usr/bin/perl" pidfile) 59 | (testing "pidfile cleaned up" 60 | (is (not (util/exists? pidfile)))) 61 | (testing "process exited" 62 | (is (try+ (c/exec :kill :-0 pid) 63 | false 64 | (catch [:exit 1] _ true)))))))) 65 | 66 | (deftest ^:integration install-archive-test 67 | (testing "without auth credentials" 68 | (let [dest "/tmp/test" 69 | url "https://storage.googleapis.com/etcd/v3.0.0/etcd-v3.0.0-linux-amd64.tar.gz"] 70 | (util/install-archive! (str url) dest {:force? true}) 71 | (assert-file-exists dest "etcd") 72 | (assert-file-cached url))) 73 | 74 | (testing "with auth credentials" 75 | (let [dest "/tmp/test" 76 | url "https://aphyr.com/jepsen-auth/test.zip"] 77 | (util/install-archive! (str url) dest {:force? true 78 | :user? "jepsen" 79 | :pw? "hunter2"}) 80 | (assert-file-exists dest "zeroes.txt") 81 | (assert-file-cached url)))) 82 | 83 | (deftest ^:integration cached-wget-test 84 | (testing "without auth credentials" 85 | (let [url "https://aphyr.com/jepsen/test.zip"] 86 | (util/cached-wget! url {:force?
true}) 87 | (assert-file-cached url))) 88 | (testing "with auth credentials" 89 | (let [url "https://aphyr.com/jepsen-auth/test.zip"] 90 | (util/cached-wget! url {:force? true :user? "jepsen" :pw? "hunter2"}) 91 | (assert-file-cached url)))) 92 | 93 | (deftest ^:integration tarball-test 94 | ; Populate a temporary directory 95 | (let [dir (util/tmp-dir!)] 96 | (try 97 | (util/write-file! "foo" (str dir "/foo.txt")) 98 | (util/write-file! "bar" (str dir "/bar.txt")) 99 | ; Tar it up 100 | (let [tarball (util/tarball! dir)] 101 | (try 102 | (is (string? tarball)) 103 | (is (re-find #"^/.+\.tar\.gz$" tarball)) 104 | ; Extract it 105 | (let [dir2 (util/tmp-dir!)] 106 | (try 107 | (util/install-archive! (str "file://" tarball) dir2) 108 | (is (= "foo" (c/exec :cat (str dir2 "/foo.txt")))) 109 | (is (= "bar" (c/exec :cat (str dir2 "/bar.txt")))) 110 | (finally 111 | (c/exec :rm :-rf dir2)))) 112 | (finally 113 | (c/exec :rm :-rf tarball)))) 114 | (finally 115 | (c/exec :rm :-rf dir))))) 116 | 117 | (deftest ^:integration ls-test 118 | (let [dir (util/tmp-dir!)] 119 | (try 120 | (c/exec :mkdir :-p (str dir "/foo/bar")) 121 | (util/write-file! "baz" (str dir "/foo/bar/baz")) 122 | (util/write-file! "blarg" (str dir "/foo/bar/blarg")) 123 | (util/write-file! "xyzzy" (str dir "/xyzzy")) 124 | 125 | (testing "simple" 126 | (is (= ["foo" 127 | "xyzzy"] 128 | (util/ls dir)))) 129 | 130 | (testing "recursive" 131 | (is (= ["foo" 132 | "foo/bar" 133 | "foo/bar/baz" 134 | "foo/bar/blarg" 135 | "xyzzy"] 136 | (util/ls dir {:recursive? true})))) 137 | 138 | (testing "full path" 139 | (is (= [(str dir "/foo") 140 | (str dir "/xyzzy")] 141 | (util/ls dir {:full-path? true})))) 142 | 143 | (testing "files" 144 | (is (= ["foo/bar/baz" 145 | "foo/bar/blarg" 146 | "xyzzy"] 147 | (util/ls dir {:recursive? 
true, :types [:file]})))) 148 | 149 | (testing "dirs" 150 | (is (= ["foo"] 151 | (util/ls dir {:types [:dir]})))) 152 | 153 | (testing "trailing /" 154 | (is (= ["xyzzy"] 155 | (util/ls dir {:types [:file]}) 156 | (util/ls (str dir "/") {:types [:file]})))) 157 | (finally 158 | (c/exec :rm :-rf dir))))) 159 | -------------------------------------------------------------------------------- /docker/bin/up: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # To provide additional docker compose args, set the COMPOSE var. Ex: 4 | # COMPOSE="-f FILE_PATH_HERE" 5 | 6 | set -o errexit 7 | set -o pipefail 8 | set -o nounset 9 | # set -o xtrace 10 | 11 | ERROR() { 12 | printf "\e[101m\e[97m[ERROR]\e[49m\e[39m %s\n" "$@" 13 | } 14 | 15 | WARNING() { 16 | printf "\e[101m\e[97m[WARNING]\e[49m\e[39m %s\n" "$@" 17 | } 18 | 19 | INFO() { 20 | printf "\e[104m\e[97m[INFO]\e[49m\e[39m %s\n" "$@" 21 | } 22 | 23 | exists() { 24 | type "$1" > /dev/null 2>&1 25 | } 26 | 27 | JEPSEN_ROOT=${JEPSEN_ROOT:-""} 28 | 29 | # Change to the parent of this script's directory. Taken from: 30 | # https://stackoverflow.com/a/246128/3858681 31 | pushd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )/.." 32 | 33 | HELP=0 34 | INIT_ONLY=0 35 | NODE_COUNT=5 36 | DEV="" 37 | COMPOSE=${COMPOSE:-""} 38 | RUN_AS_DAEMON=0 39 | POSITIONAL=() 40 | 41 | while [[ $# -gt 0 ]] 42 | do 43 | key="$1" 44 | 45 | case $key in 46 | --help) 47 | HELP=1 48 | shift # past argument 49 | ;; 50 | --init-only) 51 | INIT_ONLY=1 52 | shift # past argument 53 | ;; 54 | --dev) 55 | if [ !
"$JEPSEN_ROOT" ]; then 56 | JEPSEN_ROOT="$(cd ../ && pwd)" 57 | export JEPSEN_ROOT 58 | INFO "JEPSEN_ROOT is not set, defaulting to: $JEPSEN_ROOT" 59 | fi 60 | INFO "Running docker compose with dev config" 61 | DEV="-f docker-compose.dev.yml" 62 | shift # past argument 63 | ;; 64 | --compose) 65 | COMPOSE="-f $2" 66 | shift # past argument 67 | shift # past value 68 | ;; 69 | -d|--daemon) 70 | INFO "Running docker compose as daemon" 71 | RUN_AS_DAEMON=1 72 | shift # past argument 73 | ;; 74 | -n|--node-count) 75 | NODE_COUNT=$2 76 | shift 77 | shift 78 | ;; 79 | *) 80 | POSITIONAL+=("$1") 81 | ERROR "unknown option $1" 82 | shift # past argument 83 | ;; 84 | esac 85 | done 86 | if [ "${#POSITIONAL[@]}" -gt 0 ]; then 87 | set -- "${POSITIONAL[@]}" # restore positional parameters 88 | fi 89 | 90 | if [ "${HELP}" -eq 1 ]; then 91 | echo "Usage: $0 [OPTION]..." 92 | echo " --help Display this message" 93 | echo " --init-only Generates SSH keys and prepares build files, but does not call docker compose" 94 | echo " -d, --daemon Runs docker compose in the background" 95 | echo " --dev Mounts dir at host's JEPSEN_ROOT to /jepsen on jepsen-control container, syncing files for development" 96 | echo " --compose PATH Path to an additional docker-compose yml config." 97 | echo "To provide multiple additional docker compose args, set the COMPOSE var directly, with the -f flag. Ex: COMPOSE=\"-f FILE_PATH_HERE -f ANOTHER_PATH\" bin/up --dev" 98 | exit 0 99 | fi 100 | 101 | exists ssh-keygen || { ERROR "Please install ssh-keygen (apt-get install openssh-client)"; exit 1; } 102 | exists perl || { ERROR "Please install perl (apt-get install perl)"; exit 1; } 103 | 104 | # Generate SSH keys for the control node 105 | if [ !
-f ./secret/node.env ]; then 106 | INFO "Generating key pair" 107 | mkdir -p secret 108 | ssh-keygen -t rsa -N "" -f ./secret/id_rsa 109 | 110 | INFO "Generating ./secret/control.env" 111 | { echo "# generated by jepsen/docker/up.sh, parsed by jepsen/docker/control/bashrc"; 112 | echo "# NOTE: newline is expressed as ↩"; 113 | echo "SSH_PRIVATE_KEY=$(perl -p -e "s/\n/↩/g" < ./secret/id_rsa)"; 114 | echo "SSH_PUBLIC_KEY=$(cat ./secret/id_rsa.pub)"; } >> ./secret/control.env 115 | 116 | INFO "Generating authorized_keys" 117 | { echo "# generated by jepsen/docker/up.sh"; 118 | echo "$(cat ./secret/id_rsa.pub)"; } >> ./secret/authorized_keys 119 | 120 | INFO "Generating ./secret/node.env" 121 | { echo "# generated by jepsen/docker/up.sh, parsed by the \"tutum/debian\" docker image entrypoint script"; 122 | echo "ROOT_PASS=root"; } >> ./secret/node.env 123 | else 124 | INFO "No need to generate key pair" 125 | fi 126 | 127 | 128 | ## Build dockerfile 129 | bin/build-docker-compose "${NODE_COUNT}" 130 | 131 | # Make sure folders referenced in control Dockerfile exist and don't contain leftover files 132 | rm -rf ./control/jepsen 133 | mkdir -p ./control/jepsen/jepsen 134 | # Copy the jepsen directory if we're not mounting the JEPSEN_ROOT 135 | if [ -z "${DEV}" ]; then 136 | exclude_params=( 137 | --exclude=./docker 138 | --exclude=./.git 139 | ) 140 | case $(uname) in 141 | Linux) 142 | exclude_params+=(--exclude-ignore=.gitignore) 143 | ;; 144 | esac 145 | # Dockerfile does not allow `ADD ..`. So we need to copy it here in setup. 146 | INFO "Copying .. to control/jepsen" 147 | ( 148 | (cd ..; tar "${exclude_params[@]}" -cf - .) 
| tar Cxf ./control/jepsen - 149 | cp ../jepsen/src/jepsen/store/* ./control/jepsen/jepsen/src/jepsen/store/ 150 | ) 151 | fi 152 | 153 | if [ "${INIT_ONLY}" -eq 1 ]; then 154 | exit 0 155 | fi 156 | 157 | exists docker || 158 | { ERROR "Please install docker (https://docs.docker.com/engine/installation/)"; 159 | exit 1; } 160 | 161 | INFO "Running \`docker compose build\`" 162 | # shellcheck disable=SC2086 163 | docker compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} build 164 | 165 | INFO "Running \`docker compose up\`" 166 | if [ "${RUN_AS_DAEMON}" -eq 1 ]; then 167 | # shellcheck disable=SC2086 168 | docker compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} up -d 169 | INFO "All containers started! Run \`docker ps\` to view, and \`bin/console\` to get started." 170 | else 171 | INFO "Please run \`bin/console\` in another terminal to proceed" 172 | # shellcheck disable=SC2086 173 | docker compose --compatibility -p jepsen -f docker-compose.yml ${COMPOSE} ${DEV} up 174 | fi 175 | 176 | popd 177 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/control/core.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.control.core 2 | "Provides the base protocol for running commands on remote nodes, as well as 3 | common functions for constructing and evaluating shell commands." 4 | (:require [clojure [pprint :refer [pprint]] 5 | [string :as str]] 6 | [clojure.tools.logging :refer [info warn]] 7 | [clj-commons.slingshot :refer [try+ throw+]])) 8 | 9 | (defprotocol Remote 10 | "Remotes allow jepsen.control to run shell commands and to upload and download 11 | files. They use a *context map*, which encodes the current user, directory, 12 | etc: 13 | 14 | :dir - The directory to execute remote commands in 15 | :sudo - The user we want to execute a command as 16 | :sudo-password - The user's password, for sudo, if necessary."
17 | 18 | (connect [this conn-spec] 19 | "Set up the remote to work with a particular node. Returns a Remote which 20 | is ready to accept actions via `execute!` and `upload!` and `download!`. 21 | conn-spec is a map of: 22 | 23 | {:host 24 | :port 25 | :username 26 | :password 27 | :private-key-path 28 | :strict-host-key-checking} 29 | ") 30 | 31 | (disconnect! [this] 32 | "Disconnect a remote that has been connected to a host.") 33 | 34 | (execute! [this context action] 35 | "Execute the specified action in a remote connected to a host. Takes a context 36 | map, and an action: a map of... 37 | 38 | :cmd A string command to execute. 39 | :in A string to provide for the command's stdin. 40 | 41 | Should return the action map with additional keys: 42 | 43 | :exit The command's exit status. 44 | :out The stdout string. 45 | :err The stderr string. 46 | ") 47 | 48 | (upload! [this context local-paths remote-path opts] 49 | "Copy the specified local-paths to the remote-path on the connected host. 50 | 51 | Opts is an option map. There are no defined options right now, but later we 52 | might introduce some for e.g. recursive uploads, compression, etc. This is 53 | also a place for Remote implementations to offer custom semantics.") 54 | 55 | (download! [this context remote-paths local-path opts] 56 | "Copy the specified remote-paths from the connected host to the local-path. 57 | 58 | TODO: remote-paths is, in fact, a single remote path: it looks like I 59 | forgot to finish making it multiple paths. May want to fix this later--not 60 | sure whether it should be a single path or multiple. 61 | 62 | Opts is an option map. There are no defined options right now, but later we 63 | might introduce some for e.g. recursive downloads, compression, etc. This is 64 | also a place for Remote implementations to offer custom semantics.")) 65 | 66 | (defrecord Literal [string]) 67 | 68 | (defn lit 69 | "A literal string to be passed, unescaped, to the shell." 70 | [s] 71 | (Literal.
s)) 72 | 73 | (defn escape 74 | "Escapes a thing for the shell. 75 | 76 | Nils are empty strings. 77 | 78 | Literal wrappers are passed through directly. 79 | 80 | The special keywords :>, :>>, and :< map to their corresponding shell I/O 81 | redirection operators. 82 | 83 | Named things like keywords and symbols use their name, escaped. Strings are 84 | escaped like normal. 85 | 86 | Sequential collections and sets have each element escaped and 87 | space-separated." 88 | [s] 89 | (cond 90 | (nil? s) 91 | "" 92 | 93 | (instance? Literal s) 94 | (:string s) 95 | 96 | (#{:> :>> :<} s) 97 | (name s) 98 | 99 | (or (sequential? s) (set? s)) 100 | (str/join " " (map escape s)) 101 | 102 | :else 103 | (let [s (if (instance? clojure.lang.Named s) 104 | (name s) 105 | (str s))] 106 | (cond 107 | ; Empty string 108 | (= "" s) 109 | "\"\"" 110 | 111 | (re-find #"[\\\$`\"\s\(\)\{\}\[\]\*\?<>&;]" s) 112 | (str "\"" 113 | (str/replace s #"([\\\$`\"])" "\\\\$1") 114 | "\"") 115 | 116 | :else s)))) 117 | 118 | (defn env 119 | "We often want to construct env vars for a process. This function takes a map 120 | of environment variable names (any Named type, e.g. :HOME, \"HOME\") to 121 | values (which are coerced using `(str value)`), and constructs a Literal 122 | string, suitable for passing to exec, which binds those environment 123 | variables. 124 | 125 | Callers of this function (especially indirectly, as with start-stop-daemon), 126 | may wish to construct env var strings themselves. Passing a string `s` to this 127 | function simply returns `(lit s)`. Passing a Literal `l` to this function 128 | returns `l`. nil is passed through unchanged." 129 | [env] 130 | (cond (map? env) (->> env 131 | (map (fn [[k v]] 132 | (str (name k) "=" (escape v)))) 133 | (str/join " ") 134 | lit) 135 | 136 | (instance? Literal env) 137 | env 138 | 139 | (instance? String env) 140 | (lit env) 141 | 142 | (nil? env) nil 143 | 144 | :else 145 | (throw (IllegalArgumentException. 
146 | (str "Unsure how to construct an environment variable mapping from " (pr-str env)))))) 147 | 148 | (defn wrap-sudo 149 | "Takes a context map and a command action, and returns the command action, 150 | modified to wrap it in a sudo command, if necessary. Uses the context map's 151 | :sudo and :sudo-password fields." 152 | [{:keys [sudo sudo-password]} cmd] 153 | (if sudo 154 | (cond-> (assoc cmd :cmd (str "sudo -k -S -u " sudo " bash -c " 155 | (escape (:cmd cmd)))) 156 | ; If we have a password, provide it in the input so sudo sees it. 157 | sudo-password (assoc :in (str sudo-password "\n" (:in cmd)))) 158 | ; Not a sudo context! 159 | cmd)) 160 | 161 | (defn throw-on-nonzero-exit 162 | "Throws when an SSH result has nonzero exit status." 163 | [{:keys [exit action] :as result}] 164 | (if (and exit (zero? exit)) 165 | result 166 | (throw+ 167 | (merge {:type :jepsen.control/nonzero-exit 168 | :cmd (:cmd action)} 169 | result) 170 | nil ; cause 171 | "Command exited with non-zero status %d on node %s:\n%s\n\nSTDIN:\n%s\n\nSTDOUT:\n%s\n\nSTDERR:\n%s" 172 | exit 173 | (:host result) 174 | (:cmd action) 175 | (:in action) 176 | (:out result) 177 | (:err result)))) 178 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/reconnect.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.reconnect 2 | "Stateful wrappers for automatically reconnecting network clients. 3 | 4 | A wrapper is a map with a connection atom `conn` and a pair of functions: 5 | `(open)`, which opens a new connection, and `(close conn)`, which closes a 6 | connection. We use these to provide a with-conn macro that acquires the 7 | current connection from a wrapper, evaluates body, and automatically 8 | closes/reopens the connection when errors occur. 9 | 10 | Connect/close/reconnect lock the wrapper, but multiple threads may acquire 11 | the current connection at once." 
12 | (:require [clojure.tools.logging :refer [info warn]] 13 | [jepsen.util :as util] 14 | [clj-commons.slingshot :refer [try+ throw+]]) 15 | (:import (java.io InterruptedIOException) 16 | (java.util.concurrent.locks ReentrantReadWriteLock))) 17 | 18 | (defn wrapper 19 | "A wrapper is a stateful construct for talking to a database. Options: 20 | 21 | :name An optional name for this wrapper (for debugging logs) 22 | :open A function which generates a new conn 23 | :close A function which closes a conn 24 | :log? Whether to log reconnect messages. A special value, :minimal 25 | logs only a single line rather than a full stacktrace." 26 | [options] 27 | (assert (ifn? (:open options))) 28 | (assert (ifn? (:close options))) 29 | {:open (:open options) 30 | :close (:close options) 31 | :log? (:log? options) 32 | :name (:name options) 33 | :lock (ReentrantReadWriteLock.) 34 | :conn (atom nil)}) 35 | 36 | (defmacro with-lock 37 | [wrapper lock-method & body] 38 | `(let [lock# (~lock-method ^ReentrantReadWriteLock (:lock ~wrapper))] 39 | (.lock lock#) 40 | (try ~@body 41 | (finally 42 | (.unlock lock#))))) 43 | 44 | (defmacro with-read-lock 45 | [wrapper & body] 46 | `(with-lock ~wrapper .readLock ~@body)) 47 | 48 | (defmacro with-write-lock 49 | [wrapper & body] 50 | `(with-lock ~wrapper .writeLock ~@body)) 51 | 52 | (defn conn 53 | "Active connection for a wrapper, if one exists." 54 | [wrapper] 55 | @(:conn wrapper)) 56 | 57 | (defn open! 58 | "Given a wrapper, opens a connection. Noop if conn is already open." 59 | [wrapper] 60 | (with-write-lock wrapper 61 | (when-not (conn wrapper) 62 | (let [new-conn ((:open wrapper))] 63 | (when (nil? new-conn) 64 | (throw (IllegalStateException. 65 | (str "Reconnect wrapper " (:name wrapper) 66 | "'s :open function returned nil " 67 | "instead of a connection!")))) 68 | (reset! (:conn wrapper) new-conn)))) 69 | wrapper) 70 | 71 | (defn close! 72 | "Closes a wrapper." 
[wrapper] 74 | (with-write-lock wrapper 75 | (when-let [c (conn wrapper)] 76 | ((:close wrapper) c) 77 | (reset! (:conn wrapper) nil))) 78 | wrapper) 79 | 80 | (defn reopen! 81 | "Reopens a wrapper's connection." 82 | [wrapper] 83 | (with-write-lock wrapper 84 | (when-let [c (conn wrapper)] 85 | ((:close wrapper) c)) 86 | (let [c' ((:open wrapper))] 87 | (when (nil? c') 88 | (throw (IllegalStateException. 89 | (str "Reconnect wrapper " (:name wrapper) 90 | "'s :open function returned nil " 91 | "instead of a connection!")))) 92 | (reset! (:conn wrapper) c'))) 93 | wrapper) 94 | 95 | (defmacro with-conn 96 | "Acquires a read lock, takes a connection from the wrapper, and evaluates 97 | body with that connection bound to c. If any Exception is thrown, closes the 98 | connection and opens a new one." 99 | [[c wrapper] & body] 100 | ; We want to hold the read lock while executing the body, but we're going to 101 | ; release it in complicated ways, so we can't use the with-read-lock macro 102 | ; here. 103 | `(let [read-lock# (.readLock ^ReentrantReadWriteLock (:lock ~wrapper))] 104 | (.lock read-lock#) 105 | (let [~c (conn ~wrapper)] 106 | (try (when (nil? ~c) 107 | (throw+ {:type ::no-conn 108 | :wrapper ~wrapper})) 109 | ~@body 110 | (catch InterruptedException e# 111 | ; When threads are interrupted, we're generally 112 | ; terminating--there's no reason to reopen or log a message here. 113 | (throw e#)) 114 | (catch InterruptedIOException e# 115 | ; Ditto here; this is a consequence of an interrupt, and we 116 | ; should, I think, treat it as if it were an interrupt itself. 117 | (throw e#)) 118 | (catch Exception e# 119 | ; We can't acquire the write lock until we release our read lock, 120 | ; because ReentrantReadWriteLock can't upgrade a read lock to a write lock: trying would block forever on our own read lock. 121 | (.unlock read-lock#) 122 | (try 123 | (with-write-lock ~wrapper 124 | (when (identical? ~c (conn ~wrapper)) 125 | ; This is the same conn that yielded the error 126 | (cond (= :minimal (:log?
~wrapper)) 127 | (warn (str (.getName (class e#)) 128 | " with conn " 129 | (pr-str (:name ~wrapper)) 130 | "; reopening.")) 131 | (:log? ~wrapper) 132 | (warn e# (str "Encountered error with conn " 133 | (pr-str (:name ~wrapper)) 134 | "; reopening"))) 135 | (reopen! ~wrapper))) 136 | (catch InterruptedException e# 137 | ; Same here 138 | (throw e#)) 139 | (catch Exception e2# 140 | ; We don't want to lose the original exception, but we will 141 | ; log the reconnect error here. If we don't throw the 142 | ; original exception, our caller might not know what kind of 143 | ; error occurred in their transaction logic! 144 | (cond (= :minimal (:log? ~wrapper)) 145 | (warn (str (.getName (class e2#)) 146 | " reopening " (pr-str (:name ~wrapper)))) 147 | 148 | (:log? ~wrapper) 149 | (warn e2# "Error reopening" (pr-str (:name ~wrapper))))) 150 | (finally 151 | (.lock read-lock#))) 152 | ; Right, that's done with, now we can propagate the exception 153 | (throw e#)) 154 | (finally 155 | (.unlock read-lock#)))))) 156 | -------------------------------------------------------------------------------- /jepsen/src/jepsen/os/debian.clj: -------------------------------------------------------------------------------- 1 | (ns jepsen.os.debian 2 | "Common tasks for Debian boxes." 3 | (:require [clojure [set :as set] 4 | [string :as str]] 5 | [clojure.tools.logging :refer [info]] 6 | [jepsen [control :as c :refer [|]] 7 | [net :as net] 8 | [os :as os] 9 | [util :as util :refer [meh]]] 10 | [jepsen.control.util :as cu] 11 | [clj-commons.slingshot :refer [try+ throw+]])) 12 | 13 | (def node-locks 14 | "Prevents running apt operations concurrently on the same node." 15 | (util/named-locks)) 16 | 17 | (defn setup-hostfile! 
"Makes sure the hostfile has a loopback entry for the local hostname" 19 | [] 20 | (let [name (c/exec :hostname) 21 | hosts (c/exec :cat "/etc/hosts") 22 | hosts' (->> hosts 23 | str/split-lines 24 | (map (fn [line] 25 | (if (re-find #"^127\.0\.0\.1\t" line) 26 | (str "127.0.0.1\tlocalhost") ; name) 27 | line))) 28 | (str/join "\n"))] 29 | (when-not (= hosts hosts') 30 | (c/su (c/exec :echo hosts' :> "/etc/hosts"))))) 31 | 32 | (defn time-since-last-update 33 | "Returns how many seconds ago we last ran apt-get update." 34 | [] 35 | (- (Long/parseLong (c/exec :date "+%s")) 36 | (Long/parseLong (c/exec :stat :-c "%Y" "/var/cache/apt/pkgcache.bin" "||" :echo 0)))) 37 | 38 | (defn update! 39 | "Runs apt-get update." 40 | [] 41 | (util/with-named-lock node-locks c/*host* 42 | (c/su (c/exec :apt-get :--allow-releaseinfo-change :update)))) 43 | 44 | (defn maybe-update! 45 | "Runs apt-get update if we haven't done so in the last day." 46 | [] 47 | (when (< 86400 (time-since-last-update)) 48 | (update!))) 49 | 50 | (defn installed 51 | "Given a list of debian packages (strings, symbols, keywords, etc), returns 52 | the set of packages which are installed, as strings." 53 | [pkgs] 54 | (let [pkgs (->> pkgs (map name) set)] 55 | (->> (apply c/exec :dpkg :--get-selections pkgs) 56 | str/split-lines 57 | (map (fn [line] (str/split line #"\s+"))) 58 | (filter #(= "install" (second %))) 59 | (map first) 60 | (map (fn [p] (str/replace p #":amd64|:i386" {":amd64" "" ":i386" ""}))) 61 | set))) 62 | 63 | (defn uninstall! 64 | "Removes a package or packages." 65 | [pkg-or-pkgs] 66 | (util/with-named-lock node-locks c/*host* 67 | (let [pkgs (if (coll? pkg-or-pkgs) pkg-or-pkgs (list pkg-or-pkgs)) 68 | pkgs (installed pkgs)] 69 | (c/su (apply c/exec :apt-get :remove :--purge :-y pkgs))))) 70 | 71 | (defn installed? 72 | "Are the given debian packages, or singular package, installed on the current 73 | system?" 74 | [pkg-or-pkgs] 75 | (let [pkgs (if (coll?
pkg-or-pkgs) pkg-or-pkgs (list pkg-or-pkgs))] 76 | (every? (installed pkgs) (map name pkgs)))) 77 | 78 | (defn installed-version 79 | "Given a package name, determines the installed version of that package, or 80 | nil if it is not installed." 81 | [pkg] 82 | (->> (c/exec :apt-cache :policy (name pkg)) 83 | (re-find #"Installed: ([^\s]+)") 84 | second)) 85 | 86 | (defn install 87 | "Ensure the given packages are installed. Can take a flat collection of 88 | packages, passed as symbols, strings, or keywords, or, alternatively, a map 89 | of packages to version strings. Can optionally take a collection of 90 | additional CLI options to be passed to apt-get." 91 | ([pkgs] 92 | (install pkgs [])) 93 | ([pkgs apt-opts] 94 | (if (map? pkgs) 95 | ; Install specific versions 96 | (dorun 97 | (for [[pkg version] pkgs] 98 | (when (not= version (installed-version pkg)) 99 | (util/with-named-lock node-locks c/*host* 100 | (info "Installing" pkg version) 101 | (c/su 102 | (c/exec :env "DEBIAN_FRONTEND=noninteractive" 103 | :apt-get :install 104 | :-y 105 | :--allow-downgrades 106 | :--allow-change-held-packages 107 | apt-opts 108 | (str (name pkg) "=" version))))))) 109 | 110 | ; Install any version 111 | (let [pkgs (set (map name pkgs)) 112 | missing (set/difference pkgs (installed pkgs))] 113 | (when-not (empty? missing) 114 | (util/with-named-lock node-locks c/*host* 115 | (c/su 116 | (info "Installing" missing) 117 | (apply c/exec :env "DEBIAN_FRONTEND=noninteractive" 118 | :apt-get :install 119 | :-y 120 | :--allow-downgrades 121 | :--allow-change-held-packages 122 | apt-opts 123 | missing)))))))) 124 | 125 | (defn add-key! 126 | "Receives an apt key from the given keyserver." 127 | [keyserver key] 128 | (c/su 129 | (c/exec :apt-key :adv 130 | :--keyserver keyserver 131 | :--recv key))) 132 | 133 | (defn add-repo! 134 | "Adds an apt repo (and optionally a key from the given keyserver)." 135 | ([repo-name apt-line] 136 | (add-repo! 
repo-name apt-line nil nil)) 137 | ([repo-name apt-line keyserver key] 138 | (let [list-file (str "/etc/apt/sources.list.d/" (name repo-name) ".list")] 139 | (when-not (cu/exists? list-file) 140 | (info "setting up" repo-name "apt repo") 141 | (when (or keyserver key) 142 | (add-key! keyserver key)) 143 | (c/exec :echo apt-line :> list-file) 144 | (update!))))) 145 | 146 | (defn install-jdk11! 147 | "Installs an openjdk jdk11 via stretch-backports." 148 | [] 149 | (c/su 150 | (add-repo! 151 | "stretch-backports" 152 | "deb http://deb.debian.org/debian stretch-backports main") 153 | (install [:openjdk-11-jdk]))) 154 | 155 | (deftype Debian [] 156 | os/OS 157 | (setup! [_ test node] 158 | (info node "setting up debian") 159 | 160 | (setup-hostfile!) 161 | (maybe-update!) 162 | 163 | (c/su 164 | ; Packages! 165 | (install [:apt-transport-https 166 | :build-essential 167 | :libzip4 168 | :wget 169 | :curl 170 | :vim 171 | :man-db 172 | :faketime 173 | :netcat-openbsd 174 | :ntpdate 175 | :unzip 176 | :iptables 177 | :psmisc 178 | :tar 179 | :bzip2 180 | :iputils-ping 181 | :iproute2 182 | :rsyslog 183 | :logrotate 184 | :dirmngr 185 | :tcpdump])) 186 | 187 | (meh (net/heal! (:net test) test))) 188 | 189 | (teardown! [_ test node])) 190 | 191 | (def os "An implementation of the Debian OS." (Debian.)) 192 | --------------------------------------------------------------------------------
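The comment in `jepsen.reconnect/with-conn` about releasing the read lock before taking the write lock follows from how `java.util.concurrent.locks.ReentrantReadWriteLock` behaves: a thread that holds the read lock cannot upgrade it to the write lock. The sketch below is a hypothetical standalone namespace (`example.rw-lock-upgrade` and both function names are illustrative, not part of Jepsen). It uses `tryLock` so the failed upgrade returns `false` rather than blocking forever, as a plain `.lock` would.

```clojure
(ns example.rw-lock-upgrade
  "Hypothetical demo, not part of Jepsen: why with-conn must release its read
  lock before acquiring the write lock."
  (:import (java.util.concurrent.locks ReentrantReadWriteLock)))

(defn try-upgrade
  "Holds the read lock, then tries the write lock without blocking. Returns
  true if the write lock was acquired."
  []
  (let [lock (ReentrantReadWriteLock.)
        r    (.readLock lock)
        w    (.writeLock lock)]
    (.lock r)
    (try
      ; The write lock requires that no thread, including this one, holds the
      ; read lock, so tryLock fails here. A plain .lock would park forever.
      (let [acquired? (.tryLock w)]
        (when acquired? (.unlock w))
        acquired?)
      (finally
        (.unlock r)))))

(defn release-then-acquire
  "The ordering with-conn uses: release the read lock first, then take the
  write lock. Returns true if the write lock was acquired."
  []
  (let [lock (ReentrantReadWriteLock.)
        r    (.readLock lock)
        w    (.writeLock lock)]
    (.lock r)
    (.unlock r)
    (let [acquired? (.tryLock w)]
      (when acquired? (.unlock w))
      acquired?)))
```

Here `(try-upgrade)` should return `false` and `(release-then-acquire)` should return `true`. Because another thread can grab the write lock in the gap between the two steps, `with-conn` re-checks `(identical? ~c (conn ~wrapper))` before reopening, so it only replaces the connection that actually failed.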