├── .gitignore
├── LICENSE
├── Makefile
├── README.md
├── bank
│   ├── .gitignore
│   ├── README.md
│   └── main.go
├── bank2
│   ├── .gitignore
│   ├── README.md
│   └── main.go
├── block_writer
│   ├── .gitignore
│   ├── Dockerfile
│   ├── README.md
│   └── main.go
├── fakerealtime
│   ├── .gitignore
│   ├── README.md
│   └── main.go
├── filesystem
│   ├── .gitignore
│   ├── README.md
│   ├── block.go
│   ├── block_test.go
│   ├── fs.go
│   ├── main.go
│   ├── node.go
│   └── sql.go
├── hotspot
│   ├── .gitignore
│   ├── README.md
│   └── main.go
├── ledger
│   ├── .gitignore
│   ├── README.md
│   └── main.go
├── photos
│   ├── db.go
│   ├── main.go
│   ├── user.go
│   └── user_test.go
├── teamcity-push.sh
└── teamcity-test.sh

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | 
3 | # Compiled Object files, Static and Dynamic libs (Shared Objects)
4 | *.o
5 | *.a
6 | *.so
7 | 
8 | # Folders
9 | _obj
10 | _test
11 | 
12 | # Architecture specific extensions/prefixes
13 | *.[568vq]
14 | [568vq].out
15 | 
16 | *.cgo1.go
17 | *.cgo2.c
18 | _cgo_defun.c
19 | _cgo_gotypes.go
20 | _cgo_export.*
21 | 
22 | _testmain.go
23 | 
24 | photos/photos
25 | 
26 | *.exe
27 | *.test
28 | *.prof
--------------------------------------------------------------------------------
/LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity.
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Copyright 2014 The Cockroach Authors. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | # implied. See the License for the specific language governing 13 | # permissions and limitations under the License. See the AUTHORS file 14 | # for names of contributors. 15 | # 16 | # Author: Andrew Bonventre (andybons@gmail.com) 17 | # Author: Shawn Morel (shawnmorel@gmail.com) 18 | # Author: Spencer Kimball (spencer.kimball@gmail.com) 19 | # 20 | 21 | # Cockroach build rules. 22 | GO ?= go 23 | # Allow setting of go build flags from the command line. 
24 | GOFLAGS := 25 | # Set to 1 to use static linking for all builds (including tests). 26 | STATIC := 27 | 28 | ifeq ($(STATIC),1) 29 | LDFLAGS += -s -w -extldflags "-static" 30 | endif 31 | 32 | .PHONY: all 33 | all: build test check 34 | 35 | .PHONY: test 36 | test: 37 | $(GO) test -v -i ./... 38 | $(GO) test -v ./... 39 | 40 | .PHONY: deps 41 | deps: 42 | $(GO) get -d bazil.org/fuse 43 | $(GO) get -d -t ./... 44 | $(GO) get github.com/golang/lint/golint 45 | 46 | .PHONY: build 47 | build: deps block_writer fakerealtime filesystem bank photos 48 | 49 | .PHONY: block_writer 50 | block_writer: 51 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o block_writer/block_writer ./block_writer 52 | 53 | .PHONY: fakerealtime 54 | fakerealtime: 55 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o fakerealtime/fakerealtime ./fakerealtime 56 | 57 | .PHONY: filesystem 58 | filesystem: 59 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o filesystem/filesystem ./filesystem 60 | 61 | .PHONY: hotspot 62 | hotspot: 63 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o hotspot/hotspot ./hotspot 64 | 65 | .PHONY: bank 66 | bank: 67 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o bank/bank ./bank 68 | 69 | .PHONY: bank2 70 | bank2: 71 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o bank2/bank2 ./bank2 72 | 73 | .PHONY: ledger 74 | ledger: 75 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o ledger/ledger ./ledger 76 | 77 | .PHONY: photos 78 | photos: 79 | $(GO) build -tags '$(TAGS)' $(GOFLAGS) -ldflags '$(LDFLAGS)' -v -i -o photos/photos ./photos 80 | 81 | .PHONY: check 82 | check: 83 | @echo "checking for tabs in shell scripts" 84 | @! git grep -F ' ' -- '*.sh' 85 | @echo "checking for \"path\" imports" 86 | @! git grep -F '"path"' -- '*.go' 87 | @echo "errcheck" 88 | @errcheck -ignore=Fprintf ./... 89 | @echo "vet" 90 | @! 
go tool vet . 2>&1 | \ 91 | grep -vE '^vet: cannot process directory .git' 92 | @echo "vet --shadow" 93 | @! go tool vet --shadow . 2>&1 | \ 94 | grep -vE '(declaration of "err" shadows|^vet: cannot process directory \.git)' 95 | @echo "golint" 96 | @! golint ./... | grep -vE '(\.pb\.go)' 97 | @echo "gofmt (simplify)" 98 | @! gofmt -s -d -l . 2>&1 | grep -vE '^\.git/' 99 | @echo "goimports" 100 | @! goimports -l . | grep -vF 'No Exceptions' 101 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Cockroach Go examples 2 | 3 | This repo contains example uses of CockroachDB using Go clients. 4 | These are informative, and not meant to be complete or bug-free solutions. 5 | -------------------------------------------------------------------------------- /bank/.gitignore: -------------------------------------------------------------------------------- 1 | bank 2 | -------------------------------------------------------------------------------- /bank/README.md: -------------------------------------------------------------------------------- 1 | # Bank example 2 | 3 | ## Summary 4 | 5 | The bank example program continuously performs balance transfers between 6 | accounts using concurrent transactions. 7 | 8 | ## Running 9 | 10 | Run against an existing cockroach node or cluster. 11 | 12 | #### Insecure node or cluster 13 | ``` 14 | # Launch your node or cluster in insecure mode (with --insecure passed to cockroach). 15 | # Find a reachable address: [mycockroach:26257]. 16 | # Run the example with: 17 | ./bank postgres://root@mycockroach:26257?sslmode=disable 18 | ``` 19 | 20 | #### Secure node or cluster 21 | ``` 22 | # Launch your node or cluster in secure mode with certificates in [mycertsdir] 23 | # Find a reachable address: [mycockroach:26257].
24 | # Run the example with: 25 | ./bank "postgres://root@mycockroach:26257?sslcert=mycertsdir/root.client.crt&sslkey=mycertsdir/root.client.key" 26 | ``` 27 | -------------------------------------------------------------------------------- /bank/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Tamir Duberstein 17 | 18 | package main 19 | 20 | import ( 21 | "database/sql" 22 | "flag" 23 | "fmt" 24 | "log" 25 | "math/rand" 26 | "net/url" 27 | "os" 28 | "time" 29 | 30 | // Import postgres driver.
31 | _ "github.com/lib/pq" 32 | ) 33 | 34 | var maxTransfer = flag.Int("max-transfer", 999, "Maximum amount to transfer in one transaction.") 35 | var numAccounts = flag.Int("num-accounts", 999, "Number of accounts.") 36 | var concurrency = flag.Int("concurrency", 5, "Number of concurrent actors moving money.") 37 | var transferStyle = flag.String("transfer-style", "txn", "\"single-stmt\" or \"txn\"") 38 | var balanceCheckInterval = flag.Duration("balance-check-interval", 1*time.Second, "Interval of balance check.") 39 | 40 | type measurement struct { 41 | read, write, total time.Duration 42 | } 43 | 44 | func moveMoney(db *sql.DB, readings chan measurement) { 45 | for { 46 | from, to := rand.Intn(*numAccounts), rand.Intn(*numAccounts) 47 | if from == to { 48 | continue 49 | } 50 | amount := rand.Intn(*maxTransfer) 51 | switch *transferStyle { 52 | case "single-stmt": 53 | update := ` 54 | UPDATE accounts 55 | SET balance = CASE id WHEN $1 THEN balance-$3 WHEN $2 THEN balance+$3 END 56 | WHERE id IN ($1, $2) AND (SELECT balance >= $3 FROM accounts WHERE id = $1) 57 | ` 58 | start := time.Now() 59 | result, err := db.Exec(update, from, to, amount) 60 | if err != nil { 61 | log.Print(err) 62 | continue 63 | } 64 | affected, err := result.RowsAffected() 65 | if err != nil { 66 | log.Fatal(err) 67 | } 68 | if affected > 0 { 69 | d := time.Since(start) 70 | readings <- measurement{read: d, write: d, total: d} 71 | } 72 | 73 | case "txn": 74 | start := time.Now() 75 | tx, err := db.Begin() 76 | if err != nil { 77 | log.Fatal(err) 78 | } 79 | startRead := time.Now() 80 | rows, err := tx.Query(`SELECT id, balance FROM accounts WHERE id IN ($1, $2)`, from, to) 81 | if err != nil { 82 | log.Print(err) 83 | if err = tx.Rollback(); err != nil { 84 | log.Fatal(err) 85 | } 86 | continue 87 | } 88 | readDuration := time.Since(startRead) 89 | var fromBalance, toBalance int 90 | for rows.Next() { 91 | var id, balance int 92 | if err = rows.Scan(&id, &balance); err != nil { 93 | 
log.Fatal(err) 94 | } 95 | switch id { 96 | case from: 97 | fromBalance = balance 98 | case to: 99 | toBalance = balance 100 | default: 101 | panic(fmt.Sprintf("got unexpected account %d", id)) 102 | } 103 | } 104 | startWrite := time.Now() 105 | if fromBalance >= amount { 106 | update := `UPDATE accounts 107 | SET balance = CASE id WHEN $1 THEN $3::int WHEN $2 THEN $4::int END 108 | WHERE id IN ($1, $2)` 109 | if _, err = tx.Exec(update, to, from, toBalance+amount, fromBalance-amount); err != nil { 110 | log.Print(err) 111 | if err = tx.Rollback(); err != nil { 112 | log.Fatal(err) 113 | } 114 | continue 115 | } 116 | } 117 | writeDuration := time.Since(startWrite) 118 | if err = tx.Commit(); err != nil { 119 | log.Print(err) 120 | continue 121 | } 122 | if fromBalance >= amount { 123 | readings <- measurement{read: readDuration, write: writeDuration, total: time.Since(start)} 124 | } 125 | } 126 | } 127 | } 128 | 129 | func verifyBank(db *sql.DB) { 130 | var sum int 131 | if err := db.QueryRow("SELECT SUM(balance) FROM accounts").Scan(&sum); err != nil { 132 | log.Fatal(err) 133 | } 134 | if sum == *numAccounts*1000 { 135 | log.Print("The bank is in good order.") 136 | } else { 137 | log.Printf("The bank is not in good order. 
Total value: %d", sum) 138 | os.Exit(1) 139 | } 140 | } 141 | 142 | var usage = func() { 143 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0]) 144 | fmt.Fprintf(os.Stderr, " %s \n\n", os.Args[0]) 145 | flag.PrintDefaults() 146 | } 147 | 148 | func main() { 149 | flag.Usage = usage 150 | flag.Parse() 151 | 152 | if flag.NArg() != 1 { 153 | usage() 154 | os.Exit(2) 155 | } 156 | 157 | dbURL := flag.Arg(0) 158 | 159 | parsedURL, err := url.Parse(dbURL) 160 | if err != nil { 161 | log.Fatal(err) 162 | } 163 | parsedURL.Path = "bank" 164 | 165 | db, err := sql.Open("postgres", parsedURL.String()) 166 | if err != nil { 167 | log.Fatal(err) 168 | } 169 | defer func() { _ = db.Close() }() 170 | 171 | if _, err := db.Exec("CREATE DATABASE IF NOT EXISTS bank"); err != nil { 172 | log.Fatal(err) 173 | } 174 | 175 | // concurrency + 1, for this thread and the "concurrency" number of 176 | // goroutines that move money 177 | db.SetMaxOpenConns(*concurrency + 1) 178 | 179 | if _, err = db.Exec("CREATE TABLE IF NOT EXISTS accounts (id BIGINT PRIMARY KEY, balance BIGINT NOT NULL)"); err != nil { 180 | log.Fatal(err) 181 | } 182 | 183 | if _, err = db.Exec("TRUNCATE TABLE accounts"); err != nil { 184 | log.Fatal(err) 185 | } 186 | 187 | for i := 0; i < *numAccounts; i++ { 188 | if _, err = db.Exec("INSERT INTO accounts (id, balance) VALUES ($1, $2)", i, 1000); err != nil { 189 | log.Fatal(err) 190 | } 191 | } 192 | 193 | verifyBank(db) 194 | 195 | lastNow := time.Now() 196 | readings := make(chan measurement, 10000) 197 | 198 | for i := 0; i < *concurrency; i++ { 199 | go moveMoney(db, readings) 200 | } 201 | 202 | for range time.NewTicker(*balanceCheckInterval).C { 203 | now := time.Now() 204 | elapsed := time.Since(lastNow) 205 | lastNow = now 206 | transfers := len(readings) 207 | log.Printf("%d transfers were executed at %.1f/second.", transfers, float64(transfers)/elapsed.Seconds()) 208 | if transfers > 0 { 209 | var aggr measurement 210 | for i := 0; i < transfers; i++ { 
211 | reading := <-readings 212 | aggr.read += reading.read 213 | aggr.write += reading.write 214 | aggr.total += reading.total 215 | } 216 | d := time.Duration(transfers) 217 | log.Printf("read time: %v, write time: %v, txn time: %v", aggr.read/d, aggr.write/d, aggr.total/d) 218 | } 219 | verifyBank(db) 220 | } 221 | } 222 | -------------------------------------------------------------------------------- /bank2/.gitignore: -------------------------------------------------------------------------------- 1 | bank2 2 | -------------------------------------------------------------------------------- /bank2/README.md: -------------------------------------------------------------------------------- 1 | # Bank example (part deux) 2 | 3 | ## Summary 4 | 5 | This bank example program transfers money between accounts, creating 6 | new ledger transactions in the form of a transaction record and two 7 | transaction "legs" per database transaction. Each transfer additionally 8 | queries and updates account balances. 9 | 10 | There are two mechanisms for running: high contention and low contention. 11 | Specify -contention={high|low} on the command line to specify which. The 12 | default is low contention. 13 | 14 | ## Running 15 | 16 | Run against an existing cockroach node or cluster. 17 | 18 | #### Insecure node or cluster 19 | ``` 20 | # Launch your node or cluster in insecure mode (with --insecure passed to cockroach). 21 | # Find a reachable address: [mycockroach:26257]. 22 | # Run the example with: 23 | ./bank2 postgres://root@mycockroach:26257?sslmode=disable 24 | ``` 25 | 26 | #### Secure node or cluster 27 | ``` 28 | # Launch your node or cluster in secure mode with certificates in [mycertsdir] 29 | # Find a reachable address:[mycockroach:26257]. 
30 | # Run the example with: 31 | ./bank2 "postgres://root@mycockroach:26257?sslcert=mycertsdir/root.client.crt&sslkey=mycertsdir/root.client.key" 32 | ``` 33 | -------------------------------------------------------------------------------- /bank2/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 
15 | // 16 | // Author: Spencer Kimball (spencer@cockroachlabs.com) 17 | 18 | package main 19 | 20 | import ( 21 | "context" 22 | "database/sql" 23 | "flag" 24 | "fmt" 25 | "log" 26 | "math/rand" 27 | "net/url" 28 | "os" 29 | "sync/atomic" 30 | "time" 31 | 32 | "github.com/cockroachdb/cockroach-go/crdb" 33 | ) 34 | 35 | const systemAccountID = 0 36 | const initialBalance = 1000 37 | 38 | var maxTransfer = flag.Int("max-transfer", 100, "Maximum amount to transfer in one transaction.") 39 | var numTransfers = flag.Int("num-transfers", 0, "Number of transfers (0 to continue indefinitely).") 40 | var numAccounts = flag.Int("num-accounts", 100, "Number of accounts.") 41 | var concurrency = flag.Int("concurrency", 16, "Number of concurrent actors moving money.") 42 | var contention = flag.String("contention", "low", "Contention model {low | high}.") 43 | var updatemethod = flag.String("updatemethod", "update", "update or upsert DML {update | upsert}.") 44 | var balanceCheckInterval = flag.Duration("balance-check-interval", time.Second, "Interval of balance check.") 45 | var parallelStmts = flag.Bool("parallel-stmts", false, "Run independent statements in parallel.") 46 | 47 | var txnCount int32 48 | var successCount int32 49 | var initialSystemBalance int 50 | 51 | type measurement struct { 52 | read, write, total int64 53 | retries int32 54 | } 55 | 56 | func transfersComplete() bool { 57 | return *numTransfers > 0 && atomic.LoadInt32(&successCount) >= int32(*numTransfers) 58 | } 59 | 60 | func moveMoney(db *sql.DB, aggr *measurement) { 61 | useSystemAccount := *contention == "high" 62 | 63 | for !transfersComplete() { 64 | var startWrite time.Time 65 | var readDuration time.Duration 66 | var fromBalance, toBalance int 67 | from, to := rand.Intn(*numAccounts)+1, rand.Intn(*numAccounts)+1 68 | if from == to { 69 | continue 70 | } 71 | if useSystemAccount { 72 | // Use the first account number we generated as a coin flip to 73 | // determine whether we're transferring 
money into or out of 74 | // the system account. 75 | if from > *numAccounts/2 { 76 | from = systemAccountID 77 | } else { 78 | to = systemAccountID 79 | } 80 | } 81 | amount := rand.Intn(*maxTransfer) 82 | start := time.Now() 83 | attempts := 0 84 | 85 | if err := crdb.ExecuteTx(context.TODO(), db, nil, func(tx *sql.Tx) error { 86 | attempts++ 87 | if attempts > 1 { 88 | atomic.AddInt32(&aggr.retries, 1) 89 | } 90 | startRead := time.Now() 91 | rows, err := tx.Query(`SELECT id, balance FROM account WHERE id IN ($1, $2)`, from, to) 92 | if err != nil { 93 | return err 94 | } 95 | readDuration = time.Since(startRead) 96 | for rows.Next() { 97 | var id, balance int 98 | if err = rows.Scan(&id, &balance); err != nil { 99 | log.Fatal(err) 100 | } 101 | switch id { 102 | case from: 103 | fromBalance = balance 104 | case to: 105 | toBalance = balance 106 | default: 107 | panic(fmt.Sprintf("got unexpected account %d", id)) 108 | } 109 | } 110 | startWrite = time.Now() 111 | if fromBalance < amount { 112 | return nil 113 | } 114 | insertTxn := `INSERT INTO transaction (id, txn_ref) VALUES ($1, $2)` 115 | insertTxnLeg := `INSERT INTO transaction_leg (account_id, amount, running_balance, txn_id) VALUES ($1, $2, $3, $4)` 116 | var updateAcct string 117 | if *updatemethod == "update" { 118 | updateAcct = `UPDATE account SET balance = $1 WHERE id = $2` 119 | } else { 120 | updateAcct = `INSERT into account (balance,id) VALUES ($1,$2) ON CONFLICT (id) DO UPDATE SET balance = excluded.balance` 121 | } 122 | if *parallelStmts { 123 | const parallelize = ` RETURNING NOTHING` 124 | insertTxn += parallelize 125 | insertTxnLeg += parallelize 126 | updateAcct += parallelize 127 | } 128 | txnID := atomic.AddInt32(&txnCount, 1) 129 | if _, err = tx.Exec(insertTxn, txnID, fmt.Sprintf("txn %d", txnID)); err != nil { 130 | return err 131 | } 132 | if _, err = tx.Exec(insertTxnLeg, from, -amount, fromBalance-amount, txnID); err != nil { 133 | return err 134 | } 135 | if _, err = 
tx.Exec(insertTxnLeg, to, amount, toBalance+amount, txnID); err != nil { 136 | return err 137 | } 138 | if _, err = tx.Exec(updateAcct, toBalance+amount, to); err != nil { 139 | return err 140 | } 141 | if _, err = tx.Exec(updateAcct, fromBalance-amount, from); err != nil { 142 | return err 143 | } 144 | return nil 145 | }); err != nil { 146 | log.Printf("failed transaction: %v", err) 147 | continue 148 | } 149 | if fromBalance >= amount { 150 | atomic.AddInt32(&successCount, 1) 151 | atomic.AddInt64(&aggr.read, readDuration.Nanoseconds()) 152 | atomic.AddInt64(&aggr.write, time.Since(startWrite).Nanoseconds()) 153 | atomic.AddInt64(&aggr.total, time.Since(start).Nanoseconds()) 154 | } 155 | } 156 | } 157 | 158 | func verifyTotalBalance(db *sql.DB) { 159 | var sum int 160 | if err := db.QueryRow("SELECT SUM(balance) FROM account").Scan(&sum); err != nil { 161 | log.Fatal(err) 162 | } 163 | if sum != *numAccounts*initialBalance+initialSystemBalance { 164 | log.Printf("The total balance is incorrect: %d.", sum) 165 | os.Exit(1) 166 | } 167 | } 168 | 169 | var usage = func() { 170 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0]) 171 | fmt.Fprintf(os.Stderr, " %s \n\n", os.Args[0]) 172 | flag.PrintDefaults() 173 | } 174 | 175 | func main() { 176 | flag.Usage = usage 177 | flag.Parse() 178 | 179 | dbURL := "postgresql://root@localhost:26257/bank2?sslmode=disable" 180 | if flag.NArg() == 1 { 181 | dbURL = flag.Arg(0) 182 | } 183 | 184 | parsedURL, err := url.Parse(dbURL) 185 | if err != nil { 186 | log.Fatal(err) 187 | } 188 | parsedURL.Path = "bank2" 189 | 190 | db, err := sql.Open("postgres", parsedURL.String()) 191 | if err != nil { 192 | log.Fatal(err) 193 | } 194 | defer func() { _ = db.Close() }() 195 | 196 | if _, err := db.Exec("CREATE DATABASE IF NOT EXISTS bank2"); err != nil { 197 | log.Fatal(err) 198 | } 199 | 200 | // concurrency + 1, for this thread and the "concurrency" number of 201 | // goroutines that move money 202 | 
db.SetMaxOpenConns(*concurrency + 1) 203 | db.SetMaxIdleConns(*concurrency + 1) 204 | 205 | if _, err = db.Exec(` 206 | CREATE TABLE IF NOT EXISTS account ( 207 | id INT, 208 | balance INT NOT NULL, 209 | name STRING, 210 | 211 | PRIMARY KEY (id), 212 | UNIQUE INDEX byName (name) 213 | ); 214 | 215 | CREATE TABLE IF NOT EXISTS transaction ( 216 | id INT, 217 | booking_date TIMESTAMP DEFAULT NOW(), 218 | txn_date TIMESTAMP DEFAULT NOW(), 219 | txn_ref STRING, 220 | 221 | PRIMARY KEY (id), 222 | UNIQUE INDEX byTxnRef (txn_ref) 223 | ); 224 | 225 | CREATE TABLE IF NOT EXISTS transaction_leg ( 226 | id BYTES DEFAULT uuid_v4(), 227 | account_id INT, 228 | amount INT NOT NULL, 229 | running_balance INT NOT NULL, 230 | txn_id INT, 231 | 232 | PRIMARY KEY (id) 233 | ); 234 | 235 | TRUNCATE TABLE account; 236 | TRUNCATE TABLE transaction; 237 | TRUNCATE TABLE transaction_leg; 238 | `); err != nil { 239 | log.Fatal(err) 240 | } 241 | 242 | insertSQL := "INSERT INTO account (id, balance, name) VALUES ($1, $2, $3)" 243 | 244 | // Insert initialSystemBalance into the system account. 245 | initialSystemBalance = *numAccounts * initialBalance 246 | if _, err = db.Exec(insertSQL, systemAccountID, initialSystemBalance, "system account"); err != nil { 247 | log.Fatal(err) 248 | } 249 | // Insert initialBalance into all user accounts. 
250 | for i := 1; i <= *numAccounts; i++ { 251 | if _, err = db.Exec(insertSQL, i, initialBalance, fmt.Sprintf("account %d", i)); err != nil { 252 | log.Fatal(err) 253 | } 254 | } 255 | 256 | verifyTotalBalance(db) 257 | 258 | var aggr measurement 259 | var lastSuccesses int32 260 | for i := 0; i < *concurrency; i++ { 261 | go moveMoney(db, &aggr) 262 | } 263 | 264 | start := time.Now() 265 | lastTime := start 266 | for range time.NewTicker(*balanceCheckInterval).C { 267 | now := time.Now() 268 | elapsed := now.Sub(lastTime) 269 | lastTime = now 270 | successes := atomic.LoadInt32(&successCount) 271 | newSuccesses := (successes - lastSuccesses) 272 | log.Printf("%d transfers were executed at %.1f/s", newSuccesses, float64(newSuccesses)/elapsed.Seconds()) 273 | lastSuccesses = successes 274 | 275 | d := time.Duration(successes) 276 | if d == 0 { 277 | // Avoid dividing by zero below before the first transfer succeeds. 278 | d = 1 279 | } 280 | read := time.Duration(atomic.LoadInt64(&aggr.read)) 281 | write := time.Duration(atomic.LoadInt64(&aggr.write)) 282 | total := time.Duration(atomic.LoadInt64(&aggr.total)) 283 | retries := time.Duration(atomic.LoadInt32(&aggr.retries)) 284 | log.Printf("averages: read: %v, write: %v, txn: %v, retries: %d", 285 | read/d, write/d, total/d, retries/d) 286 | verifyTotalBalance(db) 287 | if transfersComplete() { 288 | break 289 | } 290 | } 291 | log.Printf("completed %d transfers in %s with %d retries", atomic.LoadInt32(&successCount), 292 | time.Since(start), atomic.LoadInt32(&aggr.retries)) 293 | } 294 | -------------------------------------------------------------------------------- /block_writer/.gitignore: -------------------------------------------------------------------------------- 1 | block_writer 2 | -------------------------------------------------------------------------------- /block_writer/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM golang:1.11.1 2 | 3 | COPY *.go src/github.com/cockroachdb/examples-go/block_writer/ 4 | 5 | RUN \ 6 | go get
github.com/cockroachdb/examples-go/block_writer && \ 7 | go install github.com/cockroachdb/examples-go/block_writer && \ 8 | rm -rf $GOPATH/src 9 | 10 | ENTRYPOINT ["/go/bin/block_writer"] 11 | -------------------------------------------------------------------------------- /block_writer/README.md: -------------------------------------------------------------------------------- 1 | # Block writer example 2 | 3 | ## Summary 4 | 5 | The block writer example program is a write-only workload intended to insert 6 | a large amount of data into cockroach quickly. This example is intended to 7 | trigger range splits and rebalances. 8 | 9 | ## Running 10 | 11 | Run against an existing cockroach node or cluster. 12 | 13 | #### Insecure node or cluster 14 | ``` 15 | # Launch your node or cluster in insecure mode (with --insecure passed to cockroach). 16 | # Find a reachable address: [mycockroach:26257]. 17 | # Run the example with: 18 | ./block_writer postgres://root@mycockroach:26257?sslmode=disable 19 | ``` 20 | 21 | #### Secure node or cluster 22 | ``` 23 | # Launch your node or cluster in secure mode with certificates in [mycertsdir] 24 | # Find a reachable address:[mycockroach:26257]. 25 | # Run the example with: 26 | ./block_writer "postgres://root@mycockroach:26257?sslcert=mycertsdir/root.client.crt&sslkey=mycertsdir/root.client.key" 27 | ``` 28 | -------------------------------------------------------------------------------- /block_writer/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 
5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Matt Tracy 17 | 18 | // The block writer example program is a write-only workload intended to insert 19 | // a large amount of data into cockroach quickly. This example is intended to 20 | // trigger range splits and rebalances. 21 | package main 22 | 23 | import ( 24 | "bytes" 25 | "database/sql" 26 | "flag" 27 | "fmt" 28 | "log" 29 | "math/rand" 30 | "net/url" 31 | "os" 32 | "os/signal" 33 | "runtime" 34 | "sync" 35 | "sync/atomic" 36 | "syscall" 37 | "time" 38 | 39 | "github.com/codahale/hdrhistogram" 40 | "github.com/satori/go.uuid" 41 | 42 | // Import postgres driver. 43 | _ "github.com/lib/pq" 44 | ) 45 | 46 | const ( 47 | insertBlockStmt = `INSERT INTO blocks (block_id, writer_id, block_num, raw_bytes) VALUES` 48 | ) 49 | 50 | var createDB = flag.Bool("create-db", true, "Attempt to create the database (root user only)") 51 | 52 | // concurrency = number of concurrent insertion processes. 53 | var concurrency = flag.Int("concurrency", 2*runtime.NumCPU(), "Number of concurrent writers inserting blocks") 54 | 55 | // batch = number of blocks to insert in a single SQL statement. 
56 | var batch = flag.Int("batch", 1, "Number of blocks to insert in a single SQL statement") 57 | 58 | var splits = flag.Int("splits", 0, "Number of splits to perform before starting normal operations") 59 | 60 | var tolerateErrors = flag.Bool("tolerate-errors", false, "Keep running on error") 61 | 62 | // outputInterval = interval at which information is output to console. 63 | var outputInterval = flag.Duration("output-interval", 1*time.Second, "Interval of output") 64 | 65 | // Minimum and maximum size of inserted blocks. 66 | var minBlockSizeBytes = flag.Int("min-block-bytes", 256, "Minimum amount of raw data written with each insertion") 67 | var maxBlockSizeBytes = flag.Int("max-block-bytes", 1024, "Maximum amount of raw data written with each insertion") 68 | 69 | var maxBlocks = flag.Uint64("max-blocks", 0, "Maximum number of blocks to write") 70 | var duration = flag.Duration("duration", 0, "The duration to run. If 0, run forever.") 71 | var benchmarkName = flag.String("benchmark-name", "BenchmarkBlockWriter", "Test name to report "+ 72 | "for Go benchmark results.") 73 | 74 | // numBlocks keeps a global count of successfully written blocks. 
75 | var numBlocks uint64 76 | 77 | const ( 78 | minLatency = 100 * time.Microsecond 79 | maxLatency = 10 * time.Second 80 | ) 81 | 82 | func clampLatency(d, min, max time.Duration) time.Duration { 83 | if d < min { 84 | return min 85 | } 86 | if d > max { 87 | return max 88 | } 89 | return d 90 | } 91 | 92 | type blockWriter struct { 93 | db *sql.DB 94 | rand *rand.Rand 95 | latency struct { 96 | sync.Mutex 97 | *hdrhistogram.WindowedHistogram 98 | } 99 | } 100 | 101 | func newBlockWriter(db *sql.DB) *blockWriter { 102 | bw := &blockWriter{ 103 | db: db, 104 | rand: rand.New(rand.NewSource(int64(time.Now().UnixNano()))), 105 | } 106 | bw.latency.WindowedHistogram = hdrhistogram.NewWindowed(1, 107 | minLatency.Nanoseconds(), maxLatency.Nanoseconds(), 1) 108 | return bw 109 | } 110 | 111 | // run is an infinite loop in which the blockWriter continuously attempts to 112 | // write blocks of random data into a table in cockroach DB. 113 | func (bw *blockWriter) run(errCh chan<- error, wg *sync.WaitGroup) { 114 | defer wg.Done() 115 | 116 | id := uuid.Must(uuid.NewV4()).String() 117 | var blockCount uint64 118 | 119 | for { 120 | var buf bytes.Buffer 121 | var args []interface{} 122 | fmt.Fprintf(&buf, "%s", insertBlockStmt) 123 | 124 | for i := 0; i < *batch; i++ { 125 | blockID := bw.rand.Int63() 126 | blockCount++ 127 | args = append(args, bw.randomBlock()) 128 | if i > 0 { 129 | fmt.Fprintf(&buf, ",") 130 | } 131 | fmt.Fprintf(&buf, ` (%d, '%s', %d, $%d)`, blockID, id, blockCount, i+1) 132 | } 133 | 134 | start := time.Now() 135 | if _, err := bw.db.Exec(buf.String(), args...); err != nil { 136 | errCh <- err 137 | } else { 138 | elapsed := clampLatency(time.Since(start), minLatency, maxLatency) 139 | bw.latency.Lock() 140 | if err := bw.latency.Current.RecordValue(elapsed.Nanoseconds()); err != nil { 141 | log.Fatal(err) 142 | } 143 | bw.latency.Unlock() 144 | v := atomic.AddUint64(&numBlocks, uint64(*batch)) 145 | if *maxBlocks > 0 && v >= *maxBlocks { 146 | 
return 147 | } 148 | } 149 | } 150 | } 151 | 152 | func (bw *blockWriter) randomBlock() []byte { 153 | blockSize := bw.rand.Intn(*maxBlockSizeBytes-*minBlockSizeBytes+1) + *minBlockSizeBytes // +1 makes max-block-bytes inclusive and avoids rand.Intn(0) panicking when min == max 154 | blockData := make([]byte, blockSize) 155 | for i := range blockData { 156 | blockData[i] = byte(bw.rand.Int() & 0xff) 157 | } 158 | return blockData 159 | } 160 | 161 | // setupDatabase performs initial setup for the example, creating a database 162 | // with a single table. If the desired table already exists on the cluster, 163 | // it is reused as-is. 164 | func setupDatabase(dbURL string) (*sql.DB, error) { 165 | parsedURL, err := url.Parse(dbURL) 166 | if err != nil { 167 | return nil, err 168 | } 169 | parsedURL.Path = "datablocks" 170 | 171 | // Open connection to server and create a database. 172 | db, err := sql.Open("postgres", parsedURL.String()) 173 | if err != nil { 174 | return nil, err 175 | } 176 | 177 | if *createDB { 178 | if _, err := db.Exec("CREATE DATABASE IF NOT EXISTS datablocks"); err != nil { 179 | return nil, err 180 | } 181 | } 182 | 183 | // Allow a maximum of concurrency+1 connections to the database. 184 | db.SetMaxOpenConns(*concurrency + 1) 185 | db.SetMaxIdleConns(*concurrency + 1) 186 | 187 | // Create the initial table for storing blocks.
188 | if _, err := db.Exec(` 189 | CREATE TABLE IF NOT EXISTS blocks ( 190 | block_id BIGINT NOT NULL, 191 | writer_id STRING NOT NULL, 192 | block_num BIGINT NOT NULL, 193 | raw_bytes BYTES NOT NULL, 194 | PRIMARY KEY (block_id, writer_id, block_num) 195 | )`); err != nil { 196 | return nil, err 197 | } 198 | 199 | if *splits > 0 { 200 | r := rand.New(rand.NewSource(int64(time.Now().UnixNano()))) 201 | for i := 0; i < *splits; i++ { 202 | if _, err := db.Exec(`ALTER TABLE blocks SPLIT AT VALUES ($1, '', 0)`, r.Int63()); err != nil { 203 | return nil, err 204 | } 205 | } 206 | } 207 | 208 | return db, nil 209 | } 210 | 211 | var usage = func() { 212 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0]) 213 | fmt.Fprintf(os.Stderr, " %s \n\n", os.Args[0]) 214 | flag.PrintDefaults() 215 | } 216 | 217 | func main() { 218 | flag.Usage = usage 219 | flag.Parse() 220 | 221 | dbURL := "postgresql://root@localhost:26257/photos?sslmode=disable" 222 | if flag.NArg() == 1 { 223 | dbURL = flag.Arg(0) 224 | } 225 | 226 | if *concurrency < 1 { 227 | log.Fatalf("Value of 'concurrency' flag (%d) must be greater than or equal to 1", *concurrency) 228 | } 229 | 230 | if max, min := *maxBlockSizeBytes, *minBlockSizeBytes; max < min { 231 | log.Fatalf("Value of 'max-block-bytes' (%d) must be greater than or equal to value of 'min-block-bytes' (%d)", max, min) 232 | } 233 | 234 | var db *sql.DB 235 | { 236 | var err error 237 | for err == nil || *tolerateErrors { 238 | db, err = setupDatabase(dbURL) 239 | if err == nil { 240 | break 241 | } 242 | if !*tolerateErrors { 243 | log.Fatal(err) 244 | } 245 | } 246 | } 247 | 248 | lastNow := time.Now() 249 | start := lastNow 250 | var lastBlocks uint64 251 | writers := make([]*blockWriter, *concurrency) 252 | 253 | errCh := make(chan error) 254 | var wg sync.WaitGroup 255 | for i := range writers { 256 | wg.Add(1) 257 | writers[i] = newBlockWriter(db) 258 | go writers[i].run(errCh, &wg) 259 | } 260 | 261 | var numErr int 262 | tick := 
time.Tick(*outputInterval) 263 | done := make(chan os.Signal, 3) 264 | signal.Notify(done, syscall.SIGINT, syscall.SIGTERM) 265 | 266 | go func() { 267 | wg.Wait() 268 | done <- syscall.Signal(0) 269 | }() 270 | 271 | if *duration > 0 { 272 | go func() { 273 | time.Sleep(*duration) 274 | done <- syscall.Signal(0) 275 | }() 276 | } 277 | 278 | defer func() { 279 | // Output results that mimic Go's built-in benchmark format. 280 | elapsed := time.Since(start) 281 | fmt.Printf("%s\t%8d\t%12.1f ns/op\n", 282 | *benchmarkName, numBlocks, float64(elapsed.Nanoseconds())/float64(numBlocks)) 283 | }() 284 | 285 | for i := 0; ; { 286 | select { 287 | case err := <-errCh: 288 | numErr++ 289 | if !*tolerateErrors { 290 | log.Fatal(err) 291 | } else { 292 | log.Print(err) 293 | } 294 | continue 295 | 296 | case <-tick: 297 | var h *hdrhistogram.Histogram 298 | for _, w := range writers { 299 | w.latency.Lock() 300 | m := w.latency.Merge() 301 | w.latency.Rotate() 302 | w.latency.Unlock() 303 | if h == nil { 304 | h = m 305 | } else { 306 | h.Merge(m) 307 | } 308 | } 309 | 310 | p50 := h.ValueAtQuantile(50) 311 | p95 := h.ValueAtQuantile(95) 312 | p99 := h.ValueAtQuantile(99) 313 | pMax := h.ValueAtQuantile(100) 314 | 315 | now := time.Now() 316 | elapsed := time.Since(lastNow) 317 | blocks := atomic.LoadUint64(&numBlocks) 318 | if i%20 == 0 { 319 | fmt.Println("_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)") 320 | } 321 | i++ 322 | fmt.Printf("%8s %8d %14.1f %14.1f %8.1f %8.1f %8.1f %8.1f\n", 323 | time.Duration(time.Since(start).Seconds()+0.5)*time.Second, 324 | numErr, 325 | float64(blocks-lastBlocks)/elapsed.Seconds(), 326 | float64(blocks)/time.Since(start).Seconds(), 327 | time.Duration(p50).Seconds()*1000, 328 | time.Duration(p95).Seconds()*1000, 329 | time.Duration(p99).Seconds()*1000, 330 | time.Duration(pMax).Seconds()*1000) 331 | lastBlocks = blocks 332 | lastNow = now 333 | 334 | case <-done: 335 | blocks := 
atomic.LoadUint64(&numBlocks) 336 | elapsed := time.Since(start).Seconds() 337 | fmt.Println("\n_elapsed___errors_________blocks___ops/sec(cum)") 338 | fmt.Printf("%7.1fs %8d %14d %14.1f\n\n", 339 | time.Since(start).Seconds(), numErr, 340 | blocks, float64(blocks)/elapsed) 341 | return 342 | } 343 | } 344 | } 345 | -------------------------------------------------------------------------------- /fakerealtime/.gitignore: -------------------------------------------------------------------------------- 1 | fakerealtime 2 | -------------------------------------------------------------------------------- /fakerealtime/README.md: -------------------------------------------------------------------------------- 1 | # Fake Real Time example 2 | 3 | ## Summary 4 | 5 | This example uses a log-style table in an approximation of the 6 | "fake real time" system used at Friendfeed. Two tables are used: a 7 | `messages` table stores the complete data for all messages 8 | organized by channel, and a global `updates` table stores metadata 9 | about recently-updated channels. 10 | 11 | ## Running 12 | 13 | Run against an existing cockroach node or cluster. 14 | 15 | #### Insecure node or cluster 16 | ``` 17 | # Launch your node or cluster in insecure mode (with --insecure passed to cockroach). 18 | # Find a reachable address: [mycockroach:26257]. 19 | # Run the example with: 20 | ./fakerealtime postgres://root@mycockroach:26257?sslmode=disable 21 | ``` 22 | 23 | #### Secure node or cluster 24 | ``` 25 | # Launch your node or cluster in secure mode with certificates in [mycertsdir] 26 | # Find a reachable address:[mycockroach:26257]. 
27 | # Run the example with: 28 | ./fakerealtime "postgres://root@mycockroach:26257?sslcert=mycertsdir/root.client.crt&sslkey=mycertsdir/root.client.key" 29 | ``` 30 | -------------------------------------------------------------------------------- /fakerealtime/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Ben Darnell 17 | 18 | // This example uses a log-style table in an approximation of the 19 | // "fake real time" system used at Friendfeed. Two tables are used: a 20 | // `messages` table stores the complete data for all messages 21 | // organized by channel, and a global `updates` table stores metadata 22 | // about recently-updated channels. 23 | // 24 | // The example currently runs a number of writers, which update both 25 | // tables transactionally. Future work includes implementing a reader 26 | // which will effectively tail the `updates` table to learn when it 27 | // needs to query a channel's messages from the `messages` table. 28 | // 29 | // The implementation currently guarantees that `update_ids` are 30 | // strictly monotonic, which is extremely expensive under contention 31 | // (two writer threads will fight with each other for 20 seconds or 32 | // more before making progress). 
33 | // 34 | // One alternate solution is to relax the monotonicity constraint (by 35 | // using a timestamp instead of a sequential ID), at the expense of 36 | // making the reader more complicated (since it would need to re-scan 37 | // records it had already read to see if anything "in the past" 38 | // committed since it last read). We would still need to put some sort 39 | // of bound on the reader's scans; any updates that went longer than 40 | // this without being committed would go unnoticed. 41 | // 42 | // Another alternative is to reduce contention by adding a small 43 | // random number (in range (0, N)) to the beginning of the `updates` 44 | // table primary key. This reduces contention but makes the reader do 45 | // N queries to see all updates. This is effectively what Friendfeed 46 | // did, since there was one `updates` table per MySQL shard. 47 | package main 48 | 49 | import ( 50 | "database/sql" 51 | "flag" 52 | "fmt" 53 | "log" 54 | "math/rand" 55 | "os" 56 | "sync" 57 | "time" 58 | 59 | "github.com/montanaflynn/stats" 60 | // Import postgres driver. 
61 | _ "github.com/lib/pq" 62 | ) 63 | 64 | func createTables(db *sql.DB) error { 65 | statements := []string{ 66 | `CREATE DATABASE IF NOT EXISTS fakerealtime`, 67 | 68 | `DROP TABLE IF EXISTS fakerealtime.updates`, 69 | `CREATE TABLE fakerealtime.updates ( 70 | update_id INT, 71 | channel STRING, 72 | msg_id INT, 73 | PRIMARY KEY (update_id, channel) 74 | )`, 75 | 76 | `DROP TABLE IF EXISTS fakerealtime.messages`, 77 | `CREATE TABLE fakerealtime.messages ( 78 | channel STRING, 79 | msg_id INT, 80 | message STRING, 81 | PRIMARY KEY (channel, msg_id) 82 | )`, 83 | } 84 | for _, stmt := range statements { 85 | if _, err := db.Exec(stmt); err != nil { 86 | return err 87 | } 88 | } 89 | return nil 90 | } 91 | 92 | type statistics struct { 93 | sync.Mutex 94 | writeTimes stats.Float64Data 95 | } 96 | 97 | func (s *statistics) recordWrite(start time.Time) { 98 | duration := time.Now().Sub(start) 99 | s.Lock() 100 | defer s.Unlock() 101 | s.writeTimes = append(s.writeTimes, float64(duration.Nanoseconds())) 102 | } 103 | 104 | func (s *statistics) report() { 105 | for range time.Tick(time.Second) { 106 | s.Lock() 107 | writeTimes := s.writeTimes 108 | s.writeTimes = nil 109 | s.Unlock() 110 | 111 | // The stats functions return an error only when the input is empty. 
112 | mean, _ := stats.Mean(writeTimes) 113 | stddev, _ := stats.StandardDeviation(writeTimes) 114 | log.Printf("wrote %d messages, latency mean=%s, stddev=%s", 115 | len(writeTimes), time.Duration(mean), time.Duration(stddev)) 116 | } 117 | } 118 | 119 | type writer struct { 120 | db *sql.DB 121 | numChannels int 122 | wg *sync.WaitGroup 123 | stats *statistics 124 | } 125 | 126 | func (w writer) run() { 127 | defer w.wg.Done() 128 | for { 129 | if err := w.writeMessage(); err != nil { 130 | log.Printf("error writing message: %s", err) 131 | } 132 | } 133 | } 134 | 135 | func (w writer) writeMessage() error { 136 | start := time.Now() 137 | defer w.stats.recordWrite(start) 138 | channel := fmt.Sprintf("room-%d", rand.Int31n(int32(w.numChannels))) 139 | message := start.String() 140 | 141 | // TODO(bdarnell): retry only on certain errors. 142 | for { 143 | txn, err := w.db.Begin() 144 | if err != nil { 145 | continue 146 | } 147 | 148 | // TODO(bdarnell): make this a subquery when subqueries are supported on insert. 
149 | row := txn.QueryRow(`select max(msg_id) from fakerealtime.messages where channel=$1`, channel) 150 | var maxMsgID sql.NullInt64 151 | if err := row.Scan(&maxMsgID); err != nil { 152 | _ = txn.Rollback() 153 | continue 154 | } 155 | if !maxMsgID.Valid { 156 | maxMsgID.Int64 = 0 157 | } 158 | newMsgID := maxMsgID.Int64 + 1 159 | 160 | row = txn.QueryRow(`select max(update_id) from fakerealtime.updates`) 161 | var maxUpdateID sql.NullInt64 162 | if err := row.Scan(&maxUpdateID); err != nil { 163 | _ = txn.Rollback() 164 | continue 165 | } 166 | if !maxUpdateID.Valid { 167 | maxUpdateID.Int64 = 0 168 | } 169 | newUpdateID := maxUpdateID.Int64 + 1 170 | 171 | if _, err := txn.Exec(`insert into fakerealtime.messages (channel, msg_id, message) values ($1, $2, $3)`, 172 | channel, newMsgID, message); err != nil { 173 | _ = txn.Rollback() 174 | continue 175 | } 176 | 177 | if _, err := txn.Exec(`insert into fakerealtime.updates (update_id, channel, msg_id) values ($1, $2, $3)`, 178 | newUpdateID, channel, newMsgID); err != nil { 179 | _ = txn.Rollback() 180 | continue 181 | } 182 | 183 | if err := txn.Commit(); err == nil { 184 | return nil 185 | } 186 | } 187 | } 188 | 189 | var usage = func() { 190 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0]) 191 | fmt.Fprintf(os.Stderr, " %s \n\n", os.Args[0]) 192 | flag.PrintDefaults() 193 | } 194 | 195 | func main() { 196 | flag.Usage = usage 197 | 198 | numChannels := flag.Int("num-channels", 100, "number of channels") 199 | numWriters := flag.Int("num-writers", 2, "number of writers") 200 | flag.Parse() 201 | 202 | if flag.NArg() != 1 { 203 | usage() 204 | os.Exit(2) 205 | } 206 | 207 | dbURL := flag.Arg(0) 208 | 209 | db, err := sql.Open("postgres", dbURL) 210 | if err != nil { 211 | log.Fatal(err) 212 | } 213 | 214 | if err := createTables(db); err != nil { 215 | log.Fatal(err) 216 | } 217 | 218 | var stats statistics 219 | var wg sync.WaitGroup 220 | for i := 0; i < *numWriters; i++ { 221 | wg.Add(1) 222 | w := 
writer{db, *numChannels, &wg, &stats} 223 | go w.run() 224 | } 225 | go stats.report() 226 | wg.Wait() 227 | } 228 | -------------------------------------------------------------------------------- /filesystem/.gitignore: -------------------------------------------------------------------------------- 1 | filesystem 2 | -------------------------------------------------------------------------------- /filesystem/README.md: -------------------------------------------------------------------------------- 1 | # Filesystem example 2 | 3 | ## Summary 4 | 5 | This is a FUSE filesystem using cockroach as a backing store. 6 | The implemented features attempt to be POSIX-compliant. 7 | See `main.go` for more details, including implemented features and caveats. 8 | 9 | ## Running 10 | 11 | Run against an existing cockroach node or cluster. 12 | 13 | #### Development node 14 | ``` 15 | # Build cockroach binary from https://github.com/cockroachdb/cockroach 16 | # Start it in insecure mode (listens on localhost:15432) 17 | ./cockroach start --insecure 18 | 19 | # Build filesystem example. 20 | # Start it with: 21 | mkdir /tmp/foo 22 | ./filesystem postgresql://root@localhost:15432/?sslmode=disable /tmp/foo 23 | # to umount and quit 24 | # Use /tmp/foo as a filesystem. 25 | ``` 26 | 27 | #### Insecure node or cluster 28 | ``` 29 | # Launch your node or cluster in insecure mode (with --insecure passed to cockroach). 30 | # Find a reachable address: [mycockroach:15432]. 31 | # Run the example with: 32 | mkdir /tmp/foo 33 | ./filesystem postgresql://root@mycockroach:15432/?sslmode=disable /tmp/foo 34 | # to umount and quit 35 | # Use /tmp/foo as a filesystem. 36 | ``` 37 | 38 | #### Secure node or cluster 39 | ``` 40 | # Launch your node or cluster in secure mode with certificates in [mycertsdir] 41 | # Find a reachable address: [mycockroach:15432].
42 | # Run the example with: 43 | mkdir /tmp/foo 44 | ./filesystem "postgresql://root@mycockroach:15432/?sslmode=verify-ca&sslcert=mycertsdir/root.client.crt&sslkey=mycertsdir/root.client.key&sslrootcert=mycertsdir/ca.crt" /tmp/foo 45 | # to umount and quit 46 | # Use /tmp/foo as a filesystem. 47 | ``` 48 | -------------------------------------------------------------------------------- /filesystem/block.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Marc Berhault (marc@cockroachlabs.com) 17 | 18 | package main 19 | 20 | import ( 21 | "fmt" 22 | "strings" 23 | ) 24 | 25 | // BlockSize is the size of each data block. It must not 26 | // change throughout the lifetime of the filesystem. 27 | const BlockSize = 4 << 10 // 4KB 28 | 29 | func min(a, b uint64) uint64 { 30 | if a < b { 31 | return a 32 | } 33 | return b 34 | } 35 | 36 | // blockRange describes a range of blocks.
37 | // If the first and last block are the same, the effective data range 38 | // will be: [startOffset, lastLength) 39 | type blockRange struct { 40 | start int // index of the start block 41 | startOffset uint64 // starting offset within the first block 42 | startLength uint64 // length of data in first block 43 | last int // index of the last block 44 | lastLength uint64 // length of the last block 45 | } 46 | 47 | // newBlockRange returns the block range for 'length' bytes from 'from'. 48 | func newBlockRange(from, length uint64) blockRange { 49 | end := from + length 50 | return blockRange{ 51 | start: int(from / BlockSize), 52 | startOffset: from % BlockSize, 53 | startLength: min(length, BlockSize-(from%BlockSize)), 54 | last: int(end / BlockSize), 55 | lastLength: end % BlockSize, 56 | } 57 | } 58 | 59 | // shrink resizes the data to a smaller length. 60 | // Requirement: from > to. 61 | // If truncates are done on block boundaries, this is reasonably 62 | // efficient. However, if truncating in the middle of a block, 63 | // we need to fetch the block first, truncate it, and write it again. 64 | func shrink(e sqlExecutor, inodeID, from, to uint64) error { 65 | delRange := newBlockRange(to, from-to) 66 | deleteFrom := delRange.start 67 | 68 | if delRange.startOffset > 0 { 69 | // We're truncating in the middle of a block, fetch it, truncate its 70 | // data, and write it again. 71 | // TODO(marc): this would be more efficient if we had LEFT for bytes. 72 | data, err := getBlockData(e, inodeID, delRange.start) 73 | if err != nil { 74 | return err 75 | } 76 | data = data[:delRange.startOffset] 77 | if err := updateBlockData(e, inodeID, delRange.start, data); err != nil { 78 | return err 79 | } 80 | // We don't need to delete this block. 81 | deleteFrom++ 82 | } 83 | 84 | deleteTo := delRange.last 85 | if delRange.lastLength == 0 { 86 | // The last block did not previously exist.
87 | deleteTo-- 88 | } 89 | if deleteTo < deleteFrom { 90 | return nil 91 | } 92 | 93 | // There is something to delete. 94 | // TODO(marc): would it be better to pass the block IDs? 95 | delStmt := `DELETE FROM fs.BLOCK WHERE id = $1 AND block >= $2` 96 | if _, err := e.Exec(delStmt, inodeID, deleteFrom); err != nil { 97 | return err 98 | } 99 | 100 | return nil 101 | } 102 | 103 | // grow resizes the data to a larger length. 104 | // Requirement: to > from. 105 | // If the file ended in a partial block, we fetch it, grow it, 106 | // and write it back. 107 | func grow(e sqlExecutor, inodeID, from, to uint64) error { 108 | addRange := newBlockRange(from, to-from) 109 | insertFrom := addRange.start 110 | 111 | if addRange.startOffset > 0 { 112 | // We need to extend the original 'last block'. 113 | // Fetch it, grow it, and update it. 114 | // TODO(marc): this would be more efficient if we had RPAD for bytes. 115 | data, err := getBlockData(e, inodeID, addRange.start) 116 | if err != nil { 117 | return err 118 | } 119 | data = append(data, make([]byte, addRange.startLength, addRange.startLength)...) 120 | if err := updateBlockData(e, inodeID, addRange.start, data); err != nil { 121 | return err 122 | } 123 | // We don't need to insert this block. 124 | insertFrom++ 125 | } 126 | 127 | insertTo := addRange.last 128 | if insertTo < insertFrom { 129 | return nil 130 | } 131 | 132 | // Build the sql statement and blocks to insert. 133 | // We don't share this functionality with 'write' because we can repeat empty blocks. 134 | // This would be shorter if we weren't trying to be efficient. 135 | // TODO(marc): this would also be better if we supported sparse files. 136 | paramStrings := []string{} 137 | params := []interface{}{} 138 | count := 1 // placeholder count starts at 1. 139 | if insertFrom != insertTo { 140 | // We have full blocks. Only send a full block once. 
141 | params = append(params, make([]byte, BlockSize, BlockSize)) 142 | count++ 143 | } 144 | 145 | // Go over all blocks that are certainly full. 146 | for i := insertFrom; i < insertTo; i++ { 147 | paramStrings = append(paramStrings, fmt.Sprintf("(%d, %d, $1)", inodeID, i)) 148 | } 149 | 150 | // Check the last block. 151 | if addRange.lastLength > 0 { 152 | // Not empty, write it. It can't be a full block, because we 153 | // would have an empty block right after. 154 | params = append(params, make([]byte, addRange.lastLength, addRange.lastLength)) 155 | paramStrings = append(paramStrings, fmt.Sprintf("(%d, %d, $%d)", 156 | inodeID, addRange.last, count)) 157 | count++ 158 | } 159 | 160 | if len(paramStrings) == 0 { 161 | // We had only one block, and it was empty. Nothing to do. 162 | return nil 163 | } 164 | 165 | insStmt := fmt.Sprintf(`INSERT INTO fs.block VALUES %s`, strings.Join(paramStrings, ",")) 166 | if _, err := e.Exec(insStmt, params...); err != nil { 167 | return err 168 | } 169 | 170 | return nil 171 | } 172 | 173 | // read returns the data [from, to). 174 | // Requires: to > from and [from, to) is contained in the file. 175 | func read(e sqlExecutor, inodeID, from, to uint64) ([]byte, error) { 176 | readRange := newBlockRange(from, to-from) 177 | end := readRange.last 178 | if readRange.lastLength == 0 { 179 | end-- 180 | } 181 | 182 | blockInfos, err := getBlocksBetween(e, inodeID, readRange.start, end) 183 | if err != nil { 184 | return nil, err 185 | } 186 | if len(blockInfos) != end-readRange.start+1 { 187 | return nil, fmt.Errorf("wrong number of blocks, asked for [%d-%d], got %d back", 188 | readRange.start, end, len(blockInfos)) 189 | } 190 | 191 | if readRange.lastLength != 0 { 192 | // We have a last partial block, truncate it.
193 | last := len(blockInfos) - 1 194 | blockInfos[last].data = blockInfos[last].data[:readRange.lastLength] 195 | } 196 | blockInfos[0].data = blockInfos[0].data[readRange.startOffset:] 197 | 198 | var data []byte 199 | for _, b := range blockInfos { 200 | data = append(data, b.data...) 201 | } 202 | 203 | return data, nil 204 | } 205 | 206 | // write commits data to the blocks starting at 'offset'. 207 | // Amount of data to write must be non-zero. 208 | // If offset is greater than 'originalSize', the file is grown first. 209 | // We always write all or nothing. 210 | func write(e sqlExecutor, inodeID, originalSize, offset uint64, data []byte) error { 211 | if offset > originalSize { 212 | diff := offset - originalSize 213 | if diff > BlockSize*2 { 214 | // We need to grow the file by at least two blocks. Use the growing method, 215 | // which only sends empty blocks once. 216 | if err := grow(e, inodeID, originalSize, offset); err != nil { 217 | return err 218 | } 219 | originalSize = offset 220 | } else if diff > 0 { 221 | // Don't grow the file first, just change what we need to write. 222 | data = append(make([]byte, diff, diff), data...) 223 | offset = originalSize 224 | } 225 | } 226 | 227 | // Now we know that offset is <= originalSize. 228 | writeRange := newBlockRange(offset, uint64(len(data))) 229 | writeFrom := writeRange.start 230 | 231 | if writeRange.startOffset > 0 { 232 | // We're partially overwriting a block (this includes appending 233 | // to the last block): fetch it, grow it, and update it. 234 | // TODO(marc): this would be more efficient if we had RPAD for bytes. 235 | blockData, err := getBlockData(e, inodeID, writeRange.start) 236 | if err != nil { 237 | return err 238 | } 239 | blockData = append(blockData[:writeRange.startOffset], data[:writeRange.startLength]...)
240 | data = data[writeRange.startLength:] 241 | if err := updateBlockData(e, inodeID, writeRange.start, blockData); err != nil { 242 | return err 243 | } 244 | // We don't need to insert this block. 245 | writeFrom++ 246 | } 247 | 248 | writeTo := writeRange.last 249 | if writeRange.lastLength == 0 { 250 | // Last block is empty, don't update/insert it. 251 | writeTo-- 252 | } 253 | if writeTo < writeFrom { 254 | return nil 255 | } 256 | 257 | // Figure out last existing block. Needed to tell the difference 258 | // between insert and update. 259 | lastBlock := int(originalSize / BlockSize) 260 | if originalSize%BlockSize == 0 { 261 | // Empty blocks do not exist (size=0 -> lastblock=-1). 262 | lastBlock-- 263 | } 264 | 265 | // Process updates first. 266 | for i := writeFrom; i <= writeTo; i++ { 267 | if i > lastBlock { 268 | // We've reached the end of existing blocks, no more UPDATE. 269 | break 270 | } 271 | if len(data) == 0 { 272 | panic(fmt.Sprintf("reached end of data, but still have %d blocks to write", 273 | writeTo-i)) 274 | } 275 | toWrite := min(BlockSize, uint64(len(data))) 276 | blockData := data[:toWrite] 277 | data = data[toWrite:] 278 | if toWrite != BlockSize { 279 | // This is the last block, and it's partial, fetch the original 280 | // data from this block and append. 281 | // TODO(marc): we could fetch this at the same time as the first 282 | // partial block, if any. This would make overwriting in the middle 283 | // of the file on non-block boundaries a bit more efficient. 284 | origData, err := getBlockData(e, inodeID, i) 285 | if err != nil { 286 | return err 287 | } 288 | toWrite = min(toWrite, uint64(len(origData))) 289 | blockData = append(blockData, origData[toWrite:]...) 290 | } 291 | // TODO(marc): is there a way to do batch updates? 
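The lastBlock computation in write above is a small off-by-one trap: a file whose size sits exactly on a block boundary (including size 0) has no partial last block, hence the decrement. A self-contained sketch of the same arithmetic, with an assumed BlockSize value since the real constant is defined earlier in block.go:

```go
package main

import "fmt"

const BlockSize = 4096 // assumed for illustration; the real constant lives elsewhere in block.go

// lastExistingBlock mirrors the lastBlock computation in write: the index of
// the last block that already exists for a file of the given size, or -1 for
// an empty file. A size sitting exactly on a block boundary has no partial
// last block, hence the decrement.
func lastExistingBlock(size uint64) int {
	lastBlock := int(size / BlockSize)
	if size%BlockSize == 0 {
		lastBlock--
	}
	return lastBlock
}

func main() {
	for _, size := range []uint64{0, 1, BlockSize, BlockSize + 1, 3 * BlockSize} {
		// size=0 -> -1, size=1 -> 0, size=BlockSize -> 0,
		// size=BlockSize+1 -> 1, size=3*BlockSize -> 2.
		fmt.Printf("size=%d lastBlock=%d\n", size, lastExistingBlock(size))
	}
}
```

Blocks at index <= lastBlock get an UPDATE; anything past it must be an INSERT, which is exactly how write splits its two loops.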
292 | if err := updateBlockData(e, inodeID, i, blockData); err != nil { 293 | return err 294 | } 295 | } 296 | 297 | if len(data) == 0 { 298 | return nil 299 | } 300 | 301 | paramStrings := []string{} 302 | params := []interface{}{} 303 | count := 1 // placeholder count starts at 1. 304 | 305 | for i := lastBlock + 1; i <= writeTo; i++ { 306 | if len(data) == 0 { 307 | panic(fmt.Sprintf("reached end of data, but still have %d blocks to write", 308 | writeTo-i)) 309 | } 310 | toWrite := min(BlockSize, uint64(len(data))) 311 | blockData := data[:toWrite] 312 | data = data[toWrite:] 313 | paramStrings = append(paramStrings, fmt.Sprintf("(%d, %d, $%d)", 314 | inodeID, i, count)) 315 | params = append(params, blockData) 316 | count++ 317 | } 318 | 319 | if len(data) != 0 { 320 | panic(fmt.Sprintf("processed all blocks, but still have %d bytes of data to write", len(data))) 321 | } 322 | 323 | insStmt := fmt.Sprintf(`INSERT INTO fs.block VALUES %s`, strings.Join(paramStrings, ",")) 324 | if _, err := e.Exec(insStmt, params...); err != nil { 325 | return err 326 | } 327 | 328 | return nil 329 | } 330 | 331 | // resizeBlocks changes the size of the data for the inode with id 'inodeID' 332 | // from 'from' to 'to'. This may grow or shrink. 333 | func resizeBlocks(e sqlExecutor, inodeID, from, to uint64) error { 334 | if to < from { 335 | return shrink(e, inodeID, from, to) 336 | } else if to > from { 337 | return grow(e, inodeID, from, to) 338 | } 339 | return nil 340 | } 341 | -------------------------------------------------------------------------------- /filesystem/block_test.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License.
5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Marc Berhault (marc@cockroachlabs.com) 17 | 18 | package main 19 | 20 | import ( 21 | "bytes" 22 | "database/sql" 23 | "fmt" 24 | "reflect" 25 | "testing" 26 | 27 | "github.com/cockroachdb/cockroach-go/testserver" 28 | "github.com/cockroachdb/cockroach/pkg/util/randutil" 29 | ) 30 | 31 | func initTestDB(t *testing.T) (*sql.DB, func()) { 32 | db, stop := testserver.NewDBForTest(t) 33 | 34 | if err := initSchema(db); err != nil { 35 | stop() 36 | t.Fatal(err) 37 | } 38 | 39 | return db, stop 40 | } 41 | 42 | func getAllBlocks(db *sql.DB, inode uint64) ([]byte, error) { 43 | blocks, err := getBlocks(db, inode) 44 | if err != nil { 45 | return nil, err 46 | } 47 | num := len(blocks) 48 | var data []byte 49 | for i, b := range blocks { 50 | if i != b.block { 51 | // We can't have missing blocks. 52 | return nil, fmt.Errorf("gap in block list, found block %d at index %d", b.block, i) 53 | } 54 | bl := uint64(len(b.data)) 55 | if bl == 0 { 56 | return nil, fmt.Errorf("empty block found at %d (out of %d blocks)", i, num) 57 | } 58 | if i != (num-1) && bl != BlockSize { 59 | return nil, fmt.Errorf("non-blocksize %d at %d (out of %d blocks)", bl, i, num) 60 | } 61 | data = append(data, b.data...) 
62 | } 63 | return data, nil 64 | } 65 | 66 | func TestBlockInfo(t *testing.T) { 67 | testCases := []struct { 68 | start, length uint64 69 | expected blockRange 70 | }{ 71 | {0, 0, blockRange{0, 0, 0, 0, 0}}, 72 | {0, BlockSize * 4, blockRange{0, 0, BlockSize, 4, 0}}, 73 | {0, BlockSize*4 + 500, blockRange{0, 0, BlockSize, 4, 500}}, 74 | {500, BlockSize * 4, blockRange{0, 500, BlockSize - 500, 4, 500}}, 75 | {BlockSize, BlockSize * 4, blockRange{1, 0, BlockSize, 5, 0}}, 76 | {BlockSize, 500, blockRange{1, 0, 500, 1, 500}}, 77 | {500, 1000, blockRange{0, 500, 1000, 0, 1500}}, 78 | } 79 | 80 | for tcNum, tc := range testCases { 81 | actual := newBlockRange(tc.start, tc.length) 82 | if !reflect.DeepEqual(actual, tc.expected) { 83 | t.Errorf("#%d: expected:\n%+v\ngot:\n%+v", tcNum, tc.expected, actual) 84 | } 85 | } 86 | } 87 | 88 | func tryGrow(db *sql.DB, data []byte, id, newSize uint64) ([]byte, error) { 89 | originalSize := uint64(len(data)) 90 | data = append(data, make([]byte, newSize-originalSize)...) 91 | if err := grow(db, id, originalSize, newSize); err != nil { 92 | return nil, err 93 | } 94 | newData, err := getAllBlocks(db, id) 95 | if err != nil { 96 | return nil, err 97 | } 98 | if uint64(len(newData)) != newSize { 99 | return nil, fmt.Errorf("getAllBlocks lengths don't match: got %d, expected %d", len(newData), newSize) 100 | } 101 | if !bytes.Equal(data, newData) { 102 | return nil, fmt.Errorf("getAllBlocks data doesn't match") 103 | } 104 | 105 | if newSize == 0 { 106 | return newData, nil 107 | } 108 | 109 | // Check the read as well. 
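The TestBlockInfo table above fully pins down the blockRange geometry. As a hedged sketch, here is a newBlockRange consistent with all seven cases (field names come from the test literals; the field types and the BlockSize value are assumptions — the real definitions live earlier in block.go, outside this excerpt):

```go
package main

import "fmt"

const BlockSize = 4096 // assumed; the real constant is defined in block.go

type blockRange struct {
	start       int    // index of the first block touched
	startOffset uint64 // offset of the data within the first block
	startLength uint64 // bytes of the range that land in the first block
	last        int    // end / BlockSize; one past the final block when the range ends on a boundary
	lastLength  uint64 // end % BlockSize; 0 means the range ends exactly on a boundary
}

func newBlockRange(from, length uint64) blockRange {
	end := from + length
	r := blockRange{
		start:       int(from / BlockSize),
		startOffset: from % BlockSize,
		last:        int(end / BlockSize),
		lastLength:  end % BlockSize,
	}
	// The first block holds at most BlockSize-startOffset bytes of the range.
	r.startLength = BlockSize - r.startOffset
	if length < r.startLength {
		r.startLength = length
	}
	return r
}

func main() {
	cases := []struct {
		from, length uint64
		want         blockRange
	}{
		{0, 0, blockRange{0, 0, 0, 0, 0}},
		{0, BlockSize * 4, blockRange{0, 0, BlockSize, 4, 0}},
		{0, BlockSize*4 + 500, blockRange{0, 0, BlockSize, 4, 500}},
		{500, BlockSize * 4, blockRange{0, 500, BlockSize - 500, 4, 500}},
		{BlockSize, BlockSize * 4, blockRange{1, 0, BlockSize, 5, 0}},
		{BlockSize, 500, blockRange{1, 0, 500, 1, 500}},
		{500, 1000, blockRange{0, 500, 1000, 0, 1500}},
	}
	for i, c := range cases {
		if got := newBlockRange(c.from, c.length); got != c.want {
			panic(fmt.Sprintf("#%d: got %+v, want %+v", i, got, c.want))
		}
	}
	fmt.Println("all 7 blockRange cases match")
}
```

Note how a range ending exactly on a block boundary reports lastLength == 0 with last pointing one past the final block — which is why read decrements end, and write decrements writeTo, in that case.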
110 | newData, err = read(db, id, 0, newSize) 111 | if err != nil { 112 | return nil, err 113 | } 114 | 115 | if uint64(len(newData)) != newSize { 116 | return nil, fmt.Errorf("read lengths don't match: got %d, expected %d", len(newData), newSize) 117 | } 118 | if !bytes.Equal(data, newData) { 119 | return nil, fmt.Errorf("read data doesn't match") 120 | } 121 | 122 | return newData, nil 123 | } 124 | 125 | func tryShrink(db *sql.DB, data []byte, id, newSize uint64) ([]byte, error) { 126 | originalSize := uint64(len(data)) 127 | data = data[:newSize] 128 | if err := shrink(db, id, originalSize, newSize); err != nil { 129 | return nil, err 130 | } 131 | newData, err := getAllBlocks(db, id) 132 | if err != nil { 133 | return nil, err 134 | } 135 | if uint64(len(newData)) != newSize { 136 | return nil, fmt.Errorf("getAllBlocks lengths don't match: got %d, expected %d", len(newData), newSize) 137 | } 138 | if !bytes.Equal(data, newData) { 139 | return nil, fmt.Errorf("getAllBlocks data doesn't match") 140 | } 141 | 142 | if newSize == 0 { 143 | return newData, nil 144 | } 145 | 146 | // Check the read as well.
147 | newData, err = read(db, id, 0, newSize) 148 | if err != nil { 149 | return nil, err 150 | } 151 | 152 | if uint64(len(newData)) != newSize { 153 | return nil, fmt.Errorf("read lengths don't match: got %d, expected %d", len(newData), newSize) 154 | } 155 | if !bytes.Equal(data, newData) { 156 | return nil, fmt.Errorf("read data doesn't match") 157 | } 158 | 159 | return newData, nil 160 | } 161 | 162 | func TestShrinkGrow(t *testing.T) { 163 | db, stop := initTestDB(t) 164 | defer stop() 165 | 166 | id := uint64(10) 167 | 168 | var err error 169 | data := []byte{} 170 | 171 | if data, err = tryGrow(db, data, id, BlockSize*4+500); err != nil { 172 | t.Fatal(err) 173 | } 174 | if data, err = tryGrow(db, data, id, BlockSize*4+600); err != nil { 175 | t.Fatal(err) 176 | } 177 | if data, err = tryGrow(db, data, id, BlockSize*5); err != nil { 178 | t.Fatal(err) 179 | } 180 | 181 | // Shrink it down to 0. 182 | if data, err = tryShrink(db, data, id, 0); err != nil { 183 | t.Fatal(err) 184 | } 185 | if data, err = tryGrow(db, data, id, BlockSize*3+500); err != nil { 186 | t.Fatal(err) 187 | } 188 | if data, err = tryShrink(db, data, id, BlockSize*3+300); err != nil { 189 | t.Fatal(err) 190 | } 191 | if data, err = tryShrink(db, data, id, BlockSize*3); err != nil { 192 | t.Fatal(err) 193 | } 194 | if data, err = tryShrink(db, data, id, 0); err != nil { 195 | t.Fatal(err) 196 | } 197 | if data, err = tryGrow(db, data, id, BlockSize); err != nil { 198 | t.Fatal(err) 199 | } 200 | if data, err = tryShrink(db, data, id, BlockSize-200); err != nil { 201 | t.Fatal(err) 202 | } 203 | if data, err = tryShrink(db, data, id, BlockSize-500); err != nil { 204 | t.Fatal(err) 205 | } 206 | if data, err = tryShrink(db, data, id, 0); err != nil { 207 | t.Fatal(err) 208 | } 209 | } 210 | 211 | func TestReadWriteBlocks(t *testing.T) { 212 | db, stop := initTestDB(t) 213 | defer stop() 214 | 215 | id := uint64(10) 216 | rng, _ := randutil.NewPseudoRand() 217 | length := BlockSize*3 + 500 
218 | part1 := randutil.RandBytes(rng, length) 219 | 220 | if err := write(db, id, 0, 0, part1); err != nil { 221 | t.Fatal(err) 222 | } 223 | 224 | readData, err := read(db, id, 0, uint64(length)) 225 | if err != nil { 226 | t.Fatal(err) 227 | } 228 | if !bytes.Equal(part1, readData) { 229 | t.Errorf("Bytes differ. lengths: %d, expected %d", len(readData), len(part1)) 230 | } 231 | 232 | verboseData, err := getAllBlocks(db, id) 233 | if err != nil { 234 | t.Fatal(err) 235 | } 236 | if !bytes.Equal(verboseData, part1) { 237 | t.Errorf("Bytes differ. lengths: %d, expected %d", len(verboseData), len(part1)) 238 | } 239 | 240 | // Write with hole in the middle. 241 | part2 := make([]byte, BlockSize*2+250, BlockSize*2+250) 242 | fullData := append(part1, part2...) 243 | part3 := randutil.RandBytes(rng, BlockSize+123) 244 | if err := write(db, id, uint64(len(part1)), uint64(len(fullData)), part3); err != nil { 245 | t.Fatal(err) 246 | } 247 | fullData = append(fullData, part3...) 248 | readData, err = read(db, id, 0, uint64(len(fullData))) 249 | if err != nil { 250 | t.Fatal(err) 251 | } 252 | if !bytes.Equal(fullData, readData) { 253 | t.Errorf("Bytes differ. lengths: %d, expected %d", len(readData), len(fullData)) 254 | } 255 | 256 | verboseData, err = getAllBlocks(db, id) 257 | if err != nil { 258 | t.Fatal(err) 259 | } 260 | if !bytes.Equal(verboseData, fullData) { 261 | t.Errorf("Bytes differ. lengths: %d, expected %d", len(verboseData), len(fullData)) 262 | } 263 | 264 | // Now write into the middle of the file. 265 | part2 = randutil.RandBytes(rng, len(part2)) 266 | if err := write(db, id, uint64(len(fullData)), uint64(len(part1)), part2); err != nil { 267 | t.Fatal(err) 268 | } 269 | fullData = append(part1, part2...) 270 | fullData = append(fullData, part3...) 271 | readData, err = read(db, id, 0, uint64(len(fullData))) 272 | if err != nil { 273 | t.Fatal(err) 274 | } 275 | if !bytes.Equal(fullData, readData) { 276 | t.Errorf("Bytes differ. 
lengths: %d, expected %d", len(readData), len(fullData)) 277 | } 278 | 279 | verboseData, err = getAllBlocks(db, id) 280 | if err != nil { 281 | t.Fatal(err) 282 | } 283 | if !bytes.Equal(verboseData, fullData) { 284 | t.Errorf("Bytes differ. lengths: %d, expected %d", len(verboseData), len(fullData)) 285 | } 286 | 287 | // New file. 288 | id2 := uint64(20) 289 | if err := write(db, id2, 0, 0, []byte("1")); err != nil { 290 | t.Fatal(err) 291 | } 292 | readData, err = read(db, id2, 0, 1) 293 | if err != nil { 294 | t.Fatal(err) 295 | } 296 | if string(readData) != "1" { 297 | t.Fatalf("mismatch: %s", readData) 298 | } 299 | 300 | if err := write(db, id2, 1, 0, []byte("22")); err != nil { 301 | t.Fatal(err) 302 | } 303 | readData, err = read(db, id2, 0, 2) 304 | if err != nil { 305 | t.Fatal(err) 306 | } 307 | if string(readData) != "22" { 308 | t.Fatalf("mismatch: %s", readData) 309 | } 310 | 311 | id3 := uint64(30) 312 | part1 = randutil.RandBytes(rng, BlockSize) 313 | // Write 5 blocks. 314 | var offset uint64 315 | for i := 0; i < 5; i++ { 316 | if err := write(db, id3, offset, offset, part1); err != nil { 317 | t.Fatal(err) 318 | } 319 | offset += BlockSize 320 | } 321 | } 322 | -------------------------------------------------------------------------------- /filesystem/fs.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. 
See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Marc Berhault (marc@cockroachlabs.com) 17 | 18 | package main 19 | 20 | import ( 21 | "context" 22 | "database/sql" 23 | "os" 24 | "syscall" 25 | 26 | "bazil.org/fuse" 27 | "bazil.org/fuse/fs" 28 | "github.com/cockroachdb/cockroach-go/crdb" 29 | ) 30 | 31 | const rootNodeID = 1 32 | 33 | const ( 34 | fsSchema = ` 35 | CREATE DATABASE IF NOT EXISTS fs; 36 | 37 | CREATE TABLE IF NOT EXISTS fs.namespace ( 38 | parentID INT, 39 | name STRING, 40 | id INT, 41 | PRIMARY KEY (parentID, name) 42 | ); 43 | 44 | CREATE TABLE IF NOT EXISTS fs.inode ( 45 | id INT PRIMARY KEY, 46 | inode STRING 47 | ); 48 | 49 | CREATE TABLE IF NOT EXISTS fs.block ( 50 | id INT, 51 | block INT, 52 | data BYTES, 53 | PRIMARY KEY (id, block) 54 | ); 55 | ` 56 | ) 57 | 58 | var _ fs.FS = &CFS{} // Root 59 | var _ fs.FSInodeGenerator = &CFS{} // GenerateInode 60 | 61 | // CFS implements a filesystem on top of cockroach. 62 | type CFS struct { 63 | db *sql.DB 64 | } 65 | 66 | func initSchema(db *sql.DB) error { 67 | _, err := db.Exec(fsSchema) 68 | return err 69 | } 70 | 71 | // create inserts a new node. 72 | // parentID: inode ID of the parent directory. 
73 | // name: name of the new node. 74 | // node: new node. 75 | func (cfs CFS) create(ctx context.Context, parentID uint64, name string, node *Node) error { 76 | inode := node.toJSON() 77 | const insertNode = `INSERT INTO fs.inode VALUES ($1, $2)` 78 | const insertNamespace = `INSERT INTO fs.namespace VALUES ($1, $2, $3)` 79 | 80 | err := crdb.ExecuteTx(ctx, cfs.db, nil /* txopts */, func(tx *sql.Tx) error { 81 | if _, err := tx.Exec(insertNode, node.ID, inode); err != nil { 82 | return err 83 | } 84 | if _, err := tx.Exec(insertNamespace, parentID, name, node.ID); err != nil { 85 | return err 86 | } 87 | return nil 88 | }) 89 | return err 90 | } 91 | 92 | // remove removes a node given its name and its parent ID. 93 | // If 'checkChildren' is true, it fails if the node has children. 94 | func (cfs CFS) remove(ctx context.Context, parentID uint64, name string, checkChildren bool) error { 95 | const lookupSQL = `SELECT id FROM fs.namespace WHERE (parentID, name) = ($1, $2)` 96 | const deleteNamespace = `DELETE FROM fs.namespace WHERE (parentID, name) = ($1, $2)` 97 | const deleteInode = `DELETE FROM fs.inode WHERE id = $1` 98 | const deleteBlock = `DELETE FROM fs.block WHERE id = $1` 99 | 100 | err := crdb.ExecuteTx(ctx, cfs.db, nil /* txopts */, func(tx *sql.Tx) error { 101 | // Start by looking up the node ID. 102 | var id uint64 103 | if err := tx.QueryRow(lookupSQL, parentID, name).Scan(&id); err != nil { 104 | return err 105 | } 106 | 107 | // Check if there are any children. 108 | if checkChildren { 109 | if err := checkIsEmpty(tx, id); err != nil { 110 | return err 111 | } 112 | } 113 | 114 | // Delete all entries.
115 | if _, err := tx.Exec(deleteNamespace, parentID, name); err != nil { 116 | return err 117 | } 118 | if _, err := tx.Exec(deleteInode, id); err != nil { 119 | return err 120 | } 121 | if _, err := tx.Exec(deleteBlock, id); err != nil { 122 | return err 123 | } 124 | return nil 125 | }) 126 | return err 127 | } 128 | 129 | func (cfs CFS) lookup(parentID uint64, name string) (*Node, error) { 130 | return getInode(cfs.db, parentID, name) 131 | } 132 | 133 | // list returns the children of the node with id 'parentID'. 134 | // Dirent consists of: 135 | // Inode uint64 136 | // Type DirentType (optional) 137 | // Name string 138 | // TODO(pmattis): lookup all inodes and fill in the type, this will save a Getattr(). 139 | func (cfs CFS) list(parentID uint64) ([]fuse.Dirent, error) { 140 | rows, err := cfs.db.Query(`SELECT name, id FROM fs.namespace WHERE parentID = $1`, parentID) 141 | if err != nil { 142 | return nil, err 143 | } 144 | defer func() { _ = rows.Close() }() 145 | var results []fuse.Dirent 146 | for rows.Next() { 147 | dirent := fuse.Dirent{Type: fuse.DT_Unknown} 148 | if err := rows.Scan(&dirent.Name, &dirent.Inode); err != nil { 149 | return nil, err 150 | } 151 | results = append(results, dirent) 152 | } 153 | if err := rows.Err(); err != nil { 154 | return nil, err 155 | } 156 | 157 | return results, nil 158 | } 159 | 160 | // validateRename takes a source and destination node and verifies that 161 | // a rename can be performed from source to destination. 162 | // source must not be nil. destination can be. 163 | func validateRename(tx *sql.Tx, source, destination *Node) error { 164 | if destination == nil { 165 | // No object at destination: good. 166 | return nil 167 | } 168 | 169 | if source.isDir() { 170 | if destination.isDir() { 171 | // Both are directories: destination must be empty. 172 | return checkIsEmpty(tx, destination.ID) 173 | } 174 | // directory -> file: not allowed. 175 | return fuse.Errno(syscall.ENOTDIR) 176 | } 177 | 178 | // Source is a file.
179 | if destination.isDir() { 180 | // file -> directory: not allowed. 181 | return fuse.Errno(syscall.EISDIR) 182 | } 183 | return nil 184 | } 185 | 186 | // rename moves 'oldParentID/oldName' to 'newParentID/newName'. 187 | // If 'newParentID/newName' already exists, it is deleted. 188 | // See NOTE on node.go:Rename. 189 | func (cfs CFS) rename( 190 | ctx context.Context, oldParentID, newParentID uint64, oldName, newName string, 191 | ) error { 192 | if oldParentID == newParentID && oldName == newName { 193 | return nil 194 | } 195 | 196 | const deleteNamespace = `DELETE FROM fs.namespace WHERE (parentID, name) = ($1, $2)` 197 | const insertNamespace = `INSERT INTO fs.namespace VALUES ($1, $2, $3)` 198 | const updateNamespace = `UPDATE fs.namespace SET id = $1 WHERE (parentID, name) = ($2, $3)` 199 | const deleteInode = `DELETE FROM fs.inode WHERE id = $1` 200 | err := crdb.ExecuteTx(ctx, cfs.db, nil /* txopts */, func(tx *sql.Tx) error { 201 | // Lookup source inode. 202 | srcObject, err := getInode(tx, oldParentID, oldName) 203 | if err != nil { 204 | return err 205 | } 206 | 207 | // Lookup destination inode. 208 | destObject, err := getInode(tx, newParentID, newName) 209 | if err != nil && err != sql.ErrNoRows { 210 | return err 211 | } 212 | 213 | // Check that the rename is allowed. 214 | if err := validateRename(tx, srcObject, destObject); err != nil { 215 | return err 216 | } 217 | 218 | // At this point we know the following: 219 | // - srcObject is not nil 220 | // - destObject may be nil. If not, its inode can be deleted. 221 | if destObject == nil { 222 | // No new object: use INSERT. 223 | if _, err := tx.Exec(deleteNamespace, oldParentID, oldName); err != nil { 224 | return err 225 | } 226 | 227 | if _, err := tx.Exec(insertNamespace, newParentID, newName, srcObject.ID); err != nil { 228 | return err 229 | } 230 | } else { 231 | // Destination exists. 
232 | if _, err := tx.Exec(deleteNamespace, oldParentID, oldName); err != nil { 233 | return err 234 | } 235 | 236 | if _, err := tx.Exec(updateNamespace, srcObject.ID, newParentID, newName); err != nil { 237 | return err 238 | } 239 | 240 | if _, err := tx.Exec(deleteInode, destObject.ID); err != nil { 241 | return err 242 | } 243 | } 244 | return nil 245 | }) 246 | return err 247 | } 248 | 249 | // Root returns the filesystem's root node. 250 | // This node is special: it has a fixed ID and is not persisted. 251 | func (cfs CFS) Root() (fs.Node, error) { 252 | return &Node{cfs: cfs, ID: rootNodeID, Mode: os.ModeDir | defaultPerms}, nil 253 | } 254 | 255 | // GenerateInode returns a new inode ID. 256 | func (cfs CFS) GenerateInode(parentInode uint64, name string) uint64 { 257 | return cfs.newUniqueID() 258 | } 259 | 260 | func (cfs CFS) newUniqueID() (id uint64) { 261 | if err := cfs.db.QueryRow(`SELECT unique_rowid()`).Scan(&id); err != nil { 262 | panic(err) 263 | } 264 | return 265 | } 266 | 267 | // newFileNode returns a new node struct corresponding to a file. 268 | func (cfs CFS) newFileNode() *Node { 269 | return &Node{ 270 | cfs: cfs, 271 | ID: cfs.newUniqueID(), 272 | Mode: defaultPerms, 273 | } 274 | } 275 | 276 | // newDirNode returns a new node struct corresponding to a directory. 277 | func (cfs CFS) newDirNode() *Node { 278 | return &Node{ 279 | cfs: cfs, 280 | ID: cfs.newUniqueID(), 281 | Mode: os.ModeDir | defaultPerms, 282 | } 283 | } 284 | 285 | // newSymlinkNode returns a new node struct corresponding to a symlink. 286 | func (cfs CFS) newSymlinkNode() *Node { 287 | return &Node{ 288 | cfs: cfs, 289 | ID: cfs.newUniqueID(), 290 | // Symlinks don't have permissions, allow all. 
291 | Mode: os.ModeSymlink | allPerms, 292 | } 293 | } 294 | -------------------------------------------------------------------------------- /filesystem/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Marc Berhault (marc@cockroachlabs.com) 17 | 18 | // This is a simple fuse filesystem that stores all metadata and data 19 | // in cockroach. 20 | // 21 | // Inode relationships are stored in the `namespace` table, and inodes 22 | // themselves in the `inode` table. 23 | // 24 | // Data blocks are stored in the `block` table, indexed by inode ID 25 | // and block number. 26 | // 27 | // Basic functionality is implemented, including: 28 | // - mk/rm directory 29 | // - create/rm files 30 | // - read/write files 31 | // - rename 32 | // - symlinks 33 | // 34 | // WARNING: concurrent access on a single mount is fine. However, 35 | // behavior is undefined (read: broken) when mounted more than once at the 36 | // same time. Specifically, reads/writes will not be seen right away and 37 | // may work on out-of-date information.
38 | // 39 | // One caveat of the implemented features is that handles are not 40 | // reference counted, so if an inode is deleted, all open file descriptors 41 | // pointing to it become invalid. 42 | // 43 | // Some TODOs (definitely not a comprehensive list): 44 | // - support basic attributes (mode, timestamps) 45 | // - support other types: hard links 46 | // - add ref counting (and handle open/release) 47 | // - sparse files: don't store empty blocks 48 | // - sparse files 2: keep track of holes 49 | 50 | package main 51 | 52 | import ( 53 | "database/sql" 54 | "flag" 55 | "fmt" 56 | "log" 57 | "os" 58 | "os/signal" 59 | 60 | "bazil.org/fuse" 61 | "bazil.org/fuse/fs" 62 | _ "bazil.org/fuse/fs/fstestutil" 63 | 64 | // Import postgres driver. 65 | _ "github.com/lib/pq" 66 | ) 67 | 68 | var usage = func() { 69 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0]) 70 | fmt.Fprintf(os.Stderr, " %s <db URL> <mountpoint>\n\n", os.Args[0]) 71 | flag.PrintDefaults() 72 | } 73 | 74 | func main() { 75 | flag.Usage = usage 76 | flag.Parse() 77 | 78 | if flag.NArg() != 2 { 79 | usage() 80 | os.Exit(2) 81 | } 82 | 83 | dbURL, mountPoint := flag.Arg(0), flag.Arg(1) 84 | 85 | // Open DB connection first. 86 | db, err := sql.Open("postgres", dbURL) 87 | if err != nil { 88 | log.Fatal(err) 89 | } 90 | defer func() { _ = db.Close() }() 91 | 92 | if err := initSchema(db); err != nil { 93 | log.Fatal(err) 94 | } 95 | 96 | cfs := CFS{db} 97 | // Mount filesystem.
98 | c, err := fuse.Mount( 99 | mountPoint, 100 | fuse.FSName("CockroachFS"), 101 | fuse.Subtype("CockroachFS"), 102 | fuse.LocalVolume(), 103 | fuse.VolumeName(""), 104 | ) 105 | if err != nil { 106 | log.Fatal(err) 107 | } 108 | defer func() { 109 | _ = c.Close() 110 | }() 111 | 112 | go func() { 113 | sig := make(chan os.Signal, 1) 114 | signal.Notify(sig, os.Interrupt) 115 | for range sig { 116 | if err := fuse.Unmount(mountPoint); err != nil { 117 | log.Printf("Signal received, but could not unmount: %s", err) 118 | } else { 119 | break 120 | } 121 | } 122 | }() 123 | 124 | // Serve root. 125 | err = fs.Serve(c, cfs) 126 | if err != nil { 127 | log.Fatal(err) 128 | } 129 | 130 | // check if the mount process has an error to report 131 | <-c.Ready 132 | if err := c.MountError; err != nil { 133 | log.Fatal(err) 134 | } 135 | } 136 | -------------------------------------------------------------------------------- /filesystem/node.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 
15 | // 16 | // Author: Marc Berhault (marc@cockroachlabs.com) 17 | 18 | package main 19 | 20 | import ( 21 | "database/sql" 22 | "encoding/json" 23 | "fmt" 24 | "log" 25 | "math" 26 | "os" 27 | "sync" 28 | "syscall" 29 | 30 | "github.com/cockroachdb/cockroach-go/crdb" 31 | 32 | "bazil.org/fuse" 33 | "bazil.org/fuse/fs" 34 | "golang.org/x/net/context" 35 | ) 36 | 37 | var _ fs.Node = &Node{} // Attr 38 | var _ fs.NodeSetattrer = &Node{} // Setattr 39 | var _ fs.NodeStringLookuper = &Node{} // Lookup 40 | var _ fs.HandleReadDirAller = &Node{} // HandleReadDirAller 41 | var _ fs.NodeMkdirer = &Node{} // Mkdir 42 | var _ fs.NodeCreater = &Node{} // Create 43 | var _ fs.NodeRemover = &Node{} // Remove 44 | var _ fs.HandleWriter = &Node{} // Write 45 | var _ fs.HandleReader = &Node{} // Read 46 | var _ fs.NodeFsyncer = &Node{} // Fsync 47 | var _ fs.NodeRenamer = &Node{} // Rename 48 | var _ fs.NodeSymlinker = &Node{} // Symlink 49 | var _ fs.NodeReadlinker = &Node{} // Readlink 50 | 51 | // Default permissions: we don't have any right now. 52 | const defaultPerms = 0755 53 | 54 | // All permissions. 55 | const allPerms = 0777 56 | 57 | // Maximum file size. 58 | const maxSize = math.MaxUint64 59 | 60 | // Maximum length of a symlink target. 61 | const maxSymlinkTargetLength = 4096 62 | 63 | // Node implements the Node interface. 64 | // ID, Mode, and SymlinkTarget are currently immutable after node creation. 65 | // Size (for files only) is protected by mu. 66 | type Node struct { 67 | cfs CFS 68 | // ID is a unique ID allocated at node creation time. 69 | ID uint64 70 | // Used for type only, permissions are ignored. 71 | Mode os.FileMode 72 | // SymlinkTarget is the path a symlink points to. 
73 | SymlinkTarget string 74 | 75 | // Other fields to add: 76 | // nLinks: number of hard links 77 | // openFDs: number of open file descriptors 78 | // timestamps (probably just ctime and mtime) 79 | 80 | // Implicit fields: 81 | // numBlocks: number of 512b blocks 82 | // blocksize: preferred block size 83 | // mode bits: permissions 84 | 85 | // For regular files only. 86 | // Data blocks are addressed by inode number and offset. 87 | // Any op accessing Size and blocks must lock 'mu'. 88 | mu sync.RWMutex 89 | Size uint64 90 | } 91 | 92 | // convenience functions to query the mode. 93 | func (n *Node) isDir() bool { 94 | return n.Mode.IsDir() 95 | } 96 | 97 | func (n *Node) isRegular() bool { 98 | return n.Mode.IsRegular() 99 | } 100 | 101 | func (n *Node) isSymlink() bool { 102 | return n.Mode&os.ModeSymlink != 0 103 | } 104 | 105 | // toJSON returns the json-encoded string for this node. 106 | func (n *Node) toJSON() string { 107 | ret, err := json.Marshal(n) 108 | if err != nil { 109 | panic(err) 110 | } 111 | return string(ret) 112 | } 113 | 114 | // Attr fills attr with the standard metadata for the node. 115 | func (n *Node) Attr(_ context.Context, a *fuse.Attr) error { 116 | a.Inode = n.ID 117 | a.Mode = n.Mode 118 | // Does preferred block size make sense on things other 119 | // than regular files? 120 | a.BlockSize = BlockSize 121 | 122 | if n.isRegular() { 123 | n.mu.RLock() 124 | defer n.mu.RUnlock() 125 | a.Size = n.Size 126 | 127 | // Blocks is the number of 512 byte blocks, regardless of 128 | // filesystem blocksize. 129 | a.Blocks = (n.Size + 511) / 512 130 | } else if n.isSymlink() { 131 | // Symlink: use target name length. 132 | a.Size = uint64(len(n.SymlinkTarget)) 133 | } 134 | return nil 135 | } 136 | 137 | // Setattr modifies node metadata. This includes changing the size. 
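Attr above reports a.Blocks in 512-byte units regardless of the filesystem BlockSize; the `(n.Size + 511) / 512` expression is the usual round-up-division idiom, shown here as a tiny standalone sketch:

```go
package main

import "fmt"

// blocks512 mirrors the a.Blocks computation in Attr: the number of 512-byte
// blocks needed to hold size bytes, i.e. ceiling division done by adding
// divisor-1 before dividing.
func blocks512(size uint64) uint64 {
	return (size + 511) / 512
}

func main() {
	fmt.Println(blocks512(0), blocks512(1), blocks512(512), blocks512(513)) // 0 1 1 2
}
```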
138 | func (n *Node) Setattr( 139 | ctx context.Context, req *fuse.SetattrRequest, resp *fuse.SetattrResponse, 140 | ) error { 141 | if !req.Valid.Size() { 142 | // We can exit early since only setting the size is implemented. 143 | return nil 144 | } 145 | 146 | if !n.isRegular() { 147 | // Setting the size is only available on regular files. 148 | return fuse.Errno(syscall.EINVAL) 149 | } 150 | 151 | if req.Size > maxSize { 152 | // Too big. 153 | return fuse.Errno(syscall.EFBIG) 154 | } 155 | 156 | n.mu.Lock() 157 | defer n.mu.Unlock() 158 | 159 | if req.Size == n.Size { 160 | // Nothing to do. 161 | return nil 162 | } 163 | 164 | // Store the current size in case we need to rollback. 165 | originalSize := n.Size 166 | 167 | // Wrap everything inside a transaction. 168 | err := crdb.ExecuteTx(ctx, n.cfs.db, nil /* txopts */, func(tx *sql.Tx) error { 169 | // Resize blocks as needed. 170 | if err := resizeBlocks(tx, n.ID, n.Size, req.Size); err != nil { 171 | return err 172 | } 173 | 174 | n.Size = req.Size 175 | return updateNode(tx, n) 176 | }) 177 | 178 | if err != nil { 179 | // Reset our size. 180 | log.Print(err) 181 | n.Size = originalSize 182 | return err 183 | } 184 | return nil 185 | } 186 | 187 | // Lookup looks up a specific entry in the receiver, 188 | // which must be a directory. Lookup should return a Node 189 | // corresponding to the entry. If the name does not exist in 190 | // the directory, Lookup should return ENOENT. 191 | // 192 | // Lookup need not to handle the names "." and "..". 193 | func (n *Node) Lookup(_ context.Context, name string) (fs.Node, error) { 194 | if !n.isDir() { 195 | return nil, fuse.Errno(syscall.ENOTDIR) 196 | } 197 | node, err := n.cfs.lookup(n.ID, name) 198 | if err != nil { 199 | if err == sql.ErrNoRows { 200 | return nil, fuse.ENOENT 201 | } 202 | return nil, err 203 | } 204 | node.cfs = n.cfs 205 | return node, nil 206 | } 207 | 208 | // ReadDirAll returns the list of child inodes. 
209 | func (n *Node) ReadDirAll(_ context.Context) ([]fuse.Dirent, error) { 210 | if !n.isDir() { 211 | return nil, fuse.Errno(syscall.ENOTDIR) 212 | } 213 | return n.cfs.list(n.ID) 214 | } 215 | 216 | // Mkdir creates a directory in 'n'. 217 | // We let the sql query fail if the directory already exists. 218 | // TODO(marc): better handling of errors. 219 | func (n *Node) Mkdir(ctx context.Context, req *fuse.MkdirRequest) (fs.Node, error) { 220 | if !n.isDir() { 221 | return nil, fuse.Errno(syscall.ENOTDIR) 222 | } 223 | if !req.Mode.IsDir() { 224 | return nil, fuse.Errno(syscall.ENOTDIR) 225 | } 226 | 227 | node := n.cfs.newDirNode() 228 | err := n.cfs.create(ctx, n.ID, req.Name, node) 229 | if err != nil { 230 | return nil, err 231 | } 232 | return node, nil 233 | } 234 | 235 | // Create creates a new file in the receiver directory. 236 | func (n *Node) Create( 237 | ctx context.Context, req *fuse.CreateRequest, resp *fuse.CreateResponse, 238 | ) (fs.Node, fs.Handle, error) { 239 | if !n.isDir() { 240 | return nil, nil, fuse.Errno(syscall.ENOTDIR) 241 | } 242 | if req.Mode.IsDir() { 243 | return nil, nil, fuse.Errno(syscall.EISDIR) 244 | } else if !req.Mode.IsRegular() { 245 | return nil, nil, fuse.Errno(syscall.EINVAL) 246 | } 247 | 248 | node := n.cfs.newFileNode() 249 | err := n.cfs.create(ctx, n.ID, req.Name, node) 250 | if err != nil { 251 | return nil, nil, err 252 | } 253 | return node, node, nil 254 | } 255 | 256 | // Remove may be unlink or rmdir. 257 | func (n *Node) Remove(ctx context.Context, req *fuse.RemoveRequest) error { 258 | if !n.isDir() { 259 | return fuse.Errno(syscall.ENOTDIR) 260 | } 261 | 262 | if req.Dir { 263 | // Rmdir. 264 | return n.cfs.remove(ctx, n.ID, req.Name, true /* checkChildren */) 265 | } 266 | // Unlink file/symlink. 267 | return n.cfs.remove(ctx, n.ID, req.Name, false /* !checkChildren */) 268 | } 269 | 270 | // Write writes data to 'n'. It may overwrite existing data, or grow it. 
271 | func (n *Node) Write(ctx context.Context, req *fuse.WriteRequest, resp *fuse.WriteResponse) error { 272 | if !n.isRegular() { 273 | return fuse.Errno(syscall.EINVAL) 274 | } 275 | if req.Offset < 0 { 276 | return fuse.Errno(syscall.EINVAL) 277 | } 278 | if len(req.Data) == 0 { 279 | return nil 280 | } 281 | 282 | n.mu.Lock() 283 | defer n.mu.Unlock() 284 | 285 | newSize := uint64(req.Offset) + uint64(len(req.Data)) 286 | if newSize > maxSize { 287 | return fuse.Errno(syscall.EFBIG) 288 | } 289 | 290 | // Store the current size in case we need to rollback. 291 | originalSize := n.Size 292 | 293 | // Wrap everything inside a transaction. 294 | err := crdb.ExecuteTx(ctx, n.cfs.db, nil /* txopts */, func(tx *sql.Tx) error { 295 | 296 | // Update blocks. They will be added as needed. 297 | if err := write(tx, n.ID, n.Size, uint64(req.Offset), req.Data); err != nil { 298 | return err 299 | } 300 | 301 | if newSize > originalSize { 302 | // This was an append, commit the size change. 303 | n.Size = newSize 304 | if err := updateNode(tx, n); err != nil { 305 | return err 306 | } 307 | } 308 | return nil 309 | }) 310 | 311 | if err != nil { 312 | // Reset our size. 313 | log.Print(err) 314 | n.Size = originalSize 315 | return err 316 | } 317 | 318 | // We always write everything. 319 | resp.Size = len(req.Data) 320 | return nil 321 | } 322 | 323 | // Read reads data from 'n'. 324 | func (n *Node) Read(ctx context.Context, req *fuse.ReadRequest, resp *fuse.ReadResponse) error { 325 | if !n.isRegular() { 326 | return fuse.Errno(syscall.EINVAL) 327 | } 328 | if req.Offset < 0 { 329 | // Before beginning of file. 330 | return fuse.Errno(syscall.EINVAL) 331 | } 332 | if req.Size == 0 { 333 | // No bytes requested. 334 | return nil 335 | } 336 | offset := uint64(req.Offset) 337 | 338 | n.mu.RLock() 339 | defer n.mu.RUnlock() 340 | if offset >= n.Size { 341 | // Beyond end of file. 
342 | return nil 343 | } 344 | 345 | to := min(n.Size, offset+uint64(req.Size)) 346 | if offset == to { 347 | return nil 348 | } 349 | 350 | data, err := read(n.cfs.db, n.ID, offset, to) 351 | if err != nil { 352 | return err 353 | } 354 | resp.Data = data 355 | return nil 356 | } 357 | 358 | // Fsync is a noop for us, we always push writes to the DB. We do need to implement it though. 359 | func (n *Node) Fsync(_ context.Context, _ *fuse.FsyncRequest) error { 360 | return nil 361 | } 362 | 363 | // Rename renames 'req.OldName' to 'req.NewName', optionally moving it to 'newDir'. 364 | // If req.NewName exists, it is deleted. It is assumed that it cannot be a directory. 365 | // NOTE: we do not keep track of opens, so we delete existing destinations right away. 366 | // This means that anyone holding an open file descriptor on the destination will fail 367 | // when trying to use it. 368 | // To properly handle this, we need to count references (including inode -> inode refs, 369 | // and open handles) and delete the inode only when it reaches zero. 370 | func (n *Node) Rename(ctx context.Context, req *fuse.RenameRequest, newDir fs.Node) error { 371 | newNode, ok := newDir.(*Node) 372 | if !ok { 373 | return fmt.Errorf("newDir is not a Node: %v", newDir) 374 | } 375 | if !n.isDir() || !newNode.isDir() { 376 | return fuse.Errno(syscall.ENOTDIR) 377 | } 378 | return n.cfs.rename(ctx, n.ID, newNode.ID, req.OldName, req.NewName) 379 | } 380 | 381 | // Symlink creates a new symbolic link in the receiver node, which must 382 | // be a directory. 
383 | func (n *Node) Symlink(ctx context.Context, req *fuse.SymlinkRequest) (fs.Node, error) { 384 | if !n.isDir() { 385 | return nil, fuse.Errno(syscall.ENOTDIR) 386 | } 387 | if len(req.Target) > maxSymlinkTargetLength { 388 | return nil, fuse.Errno(syscall.ENAMETOOLONG) 389 | } 390 | node := n.cfs.newSymlinkNode() 391 | node.SymlinkTarget = req.Target 392 | err := n.cfs.create(ctx, n.ID, req.NewName, node) 393 | if err != nil { 394 | return nil, err 395 | } 396 | return node, nil 397 | } 398 | 399 | // Readlink reads a symbolic link. 400 | func (n *Node) Readlink(_ context.Context, req *fuse.ReadlinkRequest) (string, error) { 401 | if !n.isSymlink() { 402 | return "", fuse.Errno(syscall.EINVAL) 403 | } 404 | return n.SymlinkTarget, nil 405 | } 406 | -------------------------------------------------------------------------------- /filesystem/sql.go: -------------------------------------------------------------------------------- 1 | // Copyright 2015 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Marc Berhault (marc@cockroachlabs.com) 17 | 18 | package main 19 | 20 | import ( 21 | "database/sql" 22 | "encoding/json" 23 | "syscall" 24 | 25 | "bazil.org/fuse" 26 | ) 27 | 28 | // sqlExecutor is an interface needed for basic queries. 29 | // It is implemented by both sql.DB and sql.Txn. 
30 | type sqlExecutor interface { 31 | Exec(query string, args ...interface{}) (sql.Result, error) 32 | Query(query string, args ...interface{}) (*sql.Rows, error) 33 | QueryRow(query string, args ...interface{}) *sql.Row 34 | } 35 | 36 | // getInode looks up an inode given its name and its parent ID. 37 | // If not found, error will be sql.ErrNoRows. 38 | func getInode(e sqlExecutor, parentID uint64, name string) (*Node, error) { 39 | var raw string 40 | const sql = `SELECT inode FROM fs.inode WHERE id = 41 | (SELECT id FROM fs.namespace WHERE (parentID, name) = ($1, $2))` 42 | if err := e.QueryRow(sql, parentID, name).Scan(&raw); err != nil { 43 | return nil, err 44 | } 45 | 46 | node := &Node{} 47 | err := json.Unmarshal([]byte(raw), node) 48 | return node, err 49 | } 50 | 51 | // checkIsEmpty returns nil if 'id' has no children. 52 | func checkIsEmpty(e sqlExecutor, id uint64) error { 53 | var count uint64 54 | const countSQL = ` 55 | SELECT COUNT(parentID) FROM fs.namespace WHERE parentID = $1` 56 | if err := e.QueryRow(countSQL, id).Scan(&count); err != nil { 57 | return err 58 | } 59 | if count != 0 { 60 | return fuse.Errno(syscall.ENOTEMPTY) 61 | } 62 | return nil 63 | } 64 | 65 | // updateNode updates an existing node descriptor. 66 | func updateNode(e sqlExecutor, node *Node) error { 67 | inode := node.toJSON() 68 | const sql = ` 69 | UPDATE fs.inode SET inode = $1 WHERE id = $2; 70 | ` 71 | if _, err := e.Exec(sql, inode, node.ID); err != nil { 72 | return err 73 | } 74 | return nil 75 | } 76 | 77 | // getBlockData returns the block data for a single block. 78 | func getBlockData(e sqlExecutor, inodeID uint64, block int) ([]byte, error) { 79 | var data []byte 80 | const sql = `SELECT data FROM fs.block WHERE id = $1 AND block = $2` 81 | if err := e.QueryRow(sql, inodeID, block).Scan(&data); err != nil { 82 | return nil, err 83 | } 84 | return data, nil 85 | } 86 | 87 | // updateBlockData overwrites the data for a single block. 
88 | func updateBlockData(e sqlExecutor, inodeID uint64, block int, data []byte) error { 89 | const sql = `UPDATE fs.block SET data = $1 WHERE (id, block) = ($2, $3)` 90 | if _, err := e.Exec(sql, data, inodeID, block); err != nil { 91 | return err 92 | } 93 | return nil 94 | } 95 | 96 | type blockInfo struct { 97 | block int 98 | data []byte 99 | } 100 | 101 | // getBlocks fetches all the blocks for a given inode and returns 102 | // a list of blockInfo objects. 103 | func getBlocks(e sqlExecutor, inodeID uint64) ([]blockInfo, error) { 104 | stmt := `SELECT block, data FROM fs.block WHERE id = $1` 105 | rows, err := e.Query(stmt, inodeID) 106 | if err != nil { 107 | return nil, err 108 | } 109 | return buildBlockInfos(rows) 110 | } 111 | 112 | // getBlocksBetween fetches blocks with IDs [start, end] for a given inode 113 | // and returns a list of blockInfo objects. 114 | func getBlocksBetween(e sqlExecutor, inodeID uint64, start, end int) ([]blockInfo, error) { 115 | stmt := `SELECT block, data FROM fs.block WHERE id = $1 AND block >= $2 AND block <= $3` 116 | rows, err := e.Query(stmt, inodeID, start, end) 117 | if err != nil { 118 | return nil, err 119 | } 120 | return buildBlockInfos(rows) 121 | } 122 | 123 | func buildBlockInfos(rows *sql.Rows) ([]blockInfo, error) { 124 | var results []blockInfo 125 | for rows.Next() { 126 | b := blockInfo{} 127 | if err := rows.Scan(&b.block, &b.data); err != nil { 128 | return nil, err 129 | } 130 | results = append(results, b) 131 | } 132 | if err := rows.Err(); err != nil { 133 | return nil, err 134 | } 135 | 136 | return results, nil 137 | } 138 | -------------------------------------------------------------------------------- /hotspot/.gitignore: -------------------------------------------------------------------------------- 1 | hotspot 2 | -------------------------------------------------------------------------------- /hotspot/README.md: 
--------------------------------------------------------------------------------
1 | # Hotspot example
2 |
3 | ## Summary
4 |
5 | The hotspot example program is a read/write workload intended to always hit
6 | the exact same row. It performs reads and writes to simulate a super
7 | contentious load.
8 |
9 | ## Running
10 |
11 | Run against an existing cockroach node or cluster.
12 |
13 | #### Insecure node or cluster
14 | ```
15 | # Launch your node or cluster in insecure mode (with --insecure passed to cockroach).
16 | # Find a reachable address: [mycockroach:26257].
17 | # Run the example with:
18 | ./hotspot postgres://root@mycockroach:26257?sslmode=disable
19 | ```
20 |
21 | #### Secure node or cluster
22 | ```
23 | # Launch your node or cluster in secure mode with certificates in [mycertsdir].
24 | # Find a reachable address: [mycockroach:26257].
25 | # Run the example with:
26 | ./hotspot "postgres://root@mycockroach:26257?sslcert=mycertsdir/root.client.crt&sslkey=mycertsdir/root.client.key"
27 | ```
28 |
--------------------------------------------------------------------------------
/hotspot/main.go:
--------------------------------------------------------------------------------
1 | // Copyright 2015 The Cockroach Authors.
2 | //
3 | // Licensed under the Apache License, Version 2.0 (the "License");
4 | // you may not use this file except in compliance with the License.
5 | // You may obtain a copy of the License at
6 | //
7 | // http://www.apache.org/licenses/LICENSE-2.0
8 | //
9 | // Unless required by applicable law or agreed to in writing, software
10 | // distributed under the License is distributed on an "AS IS" BASIS,
11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
12 | // implied. See the License for the specific language governing
13 | // permissions and limitations under the License. See the AUTHORS file
14 | // for names of contributors.
15 | 16 | // The hot spot example program is a read/write workload intended to always hit 17 | // the exact same row. It performs reads and writes to simulate a super 18 | // contentious load. 19 | 20 | package main 21 | 22 | import ( 23 | "database/sql" 24 | "flag" 25 | "fmt" 26 | "log" 27 | "math/rand" 28 | "net/url" 29 | "os" 30 | "os/signal" 31 | "runtime" 32 | "sync" 33 | "sync/atomic" 34 | "syscall" 35 | "time" 36 | 37 | // Import postgres driver. 38 | _ "github.com/lib/pq" 39 | ) 40 | 41 | const ( 42 | createDatabaseStatement = "CREATE DATABASE IF NOT EXISTS hot" 43 | createTableStatement = ` 44 | CREATE TABLE IF NOT EXISTS hot.spot ( 45 | id BIGINT NOT NULL, 46 | value BIGINT, 47 | PRIMARY KEY (id) 48 | )` 49 | ) 50 | 51 | var concurrency = flag.Int("concurrency", 2*runtime.NumCPU(), "Number of concurrent reading/writing processes") 52 | var writePercent = flag.Int("write-percent", 50, "Percentage, from 0 to 100 of the operations that will perform writes instead of reads") 53 | var tolerateErrors = flag.Bool("tolerate-errors", false, "Keep running on error") 54 | var outputInterval = flag.Duration("output-interval", 1*time.Second, "Interval of output") 55 | var duration = flag.Duration("duration", 0, "The duration to run. If 0, run forever.") 56 | var benchmarkName = flag.String("benchmark-name", "BenchmarkHotSpot", "Test name to report for Go benchmark results.") 57 | 58 | var readCount, writeCount uint64 59 | 60 | // A hotSpotWriter writes and reads values from one row in an infinite loop. 61 | type hotSpotWriter struct { 62 | db *sql.DB 63 | rand *rand.Rand 64 | } 65 | 66 | func newHotSpotWriter(db *sql.DB) hotSpotWriter { 67 | source := rand.NewSource(int64(time.Now().UnixNano())) 68 | return hotSpotWriter{ 69 | db: db, 70 | rand: rand.New(source), 71 | } 72 | } 73 | 74 | // run is an infinite loop in which the hotSpotWriter continuously attempts to 75 | // read and write values from a single row. 
76 | func (w hotSpotWriter) run(errCh chan<- error, wg *sync.WaitGroup) {
77 | defer wg.Done()
78 |
79 | wPercent := *writePercent
80 | for {
81 | if w.rand.Intn(100) < wPercent {
82 | if _, err := w.db.Exec("UPSERT INTO hot.spot(id, value) VALUES (1, $1)", rand.Int63()); err != nil {
83 | errCh <- err
84 | } else {
85 | atomic.AddUint64(&writeCount, 1)
86 | }
87 | } else {
88 | var value int
89 | if err := w.db.QueryRow("SELECT value FROM hot.spot WHERE id = 1").Scan(&value); err != nil {
90 | errCh <- err
91 | } else {
92 | atomic.AddUint64(&readCount, 1)
93 | }
94 | }
95 | }
96 | }
97 |
98 | // setupDatabase performs initial setup for the example, creating a database
99 | // with a single table.
100 | func setupDatabase(dbURL string) (*sql.DB, error) {
101 | parsedURL, err := url.Parse(dbURL)
102 | if err != nil {
103 | return nil, err
104 | }
105 | parsedURL.Path = "hot"
106 |
107 | // Open connection to server and create a database.
108 | db, err := sql.Open("postgres", parsedURL.String())
109 | if err != nil {
110 | return nil, err
111 | }
112 | if _, err := db.Exec(createDatabaseStatement); err != nil {
113 | return nil, err
114 | }
115 | if _, err := db.Exec(createTableStatement); err != nil {
116 | return nil, err
117 | }
118 | // Insert a single value into the database to avoid errors if we try to read
119 | // before writing.
120 | if _, err := db.Exec("UPSERT INTO hot.spot(id, value) VALUES (1, $1)", rand.Int63()); err != nil {
121 | return nil, err
122 | }
123 |
124 | // Allow a maximum of concurrency+1 connections to the database.
125 | db.SetMaxOpenConns(*concurrency + 1) 126 | db.SetMaxIdleConns(*concurrency + 1) 127 | 128 | return db, nil 129 | } 130 | 131 | var usage = func() { 132 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0]) 133 | fmt.Fprintf(os.Stderr, " %s \n\n", os.Args[0]) 134 | flag.PrintDefaults() 135 | } 136 | 137 | func main() { 138 | flag.Usage = usage 139 | flag.Parse() 140 | 141 | dbURL := "postgresql://root@localhost:26257/hot?sslmode=disable" 142 | if flag.NArg() == 1 { 143 | dbURL = flag.Arg(0) 144 | } 145 | 146 | if *concurrency < 1 { 147 | log.Fatalf("Value of 'concurrency' flag (%d) must be greater than or equal to 1", *concurrency) 148 | } 149 | 150 | if *writePercent < 0 || *writePercent > 100 { 151 | log.Fatalf("Value of 'writePercent' flag (%d) must be between 0 and 100", *writePercent) 152 | } 153 | 154 | var db *sql.DB 155 | { 156 | var err error 157 | for err == nil || *tolerateErrors { 158 | db, err = setupDatabase(dbURL) 159 | if err == nil { 160 | break 161 | } 162 | if !*tolerateErrors { 163 | log.Fatal(err) 164 | } 165 | } 166 | } 167 | 168 | lastNow := time.Now() 169 | start := lastNow 170 | var lastReads, lastWrites, lastTotal uint64 171 | writers := make([]hotSpotWriter, *concurrency) 172 | 173 | errCh := make(chan error) 174 | var wg sync.WaitGroup 175 | for i := range writers { 176 | wg.Add(1) 177 | writers[i] = newHotSpotWriter(db) 178 | go writers[i].run(errCh, &wg) 179 | } 180 | 181 | var numErr int 182 | tick := time.Tick(*outputInterval) 183 | done := make(chan os.Signal, 3) 184 | signal.Notify(done, syscall.SIGINT, syscall.SIGTERM) 185 | 186 | go func() { 187 | wg.Wait() 188 | done <- syscall.Signal(0) 189 | }() 190 | 191 | if *duration > 0 { 192 | go func() { 193 | time.Sleep(*duration) 194 | done <- syscall.Signal(0) 195 | }() 196 | } 197 | 198 | defer func() { 199 | // Output results that mimic Go's built-in benchmark format. 
200 | elapsed := time.Since(start) 201 | reads := atomic.LoadUint64(&readCount) 202 | writes := atomic.LoadUint64(&writeCount) 203 | total := reads + writes 204 | fmt.Printf("%s\t%8d\t%12.1f ns/op\n", 205 | *benchmarkName, total, float64(elapsed.Nanoseconds())/float64(total)) 206 | }() 207 | 208 | ticks := -1 209 | for { 210 | select { 211 | case err := <-errCh: 212 | numErr++ 213 | if !*tolerateErrors { 214 | log.Fatal(err) 215 | } else { 216 | log.Print(err) 217 | } 218 | continue 219 | 220 | case <-tick: 221 | ticks++ 222 | now := time.Now() 223 | elapsed := time.Since(lastNow) 224 | reads := atomic.LoadUint64(&readCount) 225 | writes := atomic.LoadUint64(&writeCount) 226 | total := reads + writes 227 | 228 | if ticks%20 == 0 { 229 | fmt.Println("_elapsed___errors____r/sec____w/sec___rw/sec") 230 | } 231 | 232 | fmt.Printf("%8s %8d %8d %8d %8d\n", 233 | time.Duration(time.Since(start).Seconds()+0.5)*time.Second, 234 | numErr, 235 | int(float64(reads-lastReads)/elapsed.Seconds()), 236 | int(float64(writes-lastWrites)/elapsed.Seconds()), 237 | int(float64(total-lastTotal)/elapsed.Seconds())) 238 | lastReads = reads 239 | lastWrites = writes 240 | lastTotal = total 241 | lastNow = now 242 | 243 | case <-done: 244 | fmt.Println("---------------------------------------------") 245 | reads := atomic.LoadUint64(&readCount) 246 | writes := atomic.LoadUint64(&writeCount) 247 | total := reads + writes 248 | elapsed := time.Duration(time.Since(start).Seconds()+0.5) * time.Second 249 | fmt.Printf("time:%s, reads:%d(%.1f/sec), writes:%d(%.1f/sec), total:%d(%.1f/sec), errors:%d\n", 250 | elapsed, 251 | reads, float64(reads)/elapsed.Seconds(), 252 | writes, float64(writes)/elapsed.Seconds(), 253 | total, float64(total)/elapsed.Seconds(), 254 | numErr, 255 | ) 256 | return 257 | } 258 | } 259 | } 260 | -------------------------------------------------------------------------------- /ledger/.gitignore: 
-------------------------------------------------------------------------------- 1 | ledger 2 | -------------------------------------------------------------------------------- /ledger/README.md: -------------------------------------------------------------------------------- 1 | # Ledger example 2 | 3 | ## Summary 4 | 5 | Simulate a ledger and a certain type of workload against it. 6 | A general ledger is a complete record of financial transactions over the life 7 | of a bank (or other company). 8 | The example here aims to model a bank in a more realistic setting than our 9 | previous bank example(s) do, and tickles contention issues (causing complete 10 | deadlock in the more contended modes) which will be interesting to investigate. 11 | 12 | ## Running 13 | 14 | See the bank example for more detailed information. 15 | The example may be run both against Postgres and Cockroach. 16 | 17 | Run the example with `--help` to see all configuration options. 18 | 19 | ### Prerequisites 20 | ```bash 21 | ./cockroach start --background # for CockroachDB 22 | docker run -d -p 5432:5432 postgres # for Postgres 23 | ``` 24 | 25 | ### Examples 26 | 27 | For Postgres, change 26257 to 5432 below. 28 | 29 | ``` 30 | go run $GOPATH/src/github.com/cockroachdb/examples-go/ledger/main.go --concurrency 5 --sources 10 --destinations 10 postgres://root@localhost:26257?sslmode=disable 31 | ``` 32 | 33 | This runs a moderately contended example, transferring money between ten random 34 | accounts. You can vary the contention by source and destination: 35 | 36 | * no contention: `--sources=0`, `--destinations=0`. 37 | * asymmetric contention: money is transferred from (practically infinitely) many 38 | accounts to ten destinations accounts: `--sources=0`, `--destinations=10` 39 | * hammering a single account, but only using 100 source accounts: 40 | `--sources=100`, `--destinations=1`. 
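The three contention modes above correspond to invocations like the following (a sketch assuming the local insecure node from the Prerequisites; for Postgres, change 26257 to 5432):

```
# No contention: sources and destinations both effectively unbounded.
go run $GOPATH/src/github.com/cockroachdb/examples-go/ledger/main.go --concurrency 5 --sources 0 --destinations 0 postgres://root@localhost:26257?sslmode=disable

# Asymmetric contention: unbounded sources, ten destination accounts.
go run $GOPATH/src/github.com/cockroachdb/examples-go/ledger/main.go --concurrency 5 --sources 0 --destinations 10 postgres://root@localhost:26257?sslmode=disable

# A single hot destination account, fed from 100 source accounts.
go run $GOPATH/src/github.com/cockroachdb/examples-go/ledger/main.go --concurrency 5 --sources 100 --destinations 1 postgres://root@localhost:26257?sslmode=disable
```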
41 | 42 | The workload is essentially a read for both the source and target account, and 43 | then a write to both the source and target account. Hence, you should expect 44 | some symmetry, but since the source account is accessed first for both the read 45 | and the write, it's not perfectly symmetric. 46 | -------------------------------------------------------------------------------- /ledger/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Tobias Schottdorf (tobias.schottdorf@gmail.com) 17 | 18 | // This example simulates a (particular) banking ledger. Depending on the 19 | // chosen generator and concurrency, the workload carried out is contended 20 | // or entirely non-overlapping. 21 | package main 22 | 23 | import ( 24 | "context" 25 | "database/sql" 26 | "flag" 27 | "fmt" 28 | "log" 29 | "math" 30 | "math/rand" 31 | "net/url" 32 | "os" 33 | "strconv" 34 | "time" 35 | 36 | // Import postgres driver. 
37 | "github.com/cockroachdb/cockroach-go/crdb"
38 | "github.com/lib/pq"
39 | "github.com/paulbellamy/ratecounter"
40 | )
41 |
42 | const stmtCreate = `
43 | CREATE TABLE IF NOT EXISTS accounts (
44 | causality_id BIGINT NOT NULL,
45 | posting_group_id BIGINT NOT NULL,
46 |
47 | amount BIGINT,
48 | balance BIGINT,
49 | currency VARCHAR,
50 |
51 | created TIMESTAMP,
52 | value_date TIMESTAMP,
53 |
54 | account_id VARCHAR,
55 | transaction_id VARCHAR,
56 |
57 | scheme VARCHAR,
58 |
59 | PRIMARY KEY (account_id, posting_group_id),
60 | UNIQUE (account_id, causality_id) STORING(balance)
61 | );
62 | -- Could create this inline on Cockroach, but not on Postgres.
63 | CREATE INDEX ON accounts(transaction_id);
64 | CREATE INDEX ON accounts (posting_group_id);
65 | `
66 |
67 | var nSource = flag.Uint64("sources", 10, "Number of source accounts to choose from at random for transfers. Specify zero for maximum possible.")
68 | var nDest = flag.Uint64("destinations", 10, "Number of destination accounts to choose from at random for transfers. Specify zero for maximum possible.")
69 |
70 | var concurrency = flag.Int("concurrency", 5, "Number of concurrent actors moving money.")
71 | var noRunningBalance = flag.Bool("no-running-balance", false, "Do not keep a running balance per account. Avoids contention.")
72 | var verbose = flag.Bool("verbose", false, "Print information about each transfer.")
73 |
74 | var counter *ratecounter.RateCounter
75 |
76 | func init() {
77 | counter = ratecounter.NewRateCounter(1 * time.Second)
78 | rand.Seed(time.Now().UnixNano())
79 | }
80 |
81 | var usage = func() {
82 | fmt.Fprintf(os.Stderr, "Usage of %s:\n", os.Args[0])
83 | fmt.Fprintf(os.Stderr, " %s \n\n", os.Args[0])
84 | flag.PrintDefaults()
85 | }
86 |
87 | type postingRequest struct {
88 | Group int64
89 | AccountA, AccountB string
90 | Amount int64 // deposited on AccountA, removed from AccountB
91 | Currency string
92 |
93 | Transaction, Scheme string // opaque
94 | }
95 |
96 | var goldenReq = postingRequest{
97 | Group: 1,
98 | AccountA: "myacc",
99 | AccountB: "youracc",
100 | Amount: 5,
101 | Currency: "USD",
102 | }
103 |
104 | func generator(nSrc, nDst uint64) func() postingRequest {
105 | if nSrc == 0 || nSrc >= math.MaxInt64 {
106 | nSrc = math.MaxInt64
107 | }
108 | if nDst == 0 || nDst >= math.MaxInt64 {
109 | nDst = math.MaxInt64
110 | }
111 | return func() postingRequest {
112 | req := goldenReq
113 | req.AccountA = fmt.Sprintf("acc%d", rand.Int63n(int64(nSrc)))
114 | req.AccountB = fmt.Sprintf("acc%d", rand.Int63n(int64(nDst)))
115 | req.Group = rand.Int63()
116 | return req
117 | }
118 | }
119 |
120 | func getLast(tx *sql.Tx, accountID string) (lastCID int64, lastBalance int64, err error) {
121 | err = tx.QueryRow(`SELECT causality_id, balance FROM accounts `+
122 | `WHERE account_id = $1 ORDER BY causality_id DESC LIMIT 1`, accountID).
123 | Scan(&lastCID, &lastBalance)
124 |
125 | if err == sql.ErrNoRows {
126 | err = nil
127 | // Paranoia about unspecified semantics.
128 | lastBalance = 0 129 | lastCID = 0 130 | } 131 | return 132 | } 133 | 134 | func doPosting(tx *sql.Tx, req postingRequest) error { 135 | var cidA, balA, cidB, balB int64 136 | if !*noRunningBalance { 137 | var err error 138 | cidA, balA, err = getLast(tx, req.AccountA) 139 | if err != nil { 140 | return err 141 | } 142 | cidB, balB, err = getLast(tx, req.AccountB) 143 | if err != nil { 144 | return err 145 | } 146 | } else { 147 | // For Cockroach, unique_rowid() would be the better choice. 148 | cidA, cidB = rand.Int63(), rand.Int63() 149 | // Want the running balance to always be zero in this case without 150 | // special-casing below. 151 | balA = -req.Amount 152 | balB = req.Amount 153 | } 154 | _, err := tx.Exec(` 155 | INSERT INTO accounts ( 156 | posting_group_id, 157 | amount, 158 | account_id, 159 | causality_id, -- strictly increasing in absolute time. Only used for running balance. 160 | balance 161 | ) 162 | VALUES ( 163 | $1, -- posting_group_id 164 | $2, -- amount 165 | $3, -- account_id (A) 166 | $4, -- causality_id 167 | $5+CAST($2 AS BIGINT) -- (new) balance (Postgres needs the cast) 168 | ), ( 169 | $1, -- posting_group_id 170 | -$2, -- amount 171 | $6, -- account_id (B) 172 | $7, -- causality_id 173 | $8-$2 -- (new) balance 174 | )`, req.Group, req.Amount, 175 | req.AccountA, cidA+1, balA, 176 | req.AccountB, cidB+1, balB) 177 | return err 178 | } 179 | 180 | type worker struct { 181 | l func(string, ...interface{}) // logger 182 | gen func() postingRequest // request generator 183 | } 184 | 185 | func (w *worker) run(db *sql.DB) { 186 | for { 187 | req := w.gen() 188 | if req.AccountA == req.AccountB { 189 | // The code we use throws a unique constraint violation since we 190 | // try to insert two conflicting primary keys. This isn't the 191 | // interesting case. 
192 | continue 193 | } 194 | if *verbose { 195 | w.l("running %v", req) 196 | } 197 | if err := crdb.ExecuteTx(context.TODO(), db, nil /* txopts */, func(tx *sql.Tx) error { 198 | return doPosting(tx, req) 199 | }); err != nil { 200 | pqErr, ok := err.(*pq.Error) 201 | if ok { 202 | if pqErr.Code.Class() == pq.ErrorClass("23") { 203 | // Integrity violations. Don't expect many. 204 | w.l("%s", pqErr) 205 | continue 206 | } 207 | if pqErr.Code.Class() == pq.ErrorClass("40") { 208 | // Transaction rollback errors (e.g. Postgres 209 | // serializability restarts) 210 | if *verbose { 211 | w.l("%s", pqErr) 212 | } 213 | continue 214 | } 215 | } 216 | log.Fatal(err) 217 | } else { 218 | if *verbose { 219 | w.l("success") 220 | } 221 | counter.Incr(1) 222 | } 223 | } 224 | } 225 | 226 | func main() { 227 | flag.Usage = usage 228 | flag.Parse() 229 | 230 | if flag.NArg() != 1 { 231 | usage() 232 | os.Exit(2) 233 | } 234 | 235 | gen := generator(*nSource, *nDest) 236 | dbURL := flag.Arg(0) 237 | 238 | parsedURL, err := url.Parse(dbURL) 239 | if err != nil { 240 | log.Fatal(err) 241 | } 242 | if parsedURL.Path != "" && parsedURL.Path != "ledger" { 243 | log.Fatalf("unsupported database name %q in URL", parsedURL.Path) 244 | } 245 | parsedURL.Path = "ledger" 246 | 247 | db, err := sql.Open("postgres", parsedURL.String()) 248 | if err != nil { 249 | log.Fatal(err) 250 | } 251 | defer func() { _ = db.Close() }() 252 | 253 | // Ignoring the error is the easiest way to be reasonably sure the db+table 254 | // exist without bloating the example by introducing separate code for 255 | // CockroachDB and Postgres (for which `IF NOT EXISTS` is not available 256 | // in this context). 
257 | _, _ = db.Exec(`CREATE DATABASE ledger`) 258 | if _, err := db.Exec(stmtCreate); err != nil { 259 | log.Fatal(err) 260 | } 261 | 262 | for i := 0; i < *concurrency; i++ { 263 | num := i 264 | go (&worker{ 265 | l: func(s string, args ...interface{}) { 266 | if *verbose { 267 | log.Printf(strconv.Itoa(num)+": "+s, args...) 268 | } 269 | }, 270 | gen: gen}).run(db) 271 | } 272 | 273 | go func() { 274 | t := time.NewTicker(time.Second) 275 | for { 276 | select { 277 | case <-t.C: 278 | log.Printf("%d postings/sec", counter.Rate()) 279 | } 280 | } 281 | }() 282 | 283 | select {} // block until killed 284 | } 285 | -------------------------------------------------------------------------------- /photos/db.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. 14 | // 15 | // Author: Spencer Kimball (spencer.kimball@gmail.com) 16 | 17 | package main 18 | 19 | import ( 20 | "database/sql" 21 | "log" 22 | "math/rand" 23 | "time" 24 | 25 | // Import postgres driver. 
26 | _ "github.com/lib/pq" 27 | 28 | "github.com/pkg/errors" 29 | "golang.org/x/net/context" 30 | ) 31 | 32 | var errNoUser = errors.New("no user found") 33 | var errNoPhoto = errors.New("no photos found") 34 | 35 | const ( 36 | // TODO(spencer): update the CREATE DATABASE statement in the schema 37 | // to pull out the database specified in the DB URL and use it instead 38 | // of "photos" below. 39 | photosSchema = ` 40 | CREATE DATABASE IF NOT EXISTS photos; 41 | 42 | CREATE TABLE IF NOT EXISTS users ( 43 | id INT, 44 | photoCount INT, 45 | commentCount INT, 46 | name STRING, 47 | address STRING, 48 | 49 | PRIMARY KEY (id) 50 | ); 51 | 52 | CREATE TABLE IF NOT EXISTS photos ( 53 | id BYTES DEFAULT uuid_v4(), 54 | userID INT, 55 | commentCount INT, 56 | caption STRING, 57 | latitude FLOAT, 58 | longitude FLOAT, 59 | timestamp TIMESTAMP, 60 | 61 | PRIMARY KEY (id), 62 | UNIQUE INDEX byUserID (userID, timestamp) 63 | ); 64 | 65 | CREATE TABLE IF NOT EXISTS comments ( 66 | -- length check guards against insertion of empty photo ID. 67 | -- TODO(bdarnell): consider replacing length check with foreign key. 68 | -- Start with the length check because it's local; we'll want to keep 69 | -- an eye on performance when introducing the FK. 70 | photoID BYTES CHECK (length(photoID) = 16), 71 | commentID BYTES DEFAULT uuid_v4(), 72 | userID INT, 73 | message STRING, 74 | timestamp TIMESTAMP, 75 | 76 | PRIMARY KEY (photoID, timestamp, commentID) 77 | );` 78 | ) 79 | 80 | // openDB opens the database connection according to the context. 81 | func openDB(cfg Config) (*sql.DB, error) { 82 | return sql.Open("postgres", cfg.DBUrl) 83 | } 84 | 85 | // initSchema creates the database schema if it doesn't exist. 86 | func initSchema(ctx context.Context, db *sql.DB) error { 87 | _, err := db.ExecContext(ctx, photosSchema) 88 | return err 89 | } 90 | 91 | // dropDatabase drops the database. 
92 | func dropDatabase(ctx context.Context, db *sql.DB) error { 93 | _, err := db.ExecContext(ctx, "DROP DATABASE IF EXISTS photos;") 94 | return err 95 | } 96 | 97 | const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" 98 | 99 | func randString(n int) string { 100 | b := make([]byte, n) 101 | for i := range b { 102 | b[i] = letterBytes[rand.Intn(len(letterBytes))] 103 | } 104 | return string(b) 105 | } 106 | 107 | // userExists looks up a user by ID. 108 | func userExists(ctx context.Context, tx *sql.Tx, userID int) (bool, error) { 109 | var id int 110 | const selectSQL = ` 111 | SELECT id FROM users WHERE id = $1; 112 | ` 113 | err := tx.QueryRowContext(ctx, selectSQL, userID).Scan(&id) 114 | switch err { 115 | case sql.ErrNoRows: 116 | return false, nil 117 | case nil: 118 | return true, nil 119 | default: 120 | return false, err 121 | } 122 | } 123 | 124 | // findClosestUserByID selects the first user which exists with 125 | // id >= userID. Returns the found user ID or an error. 126 | func findClosestUserByID(ctx context.Context, tx *sql.Tx, userID int) (int, error) { 127 | var id int 128 | const selectSQL = ` 129 | SELECT id FROM users WHERE id >= $1 ORDER BY id LIMIT 1; 130 | ` 131 | err := tx.QueryRowContext(ctx, selectSQL, userID).Scan(&id) 132 | if err == sql.ErrNoRows { 133 | return 0, errNoUser 134 | } 135 | return id, err 136 | } 137 | 138 | // createUser creates a new user with random name and address strings. 
139 | func createUser(ctx context.Context, tx *sql.Tx, userID int) error { 140 | exists, err := userExists(ctx, tx, userID) 141 | if err != nil || exists { 142 | return err 143 | } 144 | const insertSQL = ` 145 | INSERT INTO users VALUES ($1, 0, 0, $2, $3); 146 | ` 147 | const minNameLen = 1 148 | const maxNameLen = 30 149 | const minAddrLen = 20 150 | const maxAddrLen = 100 151 | name := randString(minNameLen + rand.Intn(maxNameLen-minNameLen)) 152 | addr := randString(minAddrLen + rand.Intn(maxAddrLen-minAddrLen)) 153 | _, err = tx.ExecContext(ctx, insertSQL, userID, name, addr) 154 | return err 155 | } 156 | 157 | // createPhoto looks up or creates a user to match userID (aside from 158 | // createUser, it is the only operation that creates a user rather than 159 | // requiring an existing one). It then creates a new photo for the 160 | // new or pre-existing user. 161 | func createPhoto(ctx context.Context, tx *sql.Tx, userID int) error { 162 | if err := createUser(ctx, tx, userID); err != nil { 163 | return err 164 | } 165 | 166 | const insertSQL = ` 167 | INSERT INTO photos VALUES (DEFAULT, $1, 0, $2, $3, $4, NOW()); 168 | ` 169 | const minCaptionLen = 10 170 | const maxCaptionLen = 200 171 | caption := randString(minCaptionLen + rand.Intn(maxCaptionLen-minCaptionLen)) 172 | latitude := rand.Float32() * 90 173 | longitude := rand.Float32() * 180 174 | if _, err := tx.ExecContext(ctx, insertSQL, userID, caption, latitude, longitude); err != nil { 175 | return err 176 | } 177 | 178 | const updateSQL = ` 179 | UPDATE users SET photoCount = photoCount + 1 WHERE id = $1; 180 | ` 181 | if _, err := tx.ExecContext(ctx, updateSQL, userID); err != nil { 182 | return err 183 | } 184 | return nil 185 | } 186 | 187 | // createComment chooses a random photo from a user with the closest 188 | // matching user ID and generates a random author ID to author the 189 | // comment. Counts are updated on the photo and author user. 
190 | func createComment(ctx context.Context, tx *sql.Tx, userID int) error { 191 | photoID, err := chooseRandomPhoto(ctx, tx, userID) 192 | if err != nil { 193 | return err 194 | } 195 | authorID := rand.Intn(userID) + 1 196 | 197 | const insertSQL = ` 198 | INSERT INTO comments VALUES ($1, DEFAULT, $2, $3, NOW()); 199 | ` 200 | const minMessageLen = 32 201 | const maxMessageLen = 1024 202 | message := randString(minMessageLen + rand.Intn(maxMessageLen-minMessageLen)) 203 | if _, err := tx.ExecContext(ctx, insertSQL, photoID, authorID, message); err != nil { 204 | log.Printf("insert into comments failed: %s", err) 205 | return err 206 | } 207 | 208 | const updatePhotoSQL = ` 209 | UPDATE photos SET commentCount = commentCount + 1 WHERE id = $1; 210 | ` 211 | if _, err := tx.ExecContext(ctx, updatePhotoSQL, photoID); err != nil { 212 | return err 213 | } 214 | 215 | const updateUserSQL = ` 216 | UPDATE users SET commentCount = commentCount + 1 WHERE id = $1; 217 | ` 218 | if _, err := tx.ExecContext(ctx, updateUserSQL, authorID); err != nil { 219 | return err 220 | } 221 | return nil 222 | } 223 | 224 | // listPhotos queries up to 100 photos, sorted by timestamp in 225 | // descending order, for the first user with ID >= userID. If photoIDs 226 | // is not nil, stores the queried photo IDs in photoIDs. 
227 | func listPhotos(ctx context.Context, tx *sql.Tx, userID int, photoIDs *[][]byte) error { 228 | var err error 229 | userID, err = findClosestUserByID(ctx, tx, userID) 230 | if err != nil { 231 | return err 232 | } 233 | const selectSQL = ` 234 | SELECT id, caption, commentCount, latitude, longitude, timestamp FROM photos WHERE userID = $1 ORDER BY timestamp DESC LIMIT 100` 235 | rows, err := tx.QueryContext(ctx, selectSQL, userID) 236 | switch { 237 | case err == sql.ErrNoRows: 238 | return nil 239 | case err != nil: 240 | return err 241 | } 242 | defer func() { _ = rows.Close() }() 243 | // Count and process the result set so we make sure work is done to 244 | // stream the results. 245 | var count int 246 | for rows.Next() { 247 | if err := rows.Err(); err != nil { 248 | return err 249 | } 250 | var id []byte 251 | var caption string 252 | var cCount int 253 | var lat, lon float64 254 | var ts time.Time 255 | if err := rows.Scan(&id, &caption, &cCount, &lat, &lon, &ts); err != nil { 256 | return errors.Errorf("failed to scan result set for user %d: %s", userID, err) 257 | } 258 | count++ 259 | if photoIDs != nil { 260 | *photoIDs = append(*photoIDs, id) 261 | } 262 | } 263 | //log.Printf("selected %d photos for user %d", count, userID) 264 | return nil 265 | } 266 | 267 | // chooseRandomPhoto selects a random photo for the specified 268 | // user or an existing user with the closest user ID. Returns 269 | // the photo ID or an error. 270 | func chooseRandomPhoto(ctx context.Context, tx *sql.Tx, userID int) ([]byte, error) { 271 | photoIDs := [][]byte{} 272 | if err := listPhotos(ctx, tx, userID, &photoIDs); err != nil { 273 | return nil, err 274 | } 275 | if len(photoIDs) == 0 { 276 | return nil, errNoPhoto 277 | } 278 | photoID := photoIDs[rand.Intn(len(photoIDs))] 279 | return photoID, nil 280 | } 281 | 282 | // listComments chooses a random photo and lists up to 100 of its 283 | // comments. Returns the photoID or an error. 
If the commentIDs slice 284 | // is not nil, it's set to the queried comments' IDs. 285 | func listComments(ctx context.Context, tx *sql.Tx, userID int, commentIDs *[][]byte) ([]byte, error) { 286 | photoID, err := chooseRandomPhoto(ctx, tx, userID) 287 | if err != nil { 288 | return nil, err 289 | } 290 | const selectSQL = `SELECT commentID, userID, message, timestamp FROM comments ` + 291 | `WHERE photoID = $1 ORDER BY timestamp DESC LIMIT 100` 292 | rows, err := tx.QueryContext(ctx, selectSQL, photoID) 293 | switch { 294 | case err == sql.ErrNoRows: 295 | return photoID, nil 296 | case err != nil: 297 | return nil, err 298 | } 299 | defer func() { _ = rows.Close() }() 300 | // Count and process the result set so we make sure work is done to 301 | // stream the results. 302 | var count int 303 | for rows.Next() { 304 | if err := rows.Err(); err != nil { 305 | return nil, err 306 | } 307 | var commentID []byte 308 | var message string 309 | var userID int 310 | var ts time.Time 311 | if err := rows.Scan(&commentID, &userID, &message, &ts); err != nil { 312 | return nil, errors.Errorf("failed to scan result set for photo %q: %s", photoID, err) 313 | } 314 | count++ 315 | if commentIDs != nil { 316 | *commentIDs = append(*commentIDs, commentID) 317 | } 318 | } 319 | //log.Printf("selected %d comments for photo %q", count, photoID) 320 | return photoID, nil 321 | } 322 | 323 | // chooseRandomComment selects a random comment for the specified 324 | // user or an existing user with the closest user ID. Returns 325 | // the photo and comment IDs or an error. 
326 | func chooseRandomComment(ctx context.Context, tx *sql.Tx, userID int) ([]byte, []byte, error) { 327 | commentIDs := [][]byte{} 328 | photoID, err := listComments(ctx, tx, userID, &commentIDs) 329 | if err != nil { 330 | return nil, nil, err 331 | } 332 | if len(commentIDs) == 0 { 333 | return photoID, nil, nil 334 | } 335 | commentID := commentIDs[rand.Intn(len(commentIDs))] 336 | return photoID, commentID, nil 337 | } 338 | 339 | func updatePhoto(ctx context.Context, tx *sql.Tx, userID int) error { 340 | photoID, err := chooseRandomPhoto(ctx, tx, userID) 341 | if err != nil { 342 | return err 343 | } 344 | 345 | const updatePhotoSQL = ` 346 | UPDATE photos SET caption = $1 WHERE id = $2; 347 | ` 348 | const minCaptionLen = 10 349 | const maxCaptionLen = 200 350 | caption := randString(minCaptionLen + rand.Intn(maxCaptionLen-minCaptionLen)) 351 | if _, err := tx.ExecContext(ctx, updatePhotoSQL, caption, photoID); err != nil { 352 | return err 353 | } 354 | return nil 355 | } 356 | 357 | func updateComment(ctx context.Context, tx *sql.Tx, userID int) error { 358 | photoID, commentID, err := chooseRandomComment(ctx, tx, userID) 359 | if err != nil { 360 | return err 361 | } 362 | 363 | const updateCommentSQL = ` 364 | UPDATE comments SET message = $1 WHERE photoID = $2 AND commentID = $3; 365 | ` 366 | const minMessageLen = 10 367 | const maxMessageLen = 200 368 | message := randString(minMessageLen + rand.Intn(maxMessageLen-minMessageLen)) 369 | if _, err := tx.ExecContext(ctx, updateCommentSQL, message, photoID, commentID); err != nil { 370 | return err 371 | } 372 | return nil 373 | } 374 | 375 | func deletePhoto(ctx context.Context, tx *sql.Tx, userID int) error { 376 | photoID, err := chooseRandomPhoto(ctx, tx, userID) 377 | if err != nil { 378 | return err 379 | } 380 | const deletePhotoSQL = ` 381 | DELETE FROM photos WHERE id = $1; 382 | ` 383 | if _, err := tx.ExecContext(ctx, deletePhotoSQL, photoID); err != nil { 384 | return err 385 | } 386 | 387 | 
const updateSQL = ` 388 | UPDATE users SET photoCount = photoCount - 1 WHERE id = $1; 389 | ` 390 | if _, err := tx.ExecContext(ctx, updateSQL, userID); err != nil { 391 | return err 392 | } 393 | return nil 394 | } 395 | 396 | func deleteComment(ctx context.Context, tx *sql.Tx, userID int) error { 397 | photoID, commentID, err := chooseRandomComment(ctx, tx, userID) 398 | if err != nil { 399 | return err 400 | } 401 | const deleteCommentSQL = ` 402 | DELETE FROM comments WHERE photoID = $1 AND commentID = $2; 403 | ` 404 | if _, err := tx.ExecContext(ctx, deleteCommentSQL, photoID, commentID); err != nil { 405 | return err 406 | } 407 | 408 | const updatePhotoSQL = ` 409 | UPDATE photos SET commentCount = commentCount - 1 WHERE id = $1; 410 | ` 411 | if _, err := tx.ExecContext(ctx, updatePhotoSQL, photoID); err != nil { 412 | return err 413 | } 414 | 415 | const updateUserSQL = ` 416 | UPDATE users SET commentCount = commentCount - 1 WHERE id = $1; 417 | ` 418 | if _, err := tx.ExecContext(ctx, updateUserSQL, userID); err != nil { 419 | return err 420 | } 421 | return nil 422 | } 423 | 424 | // listMostCommentedPhotos queries the top 100 most commented on photos for the 425 | // first user with ID >= userID. 
426 | func listMostCommentedPhotos(ctx context.Context, tx *sql.Tx, userID int) error { 427 | var err error 428 | userID, err = findClosestUserByID(ctx, tx, userID) 429 | if err != nil { 430 | return err 431 | } 432 | const selectSQL = ` 433 | SELECT id, caption, commentCount, latitude, longitude, timestamp 434 | FROM photos 435 | WHERE userID = $1 436 | ORDER BY commentcount DESC LIMIT 100 437 | ` 438 | rows, err := tx.QueryContext(ctx, selectSQL, userID) 439 | switch { 440 | case err == sql.ErrNoRows: 441 | return nil 442 | case err != nil: 443 | return err 444 | } 445 | defer func() { _ = rows.Close() }() 446 | 447 | for rows.Next() { 448 | if err := rows.Err(); err != nil { 449 | return err 450 | } 451 | var id []byte 452 | var caption string 453 | var cCount int 454 | var lat, lon float64 455 | var ts time.Time 456 | if err := rows.Scan(&id, &caption, &cCount, &lat, &lon, &ts); err != nil { 457 | return errors.Errorf("failed to scan result set for user %d: %s", userID, err) 458 | } 459 | } 460 | 461 | return nil 462 | } 463 | 464 | // listCommentsAlphabetically retrieves the first 100 comments for a photo 465 | // in alphabetical order for the first user with ID >= userID. This query is 466 | // semantically useless but tests sorting by a non-indexed column, which 467 | // triggers distributed SQL. 468 | func listCommentsAlphabetically(ctx context.Context, tx *sql.Tx, userID int) error { 469 | photoID, err := chooseRandomPhoto(ctx, tx, userID) 470 | if err != nil { 471 | return err 472 | } 473 | const selectSQL = ` 474 | SELECT commentID, userID, message, timestamp 475 | FROM comments 476 | WHERE photoID = $1 477 | ORDER BY message LIMIT 100 478 | ` 479 | rows, err := tx.QueryContext(ctx, selectSQL, photoID) 480 | switch { 481 | case err == sql.ErrNoRows: 482 | return nil 483 | case err != nil: 484 | return err 485 | } 486 | defer func() { _ = rows.Close() }() 487 | 488 | // Process all rows. 
489 | for rows.Next() { 490 | if err := rows.Err(); err != nil { 491 | return err 492 | } 493 | var commentID []byte 494 | var message string 495 | var userID int 496 | var ts time.Time 497 | if err := rows.Scan(&commentID, &userID, &message, &ts); err != nil { 498 | return errors.Errorf("failed to scan result set for photo %q: %s", photoID, err) 499 | } 500 | } 501 | 502 | return nil 503 | } 504 | 505 | const ( 506 | // top10Commenters scans all the comments, grouping by userid, to find the 507 | // top commenters. Note that this query is artificial, as this value can be 508 | // queried directly from the user table. 509 | selectTop10Commenters = `SELECT count(*) AS post_count FROM comments GROUP BY userid ORDER BY post_count DESC LIMIT 10;` 510 | // top10Posters scans all the photos, grouping by userid, to find the 511 | // top photo posters. Note that this query is artificial, as this value can 512 | // be queried directly from the user table. 513 | selectTop10Posters = `SELECT count(*) AS photos_count FROM photos GROUP BY userid ORDER BY photos_count DESC LIMIT 10;` 514 | // top10Photos scans all the photos ordered by commentcount, to find the most 515 | // commented photos. It does this directly on the photos table. 516 | selectTop10Photos = `SELECT commentcount FROM photos ORDER BY commentcount DESC LIMIT 10` 517 | // top10PhotoPostersNames finds the top photos, but joins this on the users table 518 | // to return the names of the users. 
519 | selectTop10PhotoPostersNames = `SELECT users.name FROM photos JOIN users ON photos.userID = users.id ORDER BY photos.commentcount DESC LIMIT 10;` 520 | ) 521 | 522 | // analyticsQuery runs the selected analytics query and scans through its results. 523 | func analyticsQuery(ctx context.Context, tx *sql.Tx, analyticsOpType int) error { 524 | var selectSQL string 525 | var outputTypeString bool 526 | switch analyticsOpType { 527 | case topCommentersAnalyticsOp: 528 | selectSQL = selectTop10Commenters 529 | case topPostersAnalyticsOp: 530 | selectSQL = selectTop10Posters 531 | case topPhotosAnalyticsOp: 532 | selectSQL = selectTop10Photos 533 | case top10PhotoPostersNamesAnalytcsOp: 534 | selectSQL = selectTop10PhotoPostersNames 535 | } 536 | 537 | 538 | rows, err := tx.QueryContext(ctx, selectSQL) 539 | switch { 540 | case err == sql.ErrNoRows: 541 | return nil 542 | case err != nil: 543 | return err 544 | } 545 | defer func() { _ = rows.Close() }() 546 | for rows.Next() { 547 | if err := rows.Err(); err != nil { 548 | return err 549 | } 550 | if outputTypeString { 551 | var user string 552 | if err := rows.Scan(&user); err != nil { 553 | return errors.Errorf("failed to scan string result set for query '%s': %s", 554 | selectSQL, err) 555 | } 556 | } else { 557 | var count int 558 | if err := rows.Scan(&count); err != nil { 559 | return errors.Errorf("failed to scan int result set for query '%s': %s", 560 | selectSQL, err) 561 | } 562 | } 563 | } 564 | 565 | return nil 566 | } 567 | -------------------------------------------------------------------------------- /photos/main.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 
5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. 14 | // 15 | // Author: Spencer Kimball (spencer.kimball@gmail.com) 16 | 17 | package main 18 | 19 | import ( 20 | "database/sql" 21 | "flag" 22 | "fmt" 23 | "log" 24 | "math/rand" 25 | "os" 26 | "os/signal" 27 | "reflect" 28 | "strconv" 29 | "strings" 30 | "syscall" 31 | "time" 32 | 33 | "github.com/spf13/cobra" 34 | "golang.org/x/net/context" 35 | 36 | "github.com/cockroachdb/cockroach/pkg/util/uuid" 37 | ) 38 | 39 | // pflagValue wraps flag.Value and implements the extra methods of the 40 | // pflag.Value interface. 41 | type pflagValue struct { 42 | flag.Value 43 | } 44 | 45 | func (v pflagValue) Type() string { 46 | t := reflect.TypeOf(v.Value).Elem() 47 | return t.Kind().String() 48 | } 49 | 50 | func (v pflagValue) IsBoolFlag() bool { 51 | t := reflect.TypeOf(v.Value).Elem() 52 | return t.Kind() == reflect.Bool 53 | } 54 | 55 | func normalizeStdFlagName(s string) string { 56 | return strings.Replace(s, "_", "-", -1) 57 | } 58 | 59 | var usage = map[string]string{ 60 | "db": "URL to the CockroachDB cluster", 61 | "users": "number of concurrent simulated users", 62 | "benchmark-name": "name of benchmark to report for Go benchmark results", 63 | "analytics": "true/false to indicate if analytics queries should be occasionally run (default false)", 64 | "analytics-wait-seconds": "the wait time between successive analytics queries in seconds. Note that this is measured from the end of the last query to the start of the next query. (default 30s)", 65 | } 66 | 67 | // A Config holds configuration data. 
68 | type Config struct { 69 | // DBUrl is the URL to the database server. 70 | DBUrl string 71 | // NumUsers is the number of concurrent users generating load. 72 | NumUsers int 73 | // DB is the open database handle shared by the load generator. 74 | DB *sql.DB 75 | // Name of benchmark to use in benchmark results outputted upon process 76 | // termination. Used for analyzing performance over time. 77 | BenchmarkName string 78 | // AnalyticsQueries controls if analytics queries are periodically run. 79 | // Used for testing DistSQL code paths. 80 | AnalyticsQueries bool 81 | // AnalyticsQueriesWaitSeconds is the wait time between successive 82 | // analytics queries in seconds. Note that this is measured from the end 83 | // of the last query to the start of the next query. 84 | AnalyticsQueriesWaitSeconds int 85 | } 86 | 87 | var cfg = Config{ 88 | DBUrl: "postgresql://root@localhost:26257/photos?sslmode=disable", 89 | NumUsers: 1, 90 | BenchmarkName: "BenchmarkPhotos", 91 | AnalyticsQueries: false, 92 | AnalyticsQueriesWaitSeconds: 30, 93 | } 94 | 95 | var loadCmd = &cobra.Command{ 96 | Use: "photos", 97 | Short: "generate artificial load using a simple three-table schema with indexes", 98 | Long: ` 99 | Create artificial load using a simple database schema containing 100 | users, photos and comments. Users have photos, photos have comments. 101 | Users can author comments on any photo. User actions are simulated 102 | using an exponential distribution on user IDs, so lower IDs see 103 | more activity than higher ones. 
104 | `, 105 | Example: ` photos --db=postgresql://root@localhost:26257/photos?sslmode=disable`, 106 | RunE: runLoad, 107 | } 108 | 109 | func runLoad(c *cobra.Command, args []string) error { 110 | ctx, cancel := context.WithCancel(context.Background()) 111 | defer cancel() 112 | 113 | log.Printf("generating load for %d concurrent users...", cfg.NumUsers) 114 | db, err := openDB(cfg) 115 | if err != nil { 116 | log.Fatal(err) 117 | } 118 | defer func() { _ = db.Close() }() 119 | if err := initSchema(ctx, db); err != nil { 120 | log.Fatal(err) 121 | } 122 | cfg.DB = db 123 | 124 | signalCh := make(chan os.Signal, 1) 125 | signal.Notify(signalCh, os.Interrupt) // note: os.Kill cannot be trapped 126 | signal.Notify(signalCh, syscall.SIGTERM) 127 | 128 | go func() { 129 | <-signalCh 130 | cancel() 131 | }() 132 | 133 | errChan := make(chan error, 2+cfg.NumUsers) // stats + analytics + user goroutines 134 | go func() { 135 | errChan <- startStats(ctx) 136 | }() 137 | for i := 0; i < cfg.NumUsers; i++ { 138 | go func() { 139 | errChan <- startUser(ctx, cfg) 140 | }() 141 | } 142 | if cfg.AnalyticsQueries { 143 | go func() { 144 | errChan <- startAnalytics(ctx, cfg) 145 | }() 146 | } 147 | 148 | for i := 0; i < 1+cfg.NumUsers; i++ { 149 | if err := <-errChan; err != ctx.Err() { 150 | return err 151 | } 152 | } 153 | 154 | log.Println("load generation complete") 155 | 156 | // Output results that mimic Go's built-in benchmark format. 157 | stats.Lock() 158 | elapsed := time.Since(stats.start) 159 | fmt.Println("Go benchmark results:") 160 | fmt.Printf("%s\t%8d\t%12.1f ns/op\n", 161 | cfg.BenchmarkName, stats.totalOps, float64(elapsed.Nanoseconds())/float64(stats.totalOps)) 162 | stats.Unlock() 163 | 164 | return nil 165 | } 166 | 167 | var dropCmd = &cobra.Command{ 168 | Use: "drop", 169 | Short: "drop the photos database", 170 | Long: ` 171 | Drop the photos database to start fresh. 
172 | `, 173 | Example: ` photos drop --db=<db URL>`, 174 | RunE: runDrop, 175 | } 176 | 177 | var splitCmd = &cobra.Command{ 178 | Use: "split", 179 | Short: "split the photos database", 180 | Long: ` 181 | Split all tables in the photos database to start fresh. 182 | `, 183 | Example: ` photos split --db=<db URL> <num splits>`, 184 | RunE: runSplit, 185 | } 186 | 187 | func runDrop(c *cobra.Command, args []string) error { 188 | ctx := context.Background() 189 | 190 | log.Printf("dropping photos database") 191 | db, err := openDB(cfg) 192 | if err != nil { 193 | log.Fatal(err) 194 | } 195 | defer func() { _ = db.Close() }() 196 | if err := dropDatabase(ctx, db); err != nil { 197 | log.Fatal(err) 198 | } 199 | return nil 200 | } 201 | 202 | func splitByUUID(db *sql.DB, numSplits int, tableName string, statementString string) { 203 | log.Printf("splitting table %q", tableName) 204 | for count := 0; count < numSplits; { 205 | if _, err := db.Exec(statementString, uuid.MakeV4().GetBytes()); err != nil { 206 | log.Printf("problem splitting: %v", err) 207 | } else { 208 | count++ 209 | } 210 | } 211 | } 212 | 213 | func runSplit(c *cobra.Command, args []string) error { 214 | ctx := context.Background() 215 | 216 | if len(args) != 1 { 217 | return fmt.Errorf("argument required: <num splits>") 218 | } 219 | n, err := strconv.ParseUint(args[0], 10, 32) 220 | if err != nil { 221 | return fmt.Errorf("unable to parse argument <num splits>: %v", err) 222 | } 223 | numSplits := int(n) 224 | log.Printf("splitting photos database into %d chunks", numSplits) 225 | 226 | db, err := openDB(cfg) 227 | if err != nil { 228 | log.Fatal(err) 229 | } 230 | defer func() { _ = db.Close() }() 231 | 232 | if err := initSchema(ctx, db); err != nil { 233 | log.Fatal(err) 234 | } 235 | cfg.DB = db 236 | 237 | log.Printf(`splitting table "users"`) 238 | for count := 0; count < numSplits; { 239 | // Use the userID generation logic. 
240 | userID := 1 + int(rand.ExpFloat64()/rate) 241 | if _, err := db.Exec(`ALTER TABLE users SPLIT AT VALUES ($1)`, userID); err != nil { 242 | log.Printf("problem splitting: %v", err) 243 | } else { 244 | count++ 245 | } 246 | } 247 | 248 | splitByUUID(db, numSplits, "photos", `ALTER TABLE photos SPLIT AT VALUES ($1)`) 249 | splitByUUID(db, numSplits, "comments", `ALTER TABLE comments SPLIT AT VALUES ($1, '2016-01-01', '')`) 250 | return nil 251 | } 252 | 253 | func init() { 254 | rand.Seed(time.Now().UnixNano()) 255 | loadCmd.AddCommand( 256 | dropCmd, 257 | splitCmd, 258 | ) 259 | // Map any flags registered in the standard "flag" package into the 260 | // top-level command. 261 | pf := loadCmd.PersistentFlags() 262 | flag.VisitAll(func(f *flag.Flag) { 263 | pf.Var(pflagValue{f.Value}, normalizeStdFlagName(f.Name), f.Usage) 264 | }) 265 | // Add persistent flags to the top-level command. 266 | loadCmd.PersistentFlags().IntVarP(&cfg.NumUsers, "users", "", cfg.NumUsers, usage["users"]) 267 | loadCmd.PersistentFlags().StringVarP(&cfg.DBUrl, "db", "", cfg.DBUrl, usage["db"]) 268 | loadCmd.PersistentFlags().StringVarP(&cfg.BenchmarkName, "benchmark-name", "", cfg.BenchmarkName, 269 | usage["benchmark-name"]) 270 | loadCmd.PersistentFlags().BoolVarP(&cfg.AnalyticsQueries, "analytics", "", cfg.AnalyticsQueries, usage["analytics"]) 271 | loadCmd.PersistentFlags().IntVarP(&cfg.AnalyticsQueriesWaitSeconds, "analytics-wait-seconds", "", cfg.AnalyticsQueriesWaitSeconds, usage["analytics-wait-seconds"]) 272 | } 273 | 274 | // Run executes the photos command with the given command-line arguments. 
275 | func Run(args []string) error { 276 | loadCmd.SetArgs(args) 277 | return loadCmd.Execute() 278 | } 279 | 280 | func main() { 281 | if err := Run(os.Args[1:]); err != nil { 282 | fmt.Fprintf(os.Stderr, "failed running command %q: %v\n", os.Args[1:], err) 283 | os.Exit(1) 284 | } 285 | } 286 | -------------------------------------------------------------------------------- /photos/user.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Cockroach Authors. 2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. 
14 | // 15 | // Author: Spencer Kimball (spencer.kimball@gmail.com) 16 | 17 | package main 18 | 19 | import ( 20 | "database/sql" 21 | "encoding/binary" 22 | "hash/fnv" 23 | "log" 24 | "math" 25 | "math/rand" 26 | "sync" 27 | "time" 28 | 29 | "github.com/cockroachdb/cockroach-go/crdb" 30 | "github.com/codahale/hdrhistogram" 31 | "github.com/pkg/errors" 32 | "golang.org/x/net/context" 33 | ) 34 | 35 | const ( 36 | meanUserID = 1 << 15 37 | rate = 1.0 / (meanUserID * 2) 38 | statsInterval = 10 * time.Second 39 | ) 40 | 41 | const ( 42 | createUserOp = iota 43 | createPhotoOp 44 | createCommentOp 45 | listPhotosOp 46 | listCommentsOp 47 | updatePhotoOp 48 | updateCommentOp 49 | deleteCommentOp 50 | deletePhotoOp 51 | listMostCommentedPhotosOp 52 | listCommentsAlphabeticallyOp 53 | ) 54 | 55 | const ( 56 | topCommentersAnalyticsOp = iota 57 | topPostersAnalyticsOp 58 | topPhotosAnalyticsOp 59 | top10PhotoPostersNamesAnalytcsOp 60 | numAnalyticsOps 61 | ) 62 | 63 | type opDesc struct { 64 | typ int 65 | name string 66 | relFreq float64 67 | normFreq float64 68 | } 69 | 70 | // Note that tests care about the order here: running each command 71 | // once in this order is expected to succeed (so users must be created 72 | // before photos which must be created before comments, with deletion in 73 | // the reverse order). 
74 | var ops = []*opDesc{ 75 | {createUserOp, "create user", 1, 0}, 76 | {createPhotoOp, "create photo", 10, 0}, 77 | {createCommentOp, "create comment", 50, 0}, 78 | {listPhotosOp, "list photos", 20, 0}, 79 | {listCommentsOp, "list comments", 20, 0}, 80 | {updatePhotoOp, "update photo", 2.5, 0}, 81 | {updateCommentOp, "update comment", 5, 0}, 82 | {listMostCommentedPhotosOp, "list most commented on photos", 25, 0}, 83 | {listCommentsAlphabeticallyOp, "list comments alphabetically", 15, 0}, 84 | {deleteCommentOp, "delete comment", 2.5, 0}, 85 | {deletePhotoOp, "delete photo", 1.25, 0}, 86 | } 87 | 88 | var stats struct { 89 | sync.Mutex 90 | start time.Time 91 | computing bool 92 | totalOps int 93 | noUserOps int 94 | noPhotoOps int 95 | noAnalyticsOps int 96 | failedOps int 97 | hist *hdrhistogram.Histogram 98 | opCounts map[int]int 99 | analyticsOpCounts map[int]int 100 | } 101 | 102 | func init() { 103 | stats.hist = hdrhistogram.New(0, 0x7fffffff, 1) 104 | stats.start = time.Now() 105 | stats.opCounts = map[int]int{} 106 | stats.analyticsOpCounts = map[int]int{} 107 | 108 | // Compute the total of all op relative frequencies. 109 | var relFreqTotal float64 110 | for _, op := range ops { 111 | relFreqTotal += op.relFreq 112 | } 113 | // Normalize frequencies. 114 | var normFreqTotal float64 115 | for _, op := range ops { 116 | normFreq := op.relFreq / relFreqTotal 117 | op.normFreq = normFreqTotal + normFreq 118 | normFreqTotal += normFreq 119 | } 120 | } 121 | 122 | // randomOp chooses a random operation from the ops slice. 
123 | func randomOp() *opDesc { 124 | r := rand.Float64() 125 | for _, op := range ops { 126 | if r < op.normFreq { 127 | return op 128 | } 129 | } 130 | return ops[len(ops)-1] 131 | } 132 | 133 | func randomAnalyticsOp() int { 134 | return rand.Intn(numAnalyticsOps) 135 | } 136 | 137 | func startStats(ctx context.Context) error { 138 | var lastOps int 139 | ticker := time.NewTicker(statsInterval) 140 | for { 141 | select { 142 | case <-ticker.C: 143 | stats.Lock() 144 | opsPerSec := float64(stats.totalOps-lastOps) / float64(statsInterval/1E9) 145 | log.Printf("%d ops, %d no-user, %d no-photo, %d analytics, %d errs (%.2f/s)", 146 | stats.totalOps, stats.noUserOps, stats.noPhotoOps, 147 | stats.noAnalyticsOps, stats.failedOps, opsPerSec, 148 | ) 149 | lastOps = stats.totalOps 150 | stats.Unlock() 151 | case <-ctx.Done(): 152 | stats.Lock() 153 | if !stats.computing { 154 | stats.computing = true 155 | //showHistogram() 156 | } 157 | stats.Unlock() 158 | return ctx.Err() 159 | } 160 | } 161 | } 162 | 163 | // startAnalytics simulates periodic analytics queries until the context 164 | // indicates it's time to exit. 165 | func startAnalytics(ctx context.Context, cfg Config) error { 166 | for { 167 | opType := randomAnalyticsOp() 168 | if err := ctx.Err(); err != nil { 169 | return err 170 | } 171 | err := runAnalyticsOp(ctx, cfg, opType) 172 | stats.Lock() 173 | stats.totalOps++ 174 | stats.noAnalyticsOps++ 175 | stats.analyticsOpCounts[opType]++ 176 | if err != nil { 177 | stats.failedOps++ 178 | log.Printf("failed to run analytics op: %d: %s", opType, err) 179 | } 180 | stats.Unlock() 181 | 182 | time.Sleep(time.Second * time.Duration(cfg.AnalyticsQueriesWaitSeconds)) 183 | } 184 | } 185 | 186 | // startUser simulates a stream of user events until the context indicates 187 | // it's time to exit. 
188 | func startUser(ctx context.Context, cfg Config) error { 189 | h := fnv.New32() 190 | var buf [8]byte 191 | 192 | randomUser := func() int { 193 | // Use an exponential distribution to skew the user ID generation, but 194 | // hash the randomly generated value so that the "hot" users are spread 195 | // throughout the user ID key space (and thus not all on 1 range). 196 | binary.BigEndian.PutUint64(buf[:8], math.Float64bits(rand.ExpFloat64()/rate)) 197 | h.Reset() 198 | h.Write(buf[:8]) 199 | return int(h.Sum32()) 200 | } 201 | 202 | for { 203 | userID := randomUser() 204 | op := randomOp() 205 | 206 | if err := ctx.Err(); err != nil { 207 | return err 208 | } 209 | err := runUserOp(ctx, cfg, userID, op.typ) 210 | stats.Lock() 211 | _ = stats.hist.RecordValue(int64(userID)) 212 | stats.totalOps++ 213 | stats.opCounts[op.typ]++ 214 | switch { 215 | case err == errNoUser: 216 | stats.noUserOps++ 217 | case err == errNoPhoto: 218 | stats.noPhotoOps++ 219 | case err != nil: 220 | stats.failedOps++ 221 | log.Printf("failed to run %s op for %d: %s", op.name, userID, err) 222 | } 223 | stats.Unlock() 224 | } 225 | } 226 | 227 | // runUserOp executes the given user operation for userID inside a 228 | // transaction, retrying automatically on retryable errors. 
229 | func runUserOp(ctx context.Context, cfg Config, userID, opType int) error { 230 | return crdb.ExecuteTx(ctx, cfg.DB, nil /* txopts */, func(tx *sql.Tx) error { 231 | switch opType { 232 | case createUserOp: 233 | return createUser(ctx, tx, userID) 234 | case createPhotoOp: 235 | return createPhoto(ctx, tx, userID) 236 | case createCommentOp: 237 | return createComment(ctx, tx, userID) 238 | case listPhotosOp: 239 | return listPhotos(ctx, tx, userID, nil) 240 | case listCommentsOp: 241 | _, err := listComments(ctx, tx, userID, nil) 242 | return err 243 | case updatePhotoOp: 244 | return updatePhoto(ctx, tx, userID) 245 | case updateCommentOp: 246 | return updateComment(ctx, tx, userID) 247 | case deletePhotoOp: 248 | return deletePhoto(ctx, tx, userID) 249 | case deleteCommentOp: 250 | return deleteComment(ctx, tx, userID) 251 | case listMostCommentedPhotosOp: 252 | return listMostCommentedPhotos(ctx, tx, userID) 253 | case listCommentsAlphabeticallyOp: 254 | return listCommentsAlphabetically(ctx, tx, userID) 255 | default: 256 | return errors.Errorf("unsupported op type: %d", opType) 257 | } 258 | }) 259 | } 260 | 261 | func runAnalyticsOp(ctx context.Context, cfg Config, analyticsOpType int) error { 262 | return crdb.ExecuteTx(ctx, cfg.DB, nil /* txopts */, func(tx *sql.Tx) error { 263 | return analyticsQuery(ctx, tx, analyticsOpType) 264 | }) 265 | } 266 | 267 | func showHistogram() { 268 | log.Printf("**** histogram of user op counts (minUserID=%d, maxUserID=%d, userCount=%d)", 269 | stats.hist.Min(), stats.hist.Max(), stats.hist.TotalCount()) 270 | for _, b := range stats.hist.Distribution() { 271 | log.Printf("** users %d-%d (%d)", b.From, b.To, b.Count) 272 | } 273 | } 274 | -------------------------------------------------------------------------------- /photos/user_test.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Cockroach Authors. 
2 | // 3 | // Licensed under the Apache License, Version 2.0 (the "License"); 4 | // you may not use this file except in compliance with the License. 5 | // You may obtain a copy of the License at 6 | // 7 | // http://www.apache.org/licenses/LICENSE-2.0 8 | // 9 | // Unless required by applicable law or agreed to in writing, software 10 | // distributed under the License is distributed on an "AS IS" BASIS, 11 | // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 12 | // implied. See the License for the specific language governing 13 | // permissions and limitations under the License. See the AUTHORS file 14 | // for names of contributors. 15 | // 16 | // Author: Ben Darnell 17 | 18 | package main 19 | 20 | import ( 21 | "context" 22 | "testing" 23 | 24 | "github.com/cockroachdb/cockroach-go/testserver" 25 | ) 26 | 27 | // TestAllOps runs every operation once and ensures that they complete 28 | // without error. 29 | func TestAllOps(t *testing.T) { 30 | db, stop := testserver.NewDBForTestWithDatabase(t, "photos") 31 | defer stop() 32 | 33 | ctx := context.Background() 34 | 35 | if err := initSchema(ctx, db); err != nil { 36 | t.Fatal(err) 37 | } 38 | cfg := Config{ 39 | DB: db, 40 | NumUsers: 1, 41 | } 42 | 43 | for _, op := range ops { 44 | t.Logf("running %s", op.name) 45 | if err := runUserOp(ctx, cfg, 1, op.typ); err != nil { 46 | t.Error(err) 47 | } 48 | } 49 | } 50 | 51 | func TestCommentWithoutPhotos(t *testing.T) { 52 | db, stop := testserver.NewDBForTestWithDatabase(t, "photos") 53 | defer stop() 54 | 55 | ctx := context.Background() 56 | 57 | if err := initSchema(ctx, db); err != nil { 58 | t.Fatal(err) 59 | } 60 | cfg := Config{ 61 | DB: db, 62 | NumUsers: 1, 63 | } 64 | 65 | if err := runUserOp(ctx, cfg, 1, createUserOp); err != nil { 66 | t.Error(err) 67 | } 68 | 69 | if err := runUserOp(ctx, cfg, 1, createCommentOp); err == nil { 70 | t.Error("unexpected success creating comment with no photos") 71 | } else if err != errNoPhoto { 72 | 
t.Errorf("expected errNoPhoto, got %s", err) 73 | } 74 | } 75 | -------------------------------------------------------------------------------- /teamcity-push.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -euxo pipefail 4 | 5 | VERSION=$(git describe || git rev-parse --short HEAD) 6 | 7 | # Don't do this in Docker to avoid creating root-owned directories in GOPATH. 8 | make deps 9 | 10 | echo "Deploying ${VERSION}..." 11 | aws configure set region us-east-1 12 | 13 | BUCKET_NAME=cockroach 14 | LATEST_SUFFIX=.LATEST 15 | REPO_NAME=examples-go 16 | SHA=$(git rev-parse HEAD) 17 | 18 | # push_one_binary takes the path to the binary inside the repo. 19 | # eg: push_one_binary sql/sql.test 20 | # The file will be pushed to: s3://BUCKET_NAME/REPO_NAME/sql.test.SHA 21 | # The binary's sha will be stored in s3://BUCKET_NAME/REPO_NAME/sql.test.LATEST 22 | # The .LATEST file will also redirect to the latest binary when fetching through 23 | # the S3 static-website. 24 | function push_one_binary { 25 | rel_path=$1 26 | binary_name=$(basename "$1") 27 | 28 | time aws s3 cp "${rel_path}" s3://${BUCKET_NAME}/${REPO_NAME}/"${binary_name}"."${SHA}" 29 | 30 | # Upload LATEST file. 
31 | tmpfile=$(mktemp /tmp/cockroach-push.XXXXXX) 32 | echo "${SHA}" > "${tmpfile}" 33 | time aws s3 cp --website-redirect /${REPO_NAME}/"${binary_name}"."${SHA}" "${tmpfile}" s3://${BUCKET_NAME}/${REPO_NAME}/"${binary_name}"${LATEST_SUFFIX} 34 | rm -f "${tmpfile}" 35 | } 36 | 37 | for proj in bank ledger block_writer fakerealtime filesystem photos; do 38 | docker run \ 39 | --workdir=/go/src/github.com/cockroachdb/examples-go \ 40 | --volume="${GOPATH%%:*}/src":/go/src \ 41 | --volume="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)":/go/src/github.com/cockroachdb/examples-go \ 42 | --rm \ 43 | cockroachdb/builder:20170422-212842 make ${proj} STATIC=1 44 | push_one_binary ${proj}/${proj} 45 | done 46 | -------------------------------------------------------------------------------- /teamcity-test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -euxo pipefail 4 | 5 | # Don't do this in Docker to avoid creating root-owned directories in GOPATH. 6 | make deps 7 | 8 | docker run \ 9 | --workdir=/go/src/github.com/cockroachdb/examples-go \ 10 | --volume="${GOPATH%%:*}/src":/go/src \ 11 | --volume="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)":/go/src/github.com/cockroachdb/examples-go \ 12 | --rm \ 13 | cockroachdb/builder:20170422-212842 make test | go-test-teamcity 14 | --------------------------------------------------------------------------------