├── .gitignore
├── AUTHORS
├── CONTRIBUTORS
├── LICENSE
├── README
├── cmd
│   └── snappytool
│       └── main.go
├── decode.go
├── decode_amd64.s
├── decode_arm64.s
├── decode_asm.go
├── decode_other.go
├── encode.go
├── encode_amd64.s
├── encode_arm64.s
├── encode_asm.go
├── encode_other.go
├── go.mod
├── golden_test.go
├── misc
│   └── main.cpp
├── snappy.go
├── snappy_test.go
└── testdata
    ├── Isaac.Newton-Opticks.txt
    └── Isaac.Newton-Opticks.txt.rawsnappy

/.gitignore:
--------------------------------------------------------------------------------
1 | cmd/snappytool/snappytool
2 | testdata/bench
3 | 
4 | # These explicitly listed benchmark data files are for an obsolete version of
5 | # snappy_test.go.
6 | testdata/alice29.txt
7 | testdata/asyoulik.txt
8 | testdata/fireworks.jpeg
9 | testdata/geo.protodata
10 | testdata/html
11 | testdata/html_x_4
12 | testdata/kppkn.gtb
13 | testdata/lcet10.txt
14 | testdata/paper-100k.pdf
15 | testdata/plrabn12.txt
16 | testdata/urls.10K
17 | 
--------------------------------------------------------------------------------
/AUTHORS:
--------------------------------------------------------------------------------
1 | # This is the official list of Snappy-Go authors for copyright purposes.
2 | # This file is distinct from the CONTRIBUTORS files.
3 | # See the latter for an explanation.
4 | 
5 | # Names should be added to this file as
6 | #	Name or Organization <email address>
7 | # The email address is not required for organizations.
8 | 
9 | # Please keep the list sorted.
10 | 
11 | Amazon.com, Inc
12 | Damian Gryski
13 | Eric Buth
14 | Google Inc.
15 | Jan Mercl <0xjnml@gmail.com>
16 | Klaus Post
17 | Rodolfo Carvalho
18 | Sebastien Binet
--------------------------------------------------------------------------------
/CONTRIBUTORS:
--------------------------------------------------------------------------------
1 | # This is the official list of people who can contribute
2 | # (and typically have contributed) code to the Snappy-Go repository.
3 | # The AUTHORS file lists the copyright holders; this file 4 | # lists people. For example, Google employees are listed here 5 | # but not in AUTHORS, because Google holds the copyright. 6 | # 7 | # The submission process automatically checks to make sure 8 | # that people submitting code are listed in this file (by email address). 9 | # 10 | # Names should be added to this file only after verifying that 11 | # the individual or the individual's organization has agreed to 12 | # the appropriate Contributor License Agreement, found here: 13 | # 14 | # http://code.google.com/legal/individual-cla-v1.0.html 15 | # http://code.google.com/legal/corporate-cla-v1.0.html 16 | # 17 | # The agreement for individuals can be filled out on the web. 18 | # 19 | # When adding J Random Contributor's name to this file, 20 | # either J's name or J's organization's name should be 21 | # added to the AUTHORS file, depending on whether the 22 | # individual or corporate CLA was used. 23 | 24 | # Names should be added to this file like so: 25 | # Name 26 | 27 | # Please keep the list sorted. 28 | 29 | Alex Legg 30 | Damian Gryski 31 | Eric Buth 32 | Jan Mercl <0xjnml@gmail.com> 33 | Jonathan Swinney 34 | Kai Backman 35 | Klaus Post 36 | Marc-Antoine Ruel 37 | Nigel Tao 38 | Rob Pike 39 | Rodolfo Carvalho 40 | Russ Cox 41 | Sebastien Binet 42 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2011 The Snappy-Go Authors. All rights reserved. 2 | 3 | Redistribution and use in source and binary forms, with or without 4 | modification, are permitted provided that the following conditions are 5 | met: 6 | 7 | * Redistributions of source code must retain the above copyright 8 | notice, this list of conditions and the following disclaimer. 
9 | * Redistributions in binary form must reproduce the above 10 | copyright notice, this list of conditions and the following disclaimer 11 | in the documentation and/or other materials provided with the 12 | distribution. 13 | * Neither the name of Google Inc. nor the names of its 14 | contributors may be used to endorse or promote products derived from 15 | this software without specific prior written permission. 16 | 17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 18 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 19 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 20 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 21 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 22 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 23 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 24 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 25 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 26 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 28 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | The Snappy compression format in the Go programming language. 2 | 3 | To use as a library: 4 | $ go get github.com/golang/snappy 5 | 6 | To use as a binary: 7 | $ go install github.com/golang/snappy/cmd/snappytool@latest 8 | $ cat decoded | ~/go/bin/snappytool -e > encoded 9 | $ cat encoded | ~/go/bin/snappytool -d > decoded 10 | 11 | Unless otherwise noted, the Snappy-Go source files are distributed 12 | under the BSD-style license found in the LICENSE file. 13 | 14 | 15 | 16 | Benchmarks. 
17 | 18 | The golang/snappy benchmarks include compressing (Z) and decompressing (U) ten 19 | or so files, the same set used by the C++ Snappy code (github.com/google/snappy 20 | and note the "google", not "golang"). On an "Intel(R) Core(TM) i7-3770 CPU @ 21 | 3.40GHz", Go's GOARCH=amd64 numbers as of 2016-05-29: 22 | 23 | "go test -test.bench=." 24 | 25 | _UFlat0-8 2.19GB/s ± 0% html 26 | _UFlat1-8 1.41GB/s ± 0% urls 27 | _UFlat2-8 23.5GB/s ± 2% jpg 28 | _UFlat3-8 1.91GB/s ± 0% jpg_200 29 | _UFlat4-8 14.0GB/s ± 1% pdf 30 | _UFlat5-8 1.97GB/s ± 0% html4 31 | _UFlat6-8 814MB/s ± 0% txt1 32 | _UFlat7-8 785MB/s ± 0% txt2 33 | _UFlat8-8 857MB/s ± 0% txt3 34 | _UFlat9-8 719MB/s ± 1% txt4 35 | _UFlat10-8 2.84GB/s ± 0% pb 36 | _UFlat11-8 1.05GB/s ± 0% gaviota 37 | 38 | _ZFlat0-8 1.04GB/s ± 0% html 39 | _ZFlat1-8 534MB/s ± 0% urls 40 | _ZFlat2-8 15.7GB/s ± 1% jpg 41 | _ZFlat3-8 740MB/s ± 3% jpg_200 42 | _ZFlat4-8 9.20GB/s ± 1% pdf 43 | _ZFlat5-8 991MB/s ± 0% html4 44 | _ZFlat6-8 379MB/s ± 0% txt1 45 | _ZFlat7-8 352MB/s ± 0% txt2 46 | _ZFlat8-8 396MB/s ± 1% txt3 47 | _ZFlat9-8 327MB/s ± 1% txt4 48 | _ZFlat10-8 1.33GB/s ± 1% pb 49 | _ZFlat11-8 605MB/s ± 1% gaviota 50 | 51 | 52 | 53 | "go test -test.bench=. 
-tags=noasm" 54 | 55 | _UFlat0-8 621MB/s ± 2% html 56 | _UFlat1-8 494MB/s ± 1% urls 57 | _UFlat2-8 23.2GB/s ± 1% jpg 58 | _UFlat3-8 1.12GB/s ± 1% jpg_200 59 | _UFlat4-8 4.35GB/s ± 1% pdf 60 | _UFlat5-8 609MB/s ± 0% html4 61 | _UFlat6-8 296MB/s ± 0% txt1 62 | _UFlat7-8 288MB/s ± 0% txt2 63 | _UFlat8-8 309MB/s ± 1% txt3 64 | _UFlat9-8 280MB/s ± 1% txt4 65 | _UFlat10-8 753MB/s ± 0% pb 66 | _UFlat11-8 400MB/s ± 0% gaviota 67 | 68 | _ZFlat0-8 409MB/s ± 1% html 69 | _ZFlat1-8 250MB/s ± 1% urls 70 | _ZFlat2-8 12.3GB/s ± 1% jpg 71 | _ZFlat3-8 132MB/s ± 0% jpg_200 72 | _ZFlat4-8 2.92GB/s ± 0% pdf 73 | _ZFlat5-8 405MB/s ± 1% html4 74 | _ZFlat6-8 179MB/s ± 1% txt1 75 | _ZFlat7-8 170MB/s ± 1% txt2 76 | _ZFlat8-8 189MB/s ± 1% txt3 77 | _ZFlat9-8 164MB/s ± 1% txt4 78 | _ZFlat10-8 479MB/s ± 1% pb 79 | _ZFlat11-8 270MB/s ± 1% gaviota 80 | 81 | 82 | 83 | For comparison (Go's encoded output is byte-for-byte identical to C++'s), here 84 | are the numbers from C++ Snappy's 85 | 86 | make CXXFLAGS="-O2 -DNDEBUG -g" clean snappy_unittest.log && cat snappy_unittest.log 87 | 88 | BM_UFlat/0 2.4GB/s html 89 | BM_UFlat/1 1.4GB/s urls 90 | BM_UFlat/2 21.8GB/s jpg 91 | BM_UFlat/3 1.5GB/s jpg_200 92 | BM_UFlat/4 13.3GB/s pdf 93 | BM_UFlat/5 2.1GB/s html4 94 | BM_UFlat/6 1.0GB/s txt1 95 | BM_UFlat/7 959.4MB/s txt2 96 | BM_UFlat/8 1.0GB/s txt3 97 | BM_UFlat/9 864.5MB/s txt4 98 | BM_UFlat/10 2.9GB/s pb 99 | BM_UFlat/11 1.2GB/s gaviota 100 | 101 | BM_ZFlat/0 944.3MB/s html (22.31 %) 102 | BM_ZFlat/1 501.6MB/s urls (47.78 %) 103 | BM_ZFlat/2 14.3GB/s jpg (99.95 %) 104 | BM_ZFlat/3 538.3MB/s jpg_200 (73.00 %) 105 | BM_ZFlat/4 8.3GB/s pdf (83.30 %) 106 | BM_ZFlat/5 903.5MB/s html4 (22.52 %) 107 | BM_ZFlat/6 336.0MB/s txt1 (57.88 %) 108 | BM_ZFlat/7 312.3MB/s txt2 (61.91 %) 109 | BM_ZFlat/8 353.1MB/s txt3 (54.99 %) 110 | BM_ZFlat/9 289.9MB/s txt4 (66.26 %) 111 | BM_ZFlat/10 1.2GB/s pb (19.68 %) 112 | BM_ZFlat/11 527.4MB/s gaviota (37.72 %) 113 | 
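As context for decode.go below: a Snappy block begins with its uncompressed length encoded as a uvarint, which is what the package's decodedLen helper parses before any tags are decoded. Here is a minimal, stdlib-only sketch of that header parsing; parseDecodedLen is a simplified illustrative copy, not the package API (it omits decode.go's 32-bit wordSize check):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parseDecodedLen mirrors decodedLen in decode.go: it reads the
// uncompressed block length from the uvarint header and reports how
// many bytes that header occupied.
func parseDecodedLen(src []byte) (blockLen, headerLen int, err error) {
	v, n := binary.Uvarint(src)
	if n <= 0 || v > 0xffffffff {
		// n <= 0 means a truncated or over-long varint; lengths
		// above 2^32-1 are not valid Snappy blocks.
		return 0, 0, errors.New("snappy: corrupt input")
	}
	return int(v), n, nil
}

func main() {
	// The header of a block that decodes to 300 bytes: uvarint(300)
	// is the two bytes 0xac 0x02.
	hdr := binary.AppendUvarint(nil, 300)
	blockLen, headerLen, err := parseDecodedLen(hdr)
	fmt.Println(blockLen, headerLen, err) // prints: 300 2 <nil>
}
```

In a real block the compressed tag stream starts immediately after those header bytes, which is why Decode in decode.go slices the input as src[s:] once decodedLen returns.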
-------------------------------------------------------------------------------- /cmd/snappytool/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "errors" 5 | "flag" 6 | "io/ioutil" 7 | "os" 8 | 9 | "github.com/golang/snappy" 10 | ) 11 | 12 | var ( 13 | decode = flag.Bool("d", false, "decode") 14 | encode = flag.Bool("e", false, "encode") 15 | ) 16 | 17 | func run() error { 18 | flag.Parse() 19 | if *decode == *encode { 20 | return errors.New("exactly one of -d or -e must be given") 21 | } 22 | 23 | in, err := ioutil.ReadAll(os.Stdin) 24 | if err != nil { 25 | return err 26 | } 27 | 28 | out := []byte(nil) 29 | if *decode { 30 | out, err = snappy.Decode(nil, in) 31 | if err != nil { 32 | return err 33 | } 34 | } else { 35 | out = snappy.Encode(nil, in) 36 | } 37 | _, err = os.Stdout.Write(out) 38 | return err 39 | } 40 | 41 | func main() { 42 | if err := run(); err != nil { 43 | os.Stderr.WriteString(err.Error() + "\n") 44 | os.Exit(1) 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /decode.go: -------------------------------------------------------------------------------- 1 | // Copyright 2011 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | package snappy 6 | 7 | import ( 8 | "encoding/binary" 9 | "errors" 10 | "io" 11 | ) 12 | 13 | var ( 14 | // ErrCorrupt reports that the input is invalid. 15 | ErrCorrupt = errors.New("snappy: corrupt input") 16 | // ErrTooLarge reports that the uncompressed length is too large. 17 | ErrTooLarge = errors.New("snappy: decoded block is too large") 18 | // ErrUnsupported reports that the input isn't supported. 
19 | ErrUnsupported = errors.New("snappy: unsupported input") 20 | 21 | errUnsupportedLiteralLength = errors.New("snappy: unsupported literal length") 22 | ) 23 | 24 | // DecodedLen returns the length of the decoded block. 25 | func DecodedLen(src []byte) (int, error) { 26 | v, _, err := decodedLen(src) 27 | return v, err 28 | } 29 | 30 | // decodedLen returns the length of the decoded block and the number of bytes 31 | // that the length header occupied. 32 | func decodedLen(src []byte) (blockLen, headerLen int, err error) { 33 | v, n := binary.Uvarint(src) 34 | if n <= 0 || v > 0xffffffff { 35 | return 0, 0, ErrCorrupt 36 | } 37 | 38 | const wordSize = 32 << (^uint(0) >> 32 & 1) 39 | if wordSize == 32 && v > 0x7fffffff { 40 | return 0, 0, ErrTooLarge 41 | } 42 | return int(v), n, nil 43 | } 44 | 45 | const ( 46 | decodeErrCodeCorrupt = 1 47 | decodeErrCodeUnsupportedLiteralLength = 2 48 | ) 49 | 50 | // Decode returns the decoded form of src. The returned slice may be a sub- 51 | // slice of dst if dst was large enough to hold the entire decoded block. 52 | // Otherwise, a newly allocated slice will be returned. 53 | // 54 | // The dst and src must not overlap. It is valid to pass a nil dst. 55 | // 56 | // Decode handles the Snappy block format, not the Snappy stream format. 
57 | func Decode(dst, src []byte) ([]byte, error) { 58 | dLen, s, err := decodedLen(src) 59 | if err != nil { 60 | return nil, err 61 | } 62 | if dLen <= len(dst) { 63 | dst = dst[:dLen] 64 | } else { 65 | dst = make([]byte, dLen) 66 | } 67 | switch decode(dst, src[s:]) { 68 | case 0: 69 | return dst, nil 70 | case decodeErrCodeUnsupportedLiteralLength: 71 | return nil, errUnsupportedLiteralLength 72 | } 73 | return nil, ErrCorrupt 74 | } 75 | 76 | // NewReader returns a new Reader that decompresses from r, using the framing 77 | // format described at 78 | // https://github.com/google/snappy/blob/master/framing_format.txt 79 | func NewReader(r io.Reader) *Reader { 80 | return &Reader{ 81 | r: r, 82 | decoded: make([]byte, maxBlockSize), 83 | buf: make([]byte, maxEncodedLenOfMaxBlockSize+checksumSize), 84 | } 85 | } 86 | 87 | // Reader is an io.Reader that can read Snappy-compressed bytes. 88 | // 89 | // Reader handles the Snappy stream format, not the Snappy block format. 90 | type Reader struct { 91 | r io.Reader 92 | err error 93 | decoded []byte 94 | buf []byte 95 | // decoded[i:j] contains decoded bytes that have not yet been passed on. 96 | i, j int 97 | readHeader bool 98 | } 99 | 100 | // Reset discards any buffered data, resets all state, and switches the Snappy 101 | // reader to read from r. This permits reusing a Reader rather than allocating 102 | // a new one. 
103 | func (r *Reader) Reset(reader io.Reader) { 104 | r.r = reader 105 | r.err = nil 106 | r.i = 0 107 | r.j = 0 108 | r.readHeader = false 109 | } 110 | 111 | func (r *Reader) readFull(p []byte, allowEOF bool) (ok bool) { 112 | if _, r.err = io.ReadFull(r.r, p); r.err != nil { 113 | if r.err == io.ErrUnexpectedEOF || (r.err == io.EOF && !allowEOF) { 114 | r.err = ErrCorrupt 115 | } 116 | return false 117 | } 118 | return true 119 | } 120 | 121 | func (r *Reader) fill() error { 122 | for r.i >= r.j { 123 | if !r.readFull(r.buf[:4], true) { 124 | return r.err 125 | } 126 | chunkType := r.buf[0] 127 | if !r.readHeader { 128 | if chunkType != chunkTypeStreamIdentifier { 129 | r.err = ErrCorrupt 130 | return r.err 131 | } 132 | r.readHeader = true 133 | } 134 | chunkLen := int(r.buf[1]) | int(r.buf[2])<<8 | int(r.buf[3])<<16 135 | if chunkLen > len(r.buf) { 136 | r.err = ErrUnsupported 137 | return r.err 138 | } 139 | 140 | // The chunk types are specified at 141 | // https://github.com/google/snappy/blob/master/framing_format.txt 142 | switch chunkType { 143 | case chunkTypeCompressedData: 144 | // Section 4.2. Compressed data (chunk type 0x00). 145 | if chunkLen < checksumSize { 146 | r.err = ErrCorrupt 147 | return r.err 148 | } 149 | buf := r.buf[:chunkLen] 150 | if !r.readFull(buf, false) { 151 | return r.err 152 | } 153 | checksum := uint32(buf[0]) | uint32(buf[1])<<8 | uint32(buf[2])<<16 | uint32(buf[3])<<24 154 | buf = buf[checksumSize:] 155 | 156 | n, err := DecodedLen(buf) 157 | if err != nil { 158 | r.err = err 159 | return r.err 160 | } 161 | if n > len(r.decoded) { 162 | r.err = ErrCorrupt 163 | return r.err 164 | } 165 | if _, err := Decode(r.decoded, buf); err != nil { 166 | r.err = err 167 | return r.err 168 | } 169 | if crc(r.decoded[:n]) != checksum { 170 | r.err = ErrCorrupt 171 | return r.err 172 | } 173 | r.i, r.j = 0, n 174 | continue 175 | 176 | case chunkTypeUncompressedData: 177 | // Section 4.3. Uncompressed data (chunk type 0x01). 
178 | if chunkLen < checksumSize { 179 | r.err = ErrCorrupt 180 | return r.err 181 | } 182 | buf := r.buf[:checksumSize] 183 | if !r.readFull(buf, false) { 184 | return r.err 185 | } 186 | checksum := uint32(buf[0]) | uint32(buf[1])<<8 | uint32(buf[2])<<16 | uint32(buf[3])<<24 187 | // Read directly into r.decoded instead of via r.buf. 188 | n := chunkLen - checksumSize 189 | if n > len(r.decoded) { 190 | r.err = ErrCorrupt 191 | return r.err 192 | } 193 | if !r.readFull(r.decoded[:n], false) { 194 | return r.err 195 | } 196 | if crc(r.decoded[:n]) != checksum { 197 | r.err = ErrCorrupt 198 | return r.err 199 | } 200 | r.i, r.j = 0, n 201 | continue 202 | 203 | case chunkTypeStreamIdentifier: 204 | // Section 4.1. Stream identifier (chunk type 0xff). 205 | if chunkLen != len(magicBody) { 206 | r.err = ErrCorrupt 207 | return r.err 208 | } 209 | if !r.readFull(r.buf[:len(magicBody)], false) { 210 | return r.err 211 | } 212 | for i := 0; i < len(magicBody); i++ { 213 | if r.buf[i] != magicBody[i] { 214 | r.err = ErrCorrupt 215 | return r.err 216 | } 217 | } 218 | continue 219 | } 220 | 221 | if chunkType <= 0x7f { 222 | // Section 4.5. Reserved unskippable chunks (chunk types 0x02-0x7f). 223 | r.err = ErrUnsupported 224 | return r.err 225 | } 226 | // Section 4.4 Padding (chunk type 0xfe). 227 | // Section 4.6. Reserved skippable chunks (chunk types 0x80-0xfd). 228 | if !r.readFull(r.buf[:chunkLen], false) { 229 | return r.err 230 | } 231 | } 232 | 233 | return nil 234 | } 235 | 236 | // Read satisfies the io.Reader interface. 237 | func (r *Reader) Read(p []byte) (int, error) { 238 | if r.err != nil { 239 | return 0, r.err 240 | } 241 | 242 | if err := r.fill(); err != nil { 243 | return 0, err 244 | } 245 | 246 | n := copy(p, r.decoded[r.i:r.j]) 247 | r.i += n 248 | return n, nil 249 | } 250 | 251 | // ReadByte satisfies the io.ByteReader interface. 
252 | func (r *Reader) ReadByte() (byte, error) { 253 | if r.err != nil { 254 | return 0, r.err 255 | } 256 | 257 | if err := r.fill(); err != nil { 258 | return 0, err 259 | } 260 | 261 | c := r.decoded[r.i] 262 | r.i++ 263 | return c, nil 264 | } 265 | -------------------------------------------------------------------------------- /decode_amd64.s: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !appengine 6 | // +build gc 7 | // +build !noasm 8 | 9 | #include "textflag.h" 10 | 11 | // The asm code generally follows the pure Go code in decode_other.go, except 12 | // where marked with a "!!!". 13 | 14 | // func decode(dst, src []byte) int 15 | // 16 | // All local variables fit into registers. The non-zero stack size is only to 17 | // spill registers and push args when issuing a CALL. The register allocation: 18 | // - AX scratch 19 | // - BX scratch 20 | // - CX length or x 21 | // - DX offset 22 | // - SI &src[s] 23 | // - DI &dst[d] 24 | // + R8 dst_base 25 | // + R9 dst_len 26 | // + R10 dst_base + dst_len 27 | // + R11 src_base 28 | // + R12 src_len 29 | // + R13 src_base + src_len 30 | // - R14 used by doCopy 31 | // - R15 used by doCopy 32 | // 33 | // The registers R8-R13 (marked with a "+") are set at the start of the 34 | // function, and after a CALL returns, and are not otherwise modified. 35 | // 36 | // The d variable is implicitly DI - R8, and len(dst)-d is R10 - DI. 37 | // The s variable is implicitly SI - R11, and len(src)-s is R13 - SI. 38 | TEXT ·decode(SB), NOSPLIT, $48-56 39 | // Initialize SI, DI and R8-R13. 
40 | MOVQ dst_base+0(FP), R8 41 | MOVQ dst_len+8(FP), R9 42 | MOVQ R8, DI 43 | MOVQ R8, R10 44 | ADDQ R9, R10 45 | MOVQ src_base+24(FP), R11 46 | MOVQ src_len+32(FP), R12 47 | MOVQ R11, SI 48 | MOVQ R11, R13 49 | ADDQ R12, R13 50 | 51 | loop: 52 | // for s < len(src) 53 | CMPQ SI, R13 54 | JEQ end 55 | 56 | // CX = uint32(src[s]) 57 | // 58 | // switch src[s] & 0x03 59 | MOVBLZX (SI), CX 60 | MOVL CX, BX 61 | ANDL $3, BX 62 | CMPL BX, $1 63 | JAE tagCopy 64 | 65 | // ---------------------------------------- 66 | // The code below handles literal tags. 67 | 68 | // case tagLiteral: 69 | // x := uint32(src[s] >> 2) 70 | // switch 71 | SHRL $2, CX 72 | CMPL CX, $60 73 | JAE tagLit60Plus 74 | 75 | // case x < 60: 76 | // s++ 77 | INCQ SI 78 | 79 | doLit: 80 | // This is the end of the inner "switch", when we have a literal tag. 81 | // 82 | // We assume that CX == x and x fits in a uint32, where x is the variable 83 | // used in the pure Go decode_other.go code. 84 | 85 | // length = int(x) + 1 86 | // 87 | // Unlike the pure Go code, we don't need to check if length <= 0 because 88 | // CX can hold 64 bits, so the increment cannot overflow. 89 | INCQ CX 90 | 91 | // Prepare to check if copying length bytes will run past the end of dst or 92 | // src. 93 | // 94 | // AX = len(dst) - d 95 | // BX = len(src) - s 96 | MOVQ R10, AX 97 | SUBQ DI, AX 98 | MOVQ R13, BX 99 | SUBQ SI, BX 100 | 101 | // !!! Try a faster technique for short (16 or fewer bytes) copies. 102 | // 103 | // if length > 16 || len(dst)-d < 16 || len(src)-s < 16 { 104 | // goto callMemmove // Fall back on calling runtime·memmove. 105 | // } 106 | // 107 | // The C++ snappy code calls this TryFastAppend. 
It also checks len(src)-s 108 | // against 21 instead of 16, because it cannot assume that all of its input 109 | // is contiguous in memory and so it needs to leave enough source bytes to 110 | // read the next tag without refilling buffers, but Go's Decode assumes 111 | // contiguousness (the src argument is a []byte). 112 | CMPQ CX, $16 113 | JGT callMemmove 114 | CMPQ AX, $16 115 | JLT callMemmove 116 | CMPQ BX, $16 117 | JLT callMemmove 118 | 119 | // !!! Implement the copy from src to dst as a 16-byte load and store. 120 | // (Decode's documentation says that dst and src must not overlap.) 121 | // 122 | // This always copies 16 bytes, instead of only length bytes, but that's 123 | // OK. If the input is a valid Snappy encoding then subsequent iterations 124 | // will fix up the overrun. Otherwise, Decode returns a nil []byte (and a 125 | // non-nil error), so the overrun will be ignored. 126 | // 127 | // Note that on amd64, it is legal and cheap to issue unaligned 8-byte or 128 | // 16-byte loads and stores. This technique probably wouldn't be as 129 | // effective on architectures that are fussier about alignment. 130 | MOVOU 0(SI), X0 131 | MOVOU X0, 0(DI) 132 | 133 | // d += length 134 | // s += length 135 | ADDQ CX, DI 136 | ADDQ CX, SI 137 | JMP loop 138 | 139 | callMemmove: 140 | // if length > len(dst)-d || length > len(src)-s { etc } 141 | CMPQ CX, AX 142 | JGT errCorrupt 143 | CMPQ CX, BX 144 | JGT errCorrupt 145 | 146 | // copy(dst[d:], src[s:s+length]) 147 | // 148 | // This means calling runtime·memmove(&dst[d], &src[s], length), so we push 149 | // DI, SI and CX as arguments. Coincidentally, we also need to spill those 150 | // three registers to the stack, to save local variables across the CALL. 
151 | MOVQ DI, 0(SP) 152 | MOVQ SI, 8(SP) 153 | MOVQ CX, 16(SP) 154 | MOVQ DI, 24(SP) 155 | MOVQ SI, 32(SP) 156 | MOVQ CX, 40(SP) 157 | CALL runtime·memmove(SB) 158 | 159 | // Restore local variables: unspill registers from the stack and 160 | // re-calculate R8-R13. 161 | MOVQ 24(SP), DI 162 | MOVQ 32(SP), SI 163 | MOVQ 40(SP), CX 164 | MOVQ dst_base+0(FP), R8 165 | MOVQ dst_len+8(FP), R9 166 | MOVQ R8, R10 167 | ADDQ R9, R10 168 | MOVQ src_base+24(FP), R11 169 | MOVQ src_len+32(FP), R12 170 | MOVQ R11, R13 171 | ADDQ R12, R13 172 | 173 | // d += length 174 | // s += length 175 | ADDQ CX, DI 176 | ADDQ CX, SI 177 | JMP loop 178 | 179 | tagLit60Plus: 180 | // !!! This fragment does the 181 | // 182 | // s += x - 58; if uint(s) > uint(len(src)) { etc } 183 | // 184 | // checks. In the asm version, we code it once instead of once per switch case. 185 | ADDQ CX, SI 186 | SUBQ $58, SI 187 | MOVQ SI, BX 188 | SUBQ R11, BX 189 | CMPQ BX, R12 190 | JA errCorrupt 191 | 192 | // case x == 60: 193 | CMPL CX, $61 194 | JEQ tagLit61 195 | JA tagLit62Plus 196 | 197 | // x = uint32(src[s-1]) 198 | MOVBLZX -1(SI), CX 199 | JMP doLit 200 | 201 | tagLit61: 202 | // case x == 61: 203 | // x = uint32(src[s-2]) | uint32(src[s-1])<<8 204 | MOVWLZX -2(SI), CX 205 | JMP doLit 206 | 207 | tagLit62Plus: 208 | CMPL CX, $62 209 | JA tagLit63 210 | 211 | // case x == 62: 212 | // x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16 213 | MOVWLZX -3(SI), CX 214 | MOVBLZX -1(SI), BX 215 | SHLL $16, BX 216 | ORL BX, CX 217 | JMP doLit 218 | 219 | tagLit63: 220 | // case x == 63: 221 | // x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24 222 | MOVL -4(SI), CX 223 | JMP doLit 224 | 225 | // The code above handles literal tags. 226 | // ---------------------------------------- 227 | // The code below handles copy tags. 
228 | 229 | tagCopy4: 230 | // case tagCopy4: 231 | // s += 5 232 | ADDQ $5, SI 233 | 234 | // if uint(s) > uint(len(src)) { etc } 235 | MOVQ SI, BX 236 | SUBQ R11, BX 237 | CMPQ BX, R12 238 | JA errCorrupt 239 | 240 | // length = 1 + int(src[s-5])>>2 241 | SHRQ $2, CX 242 | INCQ CX 243 | 244 | // offset = int(uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24) 245 | MOVLQZX -4(SI), DX 246 | JMP doCopy 247 | 248 | tagCopy2: 249 | // case tagCopy2: 250 | // s += 3 251 | ADDQ $3, SI 252 | 253 | // if uint(s) > uint(len(src)) { etc } 254 | MOVQ SI, BX 255 | SUBQ R11, BX 256 | CMPQ BX, R12 257 | JA errCorrupt 258 | 259 | // length = 1 + int(src[s-3])>>2 260 | SHRQ $2, CX 261 | INCQ CX 262 | 263 | // offset = int(uint32(src[s-2]) | uint32(src[s-1])<<8) 264 | MOVWQZX -2(SI), DX 265 | JMP doCopy 266 | 267 | tagCopy: 268 | // We have a copy tag. We assume that: 269 | // - BX == src[s] & 0x03 270 | // - CX == src[s] 271 | CMPQ BX, $2 272 | JEQ tagCopy2 273 | JA tagCopy4 274 | 275 | // case tagCopy1: 276 | // s += 2 277 | ADDQ $2, SI 278 | 279 | // if uint(s) > uint(len(src)) { etc } 280 | MOVQ SI, BX 281 | SUBQ R11, BX 282 | CMPQ BX, R12 283 | JA errCorrupt 284 | 285 | // offset = int(uint32(src[s-2])&0xe0<<3 | uint32(src[s-1])) 286 | MOVQ CX, DX 287 | ANDQ $0xe0, DX 288 | SHLQ $3, DX 289 | MOVBQZX -1(SI), BX 290 | ORQ BX, DX 291 | 292 | // length = 4 + int(src[s-2])>>2&0x7 293 | SHRQ $2, CX 294 | ANDQ $7, CX 295 | ADDQ $4, CX 296 | 297 | doCopy: 298 | // This is the end of the outer "switch", when we have a copy tag. 
299 | // 300 | // We assume that: 301 | // - CX == length && CX > 0 302 | // - DX == offset 303 | 304 | // if offset <= 0 { etc } 305 | CMPQ DX, $0 306 | JLE errCorrupt 307 | 308 | // if d < offset { etc } 309 | MOVQ DI, BX 310 | SUBQ R8, BX 311 | CMPQ BX, DX 312 | JLT errCorrupt 313 | 314 | // if length > len(dst)-d { etc } 315 | MOVQ R10, BX 316 | SUBQ DI, BX 317 | CMPQ CX, BX 318 | JGT errCorrupt 319 | 320 | // forwardCopy(dst[d:d+length], dst[d-offset:]); d += length 321 | // 322 | // Set: 323 | // - R14 = len(dst)-d 324 | // - R15 = &dst[d-offset] 325 | MOVQ R10, R14 326 | SUBQ DI, R14 327 | MOVQ DI, R15 328 | SUBQ DX, R15 329 | 330 | // !!! Try a faster technique for short (16 or fewer bytes) forward copies. 331 | // 332 | // First, try using two 8-byte load/stores, similar to the doLit technique 333 | // above. Even if dst[d:d+length] and dst[d-offset:] can overlap, this is 334 | // still OK if offset >= 8. Note that this has to be two 8-byte load/stores 335 | // and not one 16-byte load/store, and the first store has to be before the 336 | // second load, due to the overlap if offset is in the range [8, 16). 337 | // 338 | // if length > 16 || offset < 8 || len(dst)-d < 16 { 339 | // goto slowForwardCopy 340 | // } 341 | // copy 16 bytes 342 | // d += length 343 | CMPQ CX, $16 344 | JGT slowForwardCopy 345 | CMPQ DX, $8 346 | JLT slowForwardCopy 347 | CMPQ R14, $16 348 | JLT slowForwardCopy 349 | MOVQ 0(R15), AX 350 | MOVQ AX, 0(DI) 351 | MOVQ 8(R15), BX 352 | MOVQ BX, 8(DI) 353 | ADDQ CX, DI 354 | JMP loop 355 | 356 | slowForwardCopy: 357 | // !!! If the forward copy is longer than 16 bytes, or if offset < 8, we 358 | // can still try 8-byte load stores, provided we can overrun up to 10 extra 359 | // bytes. As above, the overrun will be fixed up by subsequent iterations 360 | // of the outermost loop. 361 | // 362 | // The C++ snappy code calls this technique IncrementalCopyFastPath. 
Its 363 | // commentary says: 364 | // 365 | // ---- 366 | // 367 | // The main part of this loop is a simple copy of eight bytes at a time 368 | // until we've copied (at least) the requested amount of bytes. However, 369 | // if d and d-offset are less than eight bytes apart (indicating a 370 | // repeating pattern of length < 8), we first need to expand the pattern in 371 | // order to get the correct results. For instance, if the buffer looks like 372 | // this, with the eight-byte and patterns marked as 373 | // intervals: 374 | // 375 | // abxxxxxxxxxxxx 376 | // [------] d-offset 377 | // [------] d 378 | // 379 | // a single eight-byte copy from to will repeat the pattern 380 | // once, after which we can move two bytes without moving : 381 | // 382 | // ababxxxxxxxxxx 383 | // [------] d-offset 384 | // [------] d 385 | // 386 | // and repeat the exercise until the two no longer overlap. 387 | // 388 | // This allows us to do very well in the special case of one single byte 389 | // repeated many times, without taking a big hit for more general cases. 390 | // 391 | // The worst case of extra writing past the end of the match occurs when 392 | // offset == 1 and length == 1; the last copy will read from byte positions 393 | // [0..7] and write to [4..11], whereas it was only supposed to write to 394 | // position 1. Thus, ten excess bytes. 395 | // 396 | // ---- 397 | // 398 | // That "10 byte overrun" worst case is confirmed by Go's 399 | // TestSlowForwardCopyOverrun, which also tests the fixUpSlowForwardCopy 400 | // and finishSlowForwardCopy algorithm. 401 | // 402 | // if length > len(dst)-d-10 { 403 | // goto verySlowForwardCopy 404 | // } 405 | SUBQ $10, R14 406 | CMPQ CX, R14 407 | JGT verySlowForwardCopy 408 | 409 | makeOffsetAtLeast8: 410 | // !!! As above, expand the pattern so that offset >= 8 and we can use 411 | // 8-byte load/stores. 
412 | // 413 | // for offset < 8 { 414 | // copy 8 bytes from dst[d-offset:] to dst[d:] 415 | // length -= offset 416 | // d += offset 417 | // offset += offset 418 | // // The two previous lines together means that d-offset, and therefore 419 | // // R15, is unchanged. 420 | // } 421 | CMPQ DX, $8 422 | JGE fixUpSlowForwardCopy 423 | MOVQ (R15), BX 424 | MOVQ BX, (DI) 425 | SUBQ DX, CX 426 | ADDQ DX, DI 427 | ADDQ DX, DX 428 | JMP makeOffsetAtLeast8 429 | 430 | fixUpSlowForwardCopy: 431 | // !!! Add length (which might be negative now) to d (implied by DI being 432 | // &dst[d]) so that d ends up at the right place when we jump back to the 433 | // top of the loop. Before we do that, though, we save DI to AX so that, if 434 | // length is positive, copying the remaining length bytes will write to the 435 | // right place. 436 | MOVQ DI, AX 437 | ADDQ CX, DI 438 | 439 | finishSlowForwardCopy: 440 | // !!! Repeat 8-byte load/stores until length <= 0. Ending with a negative 441 | // length means that we overrun, but as above, that will be fixed up by 442 | // subsequent iterations of the outermost loop. 443 | CMPQ CX, $0 444 | JLE loop 445 | MOVQ (R15), BX 446 | MOVQ BX, (AX) 447 | ADDQ $8, R15 448 | ADDQ $8, AX 449 | SUBQ $8, CX 450 | JMP finishSlowForwardCopy 451 | 452 | verySlowForwardCopy: 453 | // verySlowForwardCopy is a simple implementation of forward copy. In C 454 | // parlance, this is a do/while loop instead of a while loop, since we know 455 | // that length > 0. In Go syntax: 456 | // 457 | // for { 458 | // dst[d] = dst[d - offset] 459 | // d++ 460 | // length-- 461 | // if length == 0 { 462 | // break 463 | // } 464 | // } 465 | MOVB (R15), BX 466 | MOVB BX, (DI) 467 | INCQ R15 468 | INCQ DI 469 | DECQ CX 470 | JNZ verySlowForwardCopy 471 | JMP loop 472 | 473 | // The code above handles copy tags. 474 | // ---------------------------------------- 475 | 476 | end: 477 | // This is the end of the "for s < len(src)". 
478 | // 479 | // if d != len(dst) { etc } 480 | CMPQ DI, R10 481 | JNE errCorrupt 482 | 483 | // return 0 484 | MOVQ $0, ret+48(FP) 485 | RET 486 | 487 | errCorrupt: 488 | // return decodeErrCodeCorrupt 489 | MOVQ $1, ret+48(FP) 490 | RET 491 | -------------------------------------------------------------------------------- /decode_arm64.s: -------------------------------------------------------------------------------- 1 | // Copyright 2020 The Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !appengine 6 | // +build gc 7 | // +build !noasm 8 | 9 | #include "textflag.h" 10 | 11 | // The asm code generally follows the pure Go code in decode_other.go, except 12 | // where marked with a "!!!". 13 | 14 | // func decode(dst, src []byte) int 15 | // 16 | // All local variables fit into registers. The non-zero stack size is only to 17 | // spill registers and push args when issuing a CALL. The register allocation: 18 | // - R2 scratch 19 | // - R3 scratch 20 | // - R4 length or x 21 | // - R5 offset 22 | // - R6 &src[s] 23 | // - R7 &dst[d] 24 | // + R8 dst_base 25 | // + R9 dst_len 26 | // + R10 dst_base + dst_len 27 | // + R11 src_base 28 | // + R12 src_len 29 | // + R13 src_base + src_len 30 | // - R14 used by doCopy 31 | // - R15 used by doCopy 32 | // 33 | // The registers R8-R13 (marked with a "+") are set at the start of the 34 | // function, and after a CALL returns, and are not otherwise modified. 35 | // 36 | // The d variable is implicitly R7 - R8, and len(dst)-d is R10 - R7. 37 | // The s variable is implicitly R6 - R11, and len(src)-s is R13 - R6. 38 | TEXT ·decode(SB), NOSPLIT, $56-56 39 | // Initialize R6, R7 and R8-R13. 
40 | MOVD dst_base+0(FP), R8 41 | MOVD dst_len+8(FP), R9 42 | MOVD R8, R7 43 | MOVD R8, R10 44 | ADD R9, R10, R10 45 | MOVD src_base+24(FP), R11 46 | MOVD src_len+32(FP), R12 47 | MOVD R11, R6 48 | MOVD R11, R13 49 | ADD R12, R13, R13 50 | 51 | loop: 52 | // for s < len(src) 53 | CMP R13, R6 54 | BEQ end 55 | 56 | // R4 = uint32(src[s]) 57 | // 58 | // switch src[s] & 0x03 59 | MOVBU (R6), R4 60 | MOVW R4, R3 61 | ANDW $3, R3 62 | MOVW $1, R1 63 | CMPW R1, R3 64 | BGE tagCopy 65 | 66 | // ---------------------------------------- 67 | // The code below handles literal tags. 68 | 69 | // case tagLiteral: 70 | // x := uint32(src[s] >> 2) 71 | // switch 72 | MOVW $60, R1 73 | LSRW $2, R4, R4 74 | CMPW R4, R1 75 | BLS tagLit60Plus 76 | 77 | // case x < 60: 78 | // s++ 79 | ADD $1, R6, R6 80 | 81 | doLit: 82 | // This is the end of the inner "switch", when we have a literal tag. 83 | // 84 | // We assume that R4 == x and x fits in a uint32, where x is the variable 85 | // used in the pure Go decode_other.go code. 86 | 87 | // length = int(x) + 1 88 | // 89 | // Unlike the pure Go code, we don't need to check if length <= 0 because 90 | // R4 can hold 64 bits, so the increment cannot overflow. 91 | ADD $1, R4, R4 92 | 93 | // Prepare to check if copying length bytes will run past the end of dst or 94 | // src. 95 | // 96 | // R2 = len(dst) - d 97 | // R3 = len(src) - s 98 | MOVD R10, R2 99 | SUB R7, R2, R2 100 | MOVD R13, R3 101 | SUB R6, R3, R3 102 | 103 | // !!! Try a faster technique for short (16 or fewer bytes) copies. 104 | // 105 | // if length > 16 || len(dst)-d < 16 || len(src)-s < 16 { 106 | // goto callMemmove // Fall back on calling runtime·memmove. 107 | // } 108 | // 109 | // The C++ snappy code calls this TryFastAppend. 
It also checks len(src)-s 110 | // against 21 instead of 16, because it cannot assume that all of its input 111 | // is contiguous in memory and so it needs to leave enough source bytes to 112 | // read the next tag without refilling buffers, but Go's Decode assumes 113 | // contiguousness (the src argument is a []byte). 114 | CMP $16, R4 115 | BGT callMemmove 116 | CMP $16, R2 117 | BLT callMemmove 118 | CMP $16, R3 119 | BLT callMemmove 120 | 121 | // !!! Implement the copy from src to dst as a 16-byte load and store. 122 | // (Decode's documentation says that dst and src must not overlap.) 123 | // 124 | // This always copies 16 bytes, instead of only length bytes, but that's 125 | // OK. If the input is a valid Snappy encoding then subsequent iterations 126 | // will fix up the overrun. Otherwise, Decode returns a nil []byte (and a 127 | // non-nil error), so the overrun will be ignored. 128 | // 129 | // Note that on arm64, it is legal and cheap to issue unaligned 8-byte or 130 | // 16-byte loads and stores. This technique probably wouldn't be as 131 | // effective on architectures that are fussier about alignment. 132 | LDP 0(R6), (R14, R15) 133 | STP (R14, R15), 0(R7) 134 | 135 | // d += length 136 | // s += length 137 | ADD R4, R7, R7 138 | ADD R4, R6, R6 139 | B loop 140 | 141 | callMemmove: 142 | // if length > len(dst)-d || length > len(src)-s { etc } 143 | CMP R2, R4 144 | BGT errCorrupt 145 | CMP R3, R4 146 | BGT errCorrupt 147 | 148 | // copy(dst[d:], src[s:s+length]) 149 | // 150 | // This means calling runtime·memmove(&dst[d], &src[s], length), so we push 151 | // R7, R6 and R4 as arguments. Coincidentally, we also need to spill those 152 | // three registers to the stack, to save local variables across the CALL. 
153 | MOVD R7, 8(RSP) 154 | MOVD R6, 16(RSP) 155 | MOVD R4, 24(RSP) 156 | MOVD R7, 32(RSP) 157 | MOVD R6, 40(RSP) 158 | MOVD R4, 48(RSP) 159 | CALL runtime·memmove(SB) 160 | 161 | // Restore local variables: unspill registers from the stack and 162 | // re-calculate R8-R13. 163 | MOVD 32(RSP), R7 164 | MOVD 40(RSP), R6 165 | MOVD 48(RSP), R4 166 | MOVD dst_base+0(FP), R8 167 | MOVD dst_len+8(FP), R9 168 | MOVD R8, R10 169 | ADD R9, R10, R10 170 | MOVD src_base+24(FP), R11 171 | MOVD src_len+32(FP), R12 172 | MOVD R11, R13 173 | ADD R12, R13, R13 174 | 175 | // d += length 176 | // s += length 177 | ADD R4, R7, R7 178 | ADD R4, R6, R6 179 | B loop 180 | 181 | tagLit60Plus: 182 | // !!! This fragment does the 183 | // 184 | // s += x - 58; if uint(s) > uint(len(src)) { etc } 185 | // 186 | // checks. In the asm version, we code it once instead of once per switch case. 187 | ADD R4, R6, R6 188 | SUB $58, R6, R6 189 | MOVD R6, R3 190 | SUB R11, R3, R3 191 | CMP R12, R3 192 | BGT errCorrupt 193 | 194 | // case x == 60: 195 | MOVW $61, R1 196 | CMPW R1, R4 197 | BEQ tagLit61 198 | BGT tagLit62Plus 199 | 200 | // x = uint32(src[s-1]) 201 | MOVBU -1(R6), R4 202 | B doLit 203 | 204 | tagLit61: 205 | // case x == 61: 206 | // x = uint32(src[s-2]) | uint32(src[s-1])<<8 207 | MOVHU -2(R6), R4 208 | B doLit 209 | 210 | tagLit62Plus: 211 | CMPW $62, R4 212 | BHI tagLit63 213 | 214 | // case x == 62: 215 | // x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16 216 | MOVHU -3(R6), R4 217 | MOVBU -1(R6), R3 218 | ORR R3<<16, R4 219 | B doLit 220 | 221 | tagLit63: 222 | // case x == 63: 223 | // x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24 224 | MOVWU -4(R6), R4 225 | B doLit 226 | 227 | // The code above handles literal tags. 228 | // ---------------------------------------- 229 | // The code below handles copy tags. 
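As a cross-check on the literal-tag handling above, the same length decoding can be sketched in plain Go. This is a hedged illustration (litLen is a hypothetical helper, not a function in this package); it mirrors the tagLiteral switch in decode_other.go:

```go
package main

import "fmt"

// litLen is a hypothetical helper mirroring the tagLiteral cases: it
// returns the literal length and the index of the first literal byte,
// or -1 on a truncated input.
func litLen(src []byte, s int) (length, next int) {
	x := uint32(src[s] >> 2)
	switch {
	case x < 60:
		s++
	case x == 60:
		s += 2
		if s > len(src) {
			return -1, s
		}
		x = uint32(src[s-1])
	case x == 61:
		s += 3
		if s > len(src) {
			return -1, s
		}
		x = uint32(src[s-2]) | uint32(src[s-1])<<8
	case x == 62:
		s += 4
		if s > len(src) {
			return -1, s
		}
		x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16
	default: // x == 63
		s += 5
		if s > len(src) {
			return -1, s
		}
		x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24
	}
	return int(x) + 1, s
}

func main() {
	// Tag 0x08 = 2<<2 | tagLiteral: a 3-byte literal, length held inline.
	l, n := litLen([]byte{0x08, 'a', 'b', 'c'}, 0)
	fmt.Println(l, n) // 3 1
	// Tag 0xf0 = 60<<2: the length-1 follows in one extra byte.
	l, n = litLen([]byte{0xf0, 0xff}, 0)
	fmt.Println(l, n) // 256 2
}
```

Note that, as in the asm's tagLit60Plus fragment, the bounds check happens once after s has been advanced, not per case.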
230 | 231 | tagCopy4: 232 | // case tagCopy4: 233 | // s += 5 234 | ADD $5, R6, R6 235 | 236 | // if uint(s) > uint(len(src)) { etc } 237 | MOVD R6, R3 238 | SUB R11, R3, R3 239 | CMP R12, R3 240 | BGT errCorrupt 241 | 242 | // length = 1 + int(src[s-5])>>2 243 | MOVD $1, R1 244 | ADD R4>>2, R1, R4 245 | 246 | // offset = int(uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24) 247 | MOVWU -4(R6), R5 248 | B doCopy 249 | 250 | tagCopy2: 251 | // case tagCopy2: 252 | // s += 3 253 | ADD $3, R6, R6 254 | 255 | // if uint(s) > uint(len(src)) { etc } 256 | MOVD R6, R3 257 | SUB R11, R3, R3 258 | CMP R12, R3 259 | BGT errCorrupt 260 | 261 | // length = 1 + int(src[s-3])>>2 262 | MOVD $1, R1 263 | ADD R4>>2, R1, R4 264 | 265 | // offset = int(uint32(src[s-2]) | uint32(src[s-1])<<8) 266 | MOVHU -2(R6), R5 267 | B doCopy 268 | 269 | tagCopy: 270 | // We have a copy tag. We assume that: 271 | // - R3 == src[s] & 0x03 272 | // - R4 == src[s] 273 | CMP $2, R3 274 | BEQ tagCopy2 275 | BGT tagCopy4 276 | 277 | // case tagCopy1: 278 | // s += 2 279 | ADD $2, R6, R6 280 | 281 | // if uint(s) > uint(len(src)) { etc } 282 | MOVD R6, R3 283 | SUB R11, R3, R3 284 | CMP R12, R3 285 | BGT errCorrupt 286 | 287 | // offset = int(uint32(src[s-2])&0xe0<<3 | uint32(src[s-1])) 288 | MOVD R4, R5 289 | AND $0xe0, R5 290 | MOVBU -1(R6), R3 291 | ORR R5<<3, R3, R5 292 | 293 | // length = 4 + int(src[s-2])>>2&0x7 294 | MOVD $7, R1 295 | AND R4>>2, R1, R4 296 | ADD $4, R4, R4 297 | 298 | doCopy: 299 | // This is the end of the outer "switch", when we have a copy tag. 
300 | // 301 | // We assume that: 302 | // - R4 == length && R4 > 0 303 | // - R5 == offset 304 | 305 | // if offset <= 0 { etc } 306 | MOVD $0, R1 307 | CMP R1, R5 308 | BLE errCorrupt 309 | 310 | // if d < offset { etc } 311 | MOVD R7, R3 312 | SUB R8, R3, R3 313 | CMP R5, R3 314 | BLT errCorrupt 315 | 316 | // if length > len(dst)-d { etc } 317 | MOVD R10, R3 318 | SUB R7, R3, R3 319 | CMP R3, R4 320 | BGT errCorrupt 321 | 322 | // forwardCopy(dst[d:d+length], dst[d-offset:]); d += length 323 | // 324 | // Set: 325 | // - R14 = len(dst)-d 326 | // - R15 = &dst[d-offset] 327 | MOVD R10, R14 328 | SUB R7, R14, R14 329 | MOVD R7, R15 330 | SUB R5, R15, R15 331 | 332 | // !!! Try a faster technique for short (16 or fewer bytes) forward copies. 333 | // 334 | // First, try using two 8-byte load/stores, similar to the doLit technique 335 | // above. Even if dst[d:d+length] and dst[d-offset:] can overlap, this is 336 | // still OK if offset >= 8. Note that this has to be two 8-byte load/stores 337 | // and not one 16-byte load/store, and the first store has to be before the 338 | // second load, due to the overlap if offset is in the range [8, 16). 339 | // 340 | // if length > 16 || offset < 8 || len(dst)-d < 16 { 341 | // goto slowForwardCopy 342 | // } 343 | // copy 16 bytes 344 | // d += length 345 | CMP $16, R4 346 | BGT slowForwardCopy 347 | CMP $8, R5 348 | BLT slowForwardCopy 349 | CMP $16, R14 350 | BLT slowForwardCopy 351 | MOVD 0(R15), R2 352 | MOVD R2, 0(R7) 353 | MOVD 8(R15), R3 354 | MOVD R3, 8(R7) 355 | ADD R4, R7, R7 356 | B loop 357 | 358 | slowForwardCopy: 359 | // !!! If the forward copy is longer than 16 bytes, or if offset < 8, we 360 | // can still try 8-byte load stores, provided we can overrun up to 10 extra 361 | // bytes. As above, the overrun will be fixed up by subsequent iterations 362 | // of the outermost loop. 363 | // 364 | // The C++ snappy code calls this technique IncrementalCopyFastPath. 
Its 365 | // commentary says: 366 | // 367 | // ---- 368 | // 369 | // The main part of this loop is a simple copy of eight bytes at a time 370 | // until we've copied (at least) the requested amount of bytes. However, 371 | // if d and d-offset are less than eight bytes apart (indicating a 372 | // repeating pattern of length < 8), we first need to expand the pattern in 373 | // order to get the correct results. For instance, if the buffer looks like 374 | // this, with the eight-byte <d-offset> and <d> patterns marked as 375 | // intervals: 376 | // 377 | // abxxxxxxxxxxxx 378 | // [------] d-offset 379 | // [------] d 380 | // 381 | // a single eight-byte copy from <d-offset> to <d> will repeat the pattern 382 | // once, after which we can move <d> two bytes without moving <d-offset>: 383 | // 384 | // ababxxxxxxxxxx 385 | // [------] d-offset 386 | // [------] d 387 | // 388 | // and repeat the exercise until the two no longer overlap. 389 | // 390 | // This allows us to do very well in the special case of one single byte 391 | // repeated many times, without taking a big hit for more general cases. 392 | // 393 | // The worst case of extra writing past the end of the match occurs when 394 | // offset == 1 and length == 1; the last copy will read from byte positions 395 | // [0..7] and write to [4..11], whereas it was only supposed to write to 396 | // position 1. Thus, ten excess bytes. 397 | // 398 | // ---- 399 | // 400 | // That "10 byte overrun" worst case is confirmed by Go's 401 | // TestSlowForwardCopyOverrun, which also tests the fixUpSlowForwardCopy 402 | // and finishSlowForwardCopy algorithm. 403 | // 404 | // if length > len(dst)-d-10 { 405 | // goto verySlowForwardCopy 406 | // } 407 | SUB $10, R14, R14 408 | CMP R14, R4 409 | BGT verySlowForwardCopy 410 | 411 | makeOffsetAtLeast8: 412 | // !!! As above, expand the pattern so that offset >= 8 and we can use 413 | // 8-byte load/stores.
414 | // 415 | // for offset < 8 { 416 | // copy 8 bytes from dst[d-offset:] to dst[d:] 417 | // length -= offset 418 | // d += offset 419 | // offset += offset 420 | // // The two previous lines together mean that d-offset, and therefore 421 | // // R15, is unchanged. 422 | // } 423 | CMP $8, R5 424 | BGE fixUpSlowForwardCopy 425 | MOVD (R15), R3 426 | MOVD R3, (R7) 427 | SUB R5, R4, R4 428 | ADD R5, R7, R7 429 | ADD R5, R5, R5 430 | B makeOffsetAtLeast8 431 | 432 | fixUpSlowForwardCopy: 433 | // !!! Add length (which might be negative now) to d (implied by R7 being 434 | // &dst[d]) so that d ends up at the right place when we jump back to the 435 | // top of the loop. Before we do that, though, we save R7 to R2 so that, if 436 | // length is positive, copying the remaining length bytes will write to the 437 | // right place. 438 | MOVD R7, R2 439 | ADD R4, R7, R7 440 | 441 | finishSlowForwardCopy: 442 | // !!! Repeat 8-byte load/stores until length <= 0. Ending with a negative 443 | // length means that we overrun, but as above, that will be fixed up by 444 | // subsequent iterations of the outermost loop. 445 | MOVD $0, R1 446 | CMP R1, R4 447 | BLE loop 448 | MOVD (R15), R3 449 | MOVD R3, (R2) 450 | ADD $8, R15, R15 451 | ADD $8, R2, R2 452 | SUB $8, R4, R4 453 | B finishSlowForwardCopy 454 | 455 | verySlowForwardCopy: 456 | // verySlowForwardCopy is a simple implementation of forward copy. In C 457 | // parlance, this is a do/while loop instead of a while loop, since we know 458 | // that length > 0. In Go syntax: 459 | // 460 | // for { 461 | // dst[d] = dst[d - offset] 462 | // d++ 463 | // length-- 464 | // if length == 0 { 465 | // break 466 | // } 467 | // } 468 | MOVB (R15), R3 469 | MOVB R3, (R7) 470 | ADD $1, R15, R15 471 | ADD $1, R7, R7 472 | SUB $1, R4, R4 473 | CBNZ R4, verySlowForwardCopy 474 | B loop 475 | 476 | // The code above handles copy tags.
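The makeOffsetAtLeast8 / fixUpSlowForwardCopy / finishSlowForwardCopy sequence above can be modeled in plain Go. This is a hedged sketch (slowForwardCopy is a hypothetical helper, not in this package): the built-in copy, which has memmove semantics even for overlapping slices, stands in for the raw 8-byte load/stores, and the caller must leave the same 10 bytes of slack in dst that the asm checks for:

```go
package main

import "fmt"

// slowForwardCopy models the asm's pattern-doubling forward copy: while
// offset < 8, each 8-byte copy doubles the repeating pattern (the read
// position d-offset stays fixed); once offset >= 8, plain 8-byte copies
// finish the job. Like the asm, it may write up to 10 bytes past
// d+length, so the caller must guarantee length <= len(dst)-d-10.
func slowForwardCopy(dst []byte, d, offset, length int) int {
	end := d + length // fixUpSlowForwardCopy: the correct final d
	for offset < 8 {
		copy(dst[d:d+8], dst[d-offset:]) // overlapping; memmove semantics
		length -= offset
		d += offset
		offset += offset
	}
	for length > 0 { // finishSlowForwardCopy: overrun is fixed up by end
		copy(dst[d:d+8], dst[d-offset:d-offset+8])
		d += 8
		length -= 8
	}
	return end
}

func main() {
	dst := make([]byte, 32) // generous slack for the overrun
	copy(dst, "ab")
	end := slowForwardCopy(dst, 2, 2, 10)
	fmt.Printf("%q\n", dst[:end]) // "abababababab"
}
```

Running the worst case from the commentary (offset == 1, length == 1) against this model reproduces the ten excess bytes while still returning the correct d.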
477 | // ---------------------------------------- 478 | 479 | end: 480 | // This is the end of the "for s < len(src)". 481 | // 482 | // if d != len(dst) { etc } 483 | CMP R10, R7 484 | BNE errCorrupt 485 | 486 | // return 0 487 | MOVD $0, ret+48(FP) 488 | RET 489 | 490 | errCorrupt: 491 | // return decodeErrCodeCorrupt 492 | MOVD $1, R2 493 | MOVD R2, ret+48(FP) 494 | RET 495 | -------------------------------------------------------------------------------- /decode_asm.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !appengine 6 | // +build gc 7 | // +build !noasm 8 | // +build amd64 arm64 9 | 10 | package snappy 11 | 12 | // decode has the same semantics as in decode_other.go. 13 | // 14 | //go:noescape 15 | func decode(dst, src []byte) int 16 | -------------------------------------------------------------------------------- /decode_other.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !amd64,!arm64 appengine !gc noasm 6 | 7 | package snappy 8 | 9 | // decode writes the decoding of src to dst. It assumes that the varint-encoded 10 | // length of the decompressed bytes has already been read, and that len(dst) 11 | // equals that length. 12 | // 13 | // It returns 0 on success or a decodeErrCodeXxx error code on failure. 
14 | func decode(dst, src []byte) int { 15 | var d, s, offset, length int 16 | for s < len(src) { 17 | switch src[s] & 0x03 { 18 | case tagLiteral: 19 | x := uint32(src[s] >> 2) 20 | switch { 21 | case x < 60: 22 | s++ 23 | case x == 60: 24 | s += 2 25 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 26 | return decodeErrCodeCorrupt 27 | } 28 | x = uint32(src[s-1]) 29 | case x == 61: 30 | s += 3 31 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 32 | return decodeErrCodeCorrupt 33 | } 34 | x = uint32(src[s-2]) | uint32(src[s-1])<<8 35 | case x == 62: 36 | s += 4 37 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 38 | return decodeErrCodeCorrupt 39 | } 40 | x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16 41 | case x == 63: 42 | s += 5 43 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 44 | return decodeErrCodeCorrupt 45 | } 46 | x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24 47 | } 48 | length = int(x) + 1 49 | if length <= 0 { 50 | return decodeErrCodeUnsupportedLiteralLength 51 | } 52 | if length > len(dst)-d || length > len(src)-s { 53 | return decodeErrCodeCorrupt 54 | } 55 | copy(dst[d:], src[s:s+length]) 56 | d += length 57 | s += length 58 | continue 59 | 60 | case tagCopy1: 61 | s += 2 62 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 63 | return decodeErrCodeCorrupt 64 | } 65 | length = 4 + int(src[s-2])>>2&0x7 66 | offset = int(uint32(src[s-2])&0xe0<<3 | uint32(src[s-1])) 67 | 68 | case tagCopy2: 69 | s += 3 70 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 
71 | return decodeErrCodeCorrupt 72 | } 73 | length = 1 + int(src[s-3])>>2 74 | offset = int(uint32(src[s-2]) | uint32(src[s-1])<<8) 75 | 76 | case tagCopy4: 77 | s += 5 78 | if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line. 79 | return decodeErrCodeCorrupt 80 | } 81 | length = 1 + int(src[s-5])>>2 82 | offset = int(uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24) 83 | } 84 | 85 | if offset <= 0 || d < offset || length > len(dst)-d { 86 | return decodeErrCodeCorrupt 87 | } 88 | // Copy from an earlier sub-slice of dst to a later sub-slice. 89 | // If no overlap, use the built-in copy: 90 | if offset >= length { 91 | copy(dst[d:d+length], dst[d-offset:]) 92 | d += length 93 | continue 94 | } 95 | 96 | // Unlike the built-in copy function, this byte-by-byte copy always runs 97 | // forwards, even if the slices overlap. Conceptually, this is: 98 | // 99 | // d += forwardCopy(dst[d:d+length], dst[d-offset:]) 100 | // 101 | // We align the slices into a and b and show the compiler they are the same size. 102 | // This allows the loop to run without bounds checks. 103 | a := dst[d : d+length] 104 | b := dst[d-offset:] 105 | b = b[:len(a)] 106 | for i := range a { 107 | a[i] = b[i] 108 | } 109 | d += length 110 | } 111 | if d != len(dst) { 112 | return decodeErrCodeCorrupt 113 | } 114 | return 0 115 | } 116 | -------------------------------------------------------------------------------- /encode.go: -------------------------------------------------------------------------------- 1 | // Copyright 2011 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | package snappy 6 | 7 | import ( 8 | "encoding/binary" 9 | "errors" 10 | "io" 11 | ) 12 | 13 | // Encode returns the encoded form of src. 
The returned slice may be a sub- 14 | // slice of dst if dst was large enough to hold the entire encoded block. 15 | // Otherwise, a newly allocated slice will be returned. 16 | // 17 | // The dst and src must not overlap. It is valid to pass a nil dst. 18 | // 19 | // Encode handles the Snappy block format, not the Snappy stream format. 20 | func Encode(dst, src []byte) []byte { 21 | if n := MaxEncodedLen(len(src)); n < 0 { 22 | panic(ErrTooLarge) 23 | } else if len(dst) < n { 24 | dst = make([]byte, n) 25 | } 26 | 27 | // The block starts with the varint-encoded length of the decompressed bytes. 28 | d := binary.PutUvarint(dst, uint64(len(src))) 29 | 30 | for len(src) > 0 { 31 | p := src 32 | src = nil 33 | if len(p) > maxBlockSize { 34 | p, src = p[:maxBlockSize], p[maxBlockSize:] 35 | } 36 | if len(p) < minNonLiteralBlockSize { 37 | d += emitLiteral(dst[d:], p) 38 | } else { 39 | d += encodeBlock(dst[d:], p) 40 | } 41 | } 42 | return dst[:d] 43 | } 44 | 45 | // inputMargin is the minimum number of extra input bytes to keep, inside 46 | // encodeBlock's inner loop. On some architectures, this margin lets us 47 | // implement a fast path for emitLiteral, where the copy of short (<= 16 byte) 48 | // literals can be implemented as a single load to and store from a 16-byte 49 | // register. That literal's actual length can be as short as 1 byte, so this 50 | // can copy up to 15 bytes too much, but that's OK as subsequent iterations of 51 | // the encoding loop will fix up the copy overrun, and this inputMargin ensures 52 | // that we don't overrun the dst and src buffers. 53 | const inputMargin = 16 - 1 54 | 55 | // minNonLiteralBlockSize is the minimum size of the input to encodeBlock that 56 | // could be encoded with a copy tag. This is the minimum with respect to the 57 | // algorithm used by encodeBlock, not a minimum enforced by the file format. 
58 | // 59 | // The encoded output must start with at least a 1 byte literal, as there are 60 | // no previous bytes to copy. A minimal (1 byte) copy after that, generated 61 | // from an emitCopy call in encodeBlock's main loop, would require at least 62 | // another inputMargin bytes, for the reason above: we want any emitLiteral 63 | // calls inside encodeBlock's main loop to use the fast path if possible, which 64 | // requires being able to overrun by inputMargin bytes. Thus, 65 | // minNonLiteralBlockSize equals 1 + 1 + inputMargin. 66 | // 67 | // The C++ code doesn't use this exact threshold, but it could, as discussed at 68 | // https://groups.google.com/d/topic/snappy-compression/oGbhsdIJSJ8/discussion 69 | // The difference between Go (2+inputMargin) and C++ (inputMargin) is purely an 70 | // optimization. It should not affect the encoded form. This is tested by 71 | // TestSameEncodingAsCppShortCopies. 72 | const minNonLiteralBlockSize = 1 + 1 + inputMargin 73 | 74 | // MaxEncodedLen returns the maximum length of a snappy block, given its 75 | // uncompressed length. 76 | // 77 | // It will return a negative value if srcLen is too large to encode. 78 | func MaxEncodedLen(srcLen int) int { 79 | n := uint64(srcLen) 80 | if n > 0xffffffff { 81 | return -1 82 | } 83 | // Compressed data can be defined as: 84 | // compressed := item* literal* 85 | // item := literal* copy 86 | // 87 | // The trailing literal sequence has a space blowup of at most 62/60 88 | // since a literal of length 60 needs one tag byte + one extra byte 89 | // for length information. 90 | // 91 | // Item blowup is trickier to measure. Suppose the "copy" op copies 92 | // 4 bytes of data. Because of a special check in the encoding code, 93 | // we produce a 4-byte copy only if the offset is < 65536. Therefore 94 | // the copy op takes 3 bytes to encode, and this type of item leads 95 | // to at most the 62/60 blowup for representing literals. 
96 | // 97 | // Suppose the "copy" op copies 5 bytes of data. If the offset is big 98 | // enough, it will take 5 bytes to encode the copy op. Therefore the 99 | // worst case here is a one-byte literal followed by a five-byte copy. 100 | // That is, 6 bytes of input turn into 7 bytes of "compressed" data. 101 | // 102 | // This last factor dominates the blowup, so the final estimate is: 103 | n = 32 + n + n/6 104 | if n > 0xffffffff { 105 | return -1 106 | } 107 | return int(n) 108 | } 109 | 110 | var errClosed = errors.New("snappy: Writer is closed") 111 | 112 | // NewWriter returns a new Writer that compresses to w. 113 | // 114 | // The Writer returned does not buffer writes. There is no need to Flush or 115 | // Close such a Writer. 116 | // 117 | // Deprecated: the Writer returned is not suitable for many small writes, only 118 | // for few large writes. Use NewBufferedWriter instead, which is efficient 119 | // regardless of the frequency and shape of the writes, and remember to Close 120 | // that Writer when done. 121 | func NewWriter(w io.Writer) *Writer { 122 | return &Writer{ 123 | w: w, 124 | obuf: make([]byte, obufLen), 125 | } 126 | } 127 | 128 | // NewBufferedWriter returns a new Writer that compresses to w, using the 129 | // framing format described at 130 | // https://github.com/google/snappy/blob/master/framing_format.txt 131 | // 132 | // The Writer returned buffers writes. Users must call Close to guarantee all 133 | // data has been forwarded to the underlying io.Writer. They may also call 134 | // Flush zero or more times before calling Close. 135 | func NewBufferedWriter(w io.Writer) *Writer { 136 | return &Writer{ 137 | w: w, 138 | ibuf: make([]byte, 0, maxBlockSize), 139 | obuf: make([]byte, obufLen), 140 | } 141 | } 142 | 143 | // Writer is an io.Writer that can write Snappy-compressed bytes. 144 | // 145 | // Writer handles the Snappy stream format, not the Snappy block format. 
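The 32 + n + n/6 estimate derived in MaxEncodedLen above can be spot-checked numerically against its dominant worst case (6 input bytes becoming 7 output bytes). This is a hedged, standalone restatement of the bound, not the package's code itself:

```go
package main

import "fmt"

// maxEncodedLen restates the bound above: inputs must fit in a uint32,
// and the estimate is the dominant 7/6 item blowup (a one-byte literal
// followed by a five-byte copy) plus 32 bytes of constant slack.
func maxEncodedLen(srcLen int) int {
	n := uint64(srcLen)
	if n > 0xffffffff {
		return -1 // too large to encode
	}
	n = 32 + n + n/6 // e.g. srcLen 6 -> 32 + 6 + 1 = 39
	if n > 0xffffffff {
		return -1
	}
	return int(n)
}

func main() {
	fmt.Println(maxEncodedLen(6)) // 39
	fmt.Println(maxEncodedLen(0)) // 32
}
```

Even an empty input reserves 32 bytes, comfortably covering the varint-encoded decompressed length that Encode writes first.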
146 | type Writer struct { 147 | w io.Writer 148 | err error 149 | 150 | // ibuf is a buffer for the incoming (uncompressed) bytes. 151 | // 152 | // Its use is optional. For backwards compatibility, Writers created by the 153 | // NewWriter function have ibuf == nil, do not buffer incoming bytes, and 154 | // therefore do not need to be Flush'ed or Close'd. 155 | ibuf []byte 156 | 157 | // obuf is a buffer for the outgoing (compressed) bytes. 158 | obuf []byte 159 | 160 | // wroteStreamHeader is whether we have written the stream header. 161 | wroteStreamHeader bool 162 | } 163 | 164 | // Reset discards the writer's state and switches the Snappy writer to write to 165 | // w. This permits reusing a Writer rather than allocating a new one. 166 | func (w *Writer) Reset(writer io.Writer) { 167 | w.w = writer 168 | w.err = nil 169 | if w.ibuf != nil { 170 | w.ibuf = w.ibuf[:0] 171 | } 172 | w.wroteStreamHeader = false 173 | } 174 | 175 | // Write satisfies the io.Writer interface. 176 | func (w *Writer) Write(p []byte) (nRet int, errRet error) { 177 | if w.ibuf == nil { 178 | // Do not buffer incoming bytes. This does not perform or compress well 179 | // if the caller of Writer.Write writes many small slices. This 180 | // behavior is therefore deprecated, but still supported for backwards 181 | // compatibility with code that doesn't explicitly Flush or Close. 182 | return w.write(p) 183 | } 184 | 185 | // The remainder of this method is based on bufio.Writer.Write from the 186 | // standard library. 187 | 188 | for len(p) > (cap(w.ibuf)-len(w.ibuf)) && w.err == nil { 189 | var n int 190 | if len(w.ibuf) == 0 { 191 | // Large write, empty buffer. 192 | // Write directly from p to avoid copy. 
193 | n, _ = w.write(p) 194 | } else { 195 | n = copy(w.ibuf[len(w.ibuf):cap(w.ibuf)], p) 196 | w.ibuf = w.ibuf[:len(w.ibuf)+n] 197 | w.Flush() 198 | } 199 | nRet += n 200 | p = p[n:] 201 | } 202 | if w.err != nil { 203 | return nRet, w.err 204 | } 205 | n := copy(w.ibuf[len(w.ibuf):cap(w.ibuf)], p) 206 | w.ibuf = w.ibuf[:len(w.ibuf)+n] 207 | nRet += n 208 | return nRet, nil 209 | } 210 | 211 | func (w *Writer) write(p []byte) (nRet int, errRet error) { 212 | if w.err != nil { 213 | return 0, w.err 214 | } 215 | for len(p) > 0 { 216 | obufStart := len(magicChunk) 217 | if !w.wroteStreamHeader { 218 | w.wroteStreamHeader = true 219 | copy(w.obuf, magicChunk) 220 | obufStart = 0 221 | } 222 | 223 | var uncompressed []byte 224 | if len(p) > maxBlockSize { 225 | uncompressed, p = p[:maxBlockSize], p[maxBlockSize:] 226 | } else { 227 | uncompressed, p = p, nil 228 | } 229 | checksum := crc(uncompressed) 230 | 231 | // Compress the buffer, discarding the result if the improvement 232 | // isn't at least 12.5%. 233 | compressed := Encode(w.obuf[obufHeaderLen:], uncompressed) 234 | chunkType := uint8(chunkTypeCompressedData) 235 | chunkLen := 4 + len(compressed) 236 | obufEnd := obufHeaderLen + len(compressed) 237 | if len(compressed) >= len(uncompressed)-len(uncompressed)/8 { 238 | chunkType = chunkTypeUncompressedData 239 | chunkLen = 4 + len(uncompressed) 240 | obufEnd = obufHeaderLen 241 | } 242 | 243 | // Fill in the per-chunk header that comes before the body. 
244 | w.obuf[len(magicChunk)+0] = chunkType 245 | w.obuf[len(magicChunk)+1] = uint8(chunkLen >> 0) 246 | w.obuf[len(magicChunk)+2] = uint8(chunkLen >> 8) 247 | w.obuf[len(magicChunk)+3] = uint8(chunkLen >> 16) 248 | w.obuf[len(magicChunk)+4] = uint8(checksum >> 0) 249 | w.obuf[len(magicChunk)+5] = uint8(checksum >> 8) 250 | w.obuf[len(magicChunk)+6] = uint8(checksum >> 16) 251 | w.obuf[len(magicChunk)+7] = uint8(checksum >> 24) 252 | 253 | if _, err := w.w.Write(w.obuf[obufStart:obufEnd]); err != nil { 254 | w.err = err 255 | return nRet, err 256 | } 257 | if chunkType == chunkTypeUncompressedData { 258 | if _, err := w.w.Write(uncompressed); err != nil { 259 | w.err = err 260 | return nRet, err 261 | } 262 | } 263 | nRet += len(uncompressed) 264 | } 265 | return nRet, nil 266 | } 267 | 268 | // Flush flushes the Writer to its underlying io.Writer. 269 | func (w *Writer) Flush() error { 270 | if w.err != nil { 271 | return w.err 272 | } 273 | if len(w.ibuf) == 0 { 274 | return nil 275 | } 276 | w.write(w.ibuf) 277 | w.ibuf = w.ibuf[:0] 278 | return w.err 279 | } 280 | 281 | // Close calls Flush and then closes the Writer. 282 | func (w *Writer) Close() error { 283 | w.Flush() 284 | ret := w.err 285 | if w.err == nil { 286 | w.err = errClosed 287 | } 288 | return ret 289 | } 290 | -------------------------------------------------------------------------------- /encode_amd64.s: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !appengine 6 | // +build gc 7 | // +build !noasm 8 | 9 | #include "textflag.h" 10 | 11 | // The XXX lines assemble on Go 1.4, 1.5 and 1.7, but not 1.6, due to a 12 | // Go toolchain regression. 
See https://github.com/golang/go/issues/15426 and 13 | // https://github.com/golang/snappy/issues/29 14 | // 15 | // As a workaround, the package was built with a known good assembler, and 16 | // those instructions were disassembled by "objdump -d" to yield the 17 | // 4e 0f b7 7c 5c 78 movzwq 0x78(%rsp,%r11,2),%r15 18 | // style comments, in AT&T asm syntax. Note that rsp here is a physical 19 | // register, not Go/asm's SP pseudo-register (see https://golang.org/doc/asm). 20 | // The instructions were then encoded as "BYTE $0x.." sequences, which assemble 21 | // fine on Go 1.6. 22 | 23 | // The asm code generally follows the pure Go code in encode_other.go, except 24 | // where marked with a "!!!". 25 | 26 | // ---------------------------------------------------------------------------- 27 | 28 | // func emitLiteral(dst, lit []byte) int 29 | // 30 | // All local variables fit into registers. The register allocation: 31 | // - AX len(lit) 32 | // - BX n 33 | // - DX return value 34 | // - DI &dst[i] 35 | // - R10 &lit[0] 36 | // 37 | // The 24 bytes of stack space is to call runtime·memmove. 38 | // 39 | // The unusual register allocation of local variables, such as R10 for the 40 | // source pointer, matches the allocation used at the call site in encodeBlock, 41 | // which makes it easier to manually inline this function. 
42 | TEXT ·emitLiteral(SB), NOSPLIT, $24-56 43 | MOVQ dst_base+0(FP), DI 44 | MOVQ lit_base+24(FP), R10 45 | MOVQ lit_len+32(FP), AX 46 | MOVQ AX, DX 47 | MOVL AX, BX 48 | SUBL $1, BX 49 | 50 | CMPL BX, $60 51 | JLT oneByte 52 | CMPL BX, $256 53 | JLT twoBytes 54 | 55 | threeBytes: 56 | MOVB $0xf4, 0(DI) 57 | MOVW BX, 1(DI) 58 | ADDQ $3, DI 59 | ADDQ $3, DX 60 | JMP memmove 61 | 62 | twoBytes: 63 | MOVB $0xf0, 0(DI) 64 | MOVB BX, 1(DI) 65 | ADDQ $2, DI 66 | ADDQ $2, DX 67 | JMP memmove 68 | 69 | oneByte: 70 | SHLB $2, BX 71 | MOVB BX, 0(DI) 72 | ADDQ $1, DI 73 | ADDQ $1, DX 74 | 75 | memmove: 76 | MOVQ DX, ret+48(FP) 77 | 78 | // copy(dst[i:], lit) 79 | // 80 | // This means calling runtime·memmove(&dst[i], &lit[0], len(lit)), so we push 81 | // DI, R10 and AX as arguments. 82 | MOVQ DI, 0(SP) 83 | MOVQ R10, 8(SP) 84 | MOVQ AX, 16(SP) 85 | CALL runtime·memmove(SB) 86 | RET 87 | 88 | // ---------------------------------------------------------------------------- 89 | 90 | // func emitCopy(dst []byte, offset, length int) int 91 | // 92 | // All local variables fit into registers. The register allocation: 93 | // - AX length 94 | // - SI &dst[0] 95 | // - DI &dst[i] 96 | // - R11 offset 97 | // 98 | // The unusual register allocation of local variables, such as R11 for the 99 | // offset, matches the allocation used at the call site in encodeBlock, which 100 | // makes it easier to manually inline this function. 101 | TEXT ·emitCopy(SB), NOSPLIT, $0-48 102 | MOVQ dst_base+0(FP), DI 103 | MOVQ DI, SI 104 | MOVQ offset+24(FP), R11 105 | MOVQ length+32(FP), AX 106 | 107 | loop0: 108 | // for length >= 68 { etc } 109 | CMPL AX, $68 110 | JLT step1 111 | 112 | // Emit a length 64 copy, encoded as 3 bytes. 113 | MOVB $0xfe, 0(DI) 114 | MOVW R11, 1(DI) 115 | ADDQ $3, DI 116 | SUBL $64, AX 117 | JMP loop0 118 | 119 | step1: 120 | // if length > 64 { etc } 121 | CMPL AX, $64 122 | JLE step2 123 | 124 | // Emit a length 60 copy, encoded as 3 bytes. 
125 | MOVB $0xee, 0(DI) 126 | MOVW R11, 1(DI) 127 | ADDQ $3, DI 128 | SUBL $60, AX 129 | 130 | step2: 131 | // if length >= 12 || offset >= 2048 { goto step3 } 132 | CMPL AX, $12 133 | JGE step3 134 | CMPL R11, $2048 135 | JGE step3 136 | 137 | // Emit the remaining copy, encoded as 2 bytes. 138 | MOVB R11, 1(DI) 139 | SHRL $8, R11 140 | SHLB $5, R11 141 | SUBB $4, AX 142 | SHLB $2, AX 143 | ORB AX, R11 144 | ORB $1, R11 145 | MOVB R11, 0(DI) 146 | ADDQ $2, DI 147 | 148 | // Return the number of bytes written. 149 | SUBQ SI, DI 150 | MOVQ DI, ret+40(FP) 151 | RET 152 | 153 | step3: 154 | // Emit the remaining copy, encoded as 3 bytes. 155 | SUBL $1, AX 156 | SHLB $2, AX 157 | ORB $2, AX 158 | MOVB AX, 0(DI) 159 | MOVW R11, 1(DI) 160 | ADDQ $3, DI 161 | 162 | // Return the number of bytes written. 163 | SUBQ SI, DI 164 | MOVQ DI, ret+40(FP) 165 | RET 166 | 167 | // ---------------------------------------------------------------------------- 168 | 169 | // func extendMatch(src []byte, i, j int) int 170 | // 171 | // All local variables fit into registers. The register allocation: 172 | // - DX &src[0] 173 | // - SI &src[j] 174 | // - R13 &src[len(src) - 8] 175 | // - R14 &src[len(src)] 176 | // - R15 &src[i] 177 | // 178 | // The unusual register allocation of local variables, such as R15 for a source 179 | // pointer, matches the allocation used at the call site in encodeBlock, which 180 | // makes it easier to manually inline this function. 181 | TEXT ·extendMatch(SB), NOSPLIT, $0-48 182 | MOVQ src_base+0(FP), DX 183 | MOVQ src_len+8(FP), R14 184 | MOVQ i+24(FP), R15 185 | MOVQ j+32(FP), SI 186 | ADDQ DX, R14 187 | ADDQ DX, R15 188 | ADDQ DX, SI 189 | MOVQ R14, R13 190 | SUBQ $8, R13 191 | 192 | cmp8: 193 | // As long as we are 8 or more bytes before the end of src, we can load and 194 | // compare 8 bytes at a time. If those 8 bytes are equal, repeat. 
195 | CMPQ SI, R13 196 | JA cmp1 197 | MOVQ (R15), AX 198 | MOVQ (SI), BX 199 | CMPQ AX, BX 200 | JNE bsf 201 | ADDQ $8, R15 202 | ADDQ $8, SI 203 | JMP cmp8 204 | 205 | bsf: 206 | // If those 8 bytes were not equal, XOR the two 8 byte values, and return 207 | // the index of the first byte that differs. The BSF instruction finds the 208 | // least significant 1 bit, the amd64 architecture is little-endian, and 209 | // the shift by 3 converts a bit index to a byte index. 210 | XORQ AX, BX 211 | BSFQ BX, BX 212 | SHRQ $3, BX 213 | ADDQ BX, SI 214 | 215 | // Convert from &src[ret] to ret. 216 | SUBQ DX, SI 217 | MOVQ SI, ret+40(FP) 218 | RET 219 | 220 | cmp1: 221 | // In src's tail, compare 1 byte at a time. 222 | CMPQ SI, R14 223 | JAE extendMatchEnd 224 | MOVB (R15), AX 225 | MOVB (SI), BX 226 | CMPB AX, BX 227 | JNE extendMatchEnd 228 | ADDQ $1, R15 229 | ADDQ $1, SI 230 | JMP cmp1 231 | 232 | extendMatchEnd: 233 | // Convert from &src[ret] to ret. 234 | SUBQ DX, SI 235 | MOVQ SI, ret+40(FP) 236 | RET 237 | 238 | // ---------------------------------------------------------------------------- 239 | 240 | // func encodeBlock(dst, src []byte) (d int) 241 | // 242 | // All local variables fit into registers, other than "var table". The register 243 | // allocation: 244 | // - AX . . 245 | // - BX . . 246 | // - CX 56 shift (note that amd64 shifts by non-immediates must use CX). 247 | // - DX 64 &src[0], tableSize 248 | // - SI 72 &src[s] 249 | // - DI 80 &dst[d] 250 | // - R9 88 sLimit 251 | // - R10 . &src[nextEmit] 252 | // - R11 96 prevHash, currHash, nextHash, offset 253 | // - R12 104 &src[base], skip 254 | // - R13 . &src[nextS], &src[len(src) - 8] 255 | // - R14 . len(src), bytesBetweenHashLookups, &src[len(src)], x 256 | // - R15 112 candidate 257 | // 258 | // The second column (56, 64, etc) is the stack offset to spill the registers 259 | // when calling other functions. 
We could pack this slightly tighter, but it's 260 | // simpler to have a dedicated spill map independent of the function called. 261 | // 262 | // "var table [maxTableSize]uint16" takes up 32768 bytes of stack space. An 263 | // extra 56 bytes, to call other functions, and an extra 64 bytes, to spill 264 | // local variables (registers) during calls gives 32768 + 56 + 64 = 32888. 265 | TEXT ·encodeBlock(SB), 0, $32888-56 266 | MOVQ dst_base+0(FP), DI 267 | MOVQ src_base+24(FP), SI 268 | MOVQ src_len+32(FP), R14 269 | 270 | // shift, tableSize := uint32(32-8), 1<<8 271 | MOVQ $24, CX 272 | MOVQ $256, DX 273 | 274 | calcShift: 275 | // for ; tableSize < maxTableSize && tableSize < len(src); tableSize *= 2 { 276 | // shift-- 277 | // } 278 | CMPQ DX, $16384 279 | JGE varTable 280 | CMPQ DX, R14 281 | JGE varTable 282 | SUBQ $1, CX 283 | SHLQ $1, DX 284 | JMP calcShift 285 | 286 | varTable: 287 | // var table [maxTableSize]uint16 288 | // 289 | // In the asm code, unlike the Go code, we can zero-initialize only the 290 | // first tableSize elements. Each uint16 element is 2 bytes and each MOVOU 291 | // writes 16 bytes, so we can do only tableSize/8 writes instead of the 292 | // 2048 writes that would zero-initialize all of table's 32768 bytes. 293 | SHRQ $3, DX 294 | LEAQ table-32768(SP), BX 295 | PXOR X0, X0 296 | 297 | memclr: 298 | MOVOU X0, 0(BX) 299 | ADDQ $16, BX 300 | SUBQ $1, DX 301 | JNZ memclr 302 | 303 | // !!! DX = &src[0] 304 | MOVQ SI, DX 305 | 306 | // sLimit := len(src) - inputMargin 307 | MOVQ R14, R9 308 | SUBQ $15, R9 309 | 310 | // !!! Pre-emptively spill CX, DX and R9 to the stack. Their values don't 311 | // change for the rest of the function. 
312 | MOVQ CX, 56(SP) 313 | MOVQ DX, 64(SP) 314 | MOVQ R9, 88(SP) 315 | 316 | // nextEmit := 0 317 | MOVQ DX, R10 318 | 319 | // s := 1 320 | ADDQ $1, SI 321 | 322 | // nextHash := hash(load32(src, s), shift) 323 | MOVL 0(SI), R11 324 | IMULL $0x1e35a7bd, R11 325 | SHRL CX, R11 326 | 327 | outer: 328 | // for { etc } 329 | 330 | // skip := 32 331 | MOVQ $32, R12 332 | 333 | // nextS := s 334 | MOVQ SI, R13 335 | 336 | // candidate := 0 337 | MOVQ $0, R15 338 | 339 | inner0: 340 | // for { etc } 341 | 342 | // s := nextS 343 | MOVQ R13, SI 344 | 345 | // bytesBetweenHashLookups := skip >> 5 346 | MOVQ R12, R14 347 | SHRQ $5, R14 348 | 349 | // nextS = s + bytesBetweenHashLookups 350 | ADDQ R14, R13 351 | 352 | // skip += bytesBetweenHashLookups 353 | ADDQ R14, R12 354 | 355 | // if nextS > sLimit { goto emitRemainder } 356 | MOVQ R13, AX 357 | SUBQ DX, AX 358 | CMPQ AX, R9 359 | JA emitRemainder 360 | 361 | // candidate = int(table[nextHash]) 362 | // XXX: MOVWQZX table-32768(SP)(R11*2), R15 363 | // XXX: 4e 0f b7 7c 5c 78 movzwq 0x78(%rsp,%r11,2),%r15 364 | BYTE $0x4e 365 | BYTE $0x0f 366 | BYTE $0xb7 367 | BYTE $0x7c 368 | BYTE $0x5c 369 | BYTE $0x78 370 | 371 | // table[nextHash] = uint16(s) 372 | MOVQ SI, AX 373 | SUBQ DX, AX 374 | 375 | // XXX: MOVW AX, table-32768(SP)(R11*2) 376 | // XXX: 66 42 89 44 5c 78 mov %ax,0x78(%rsp,%r11,2) 377 | BYTE $0x66 378 | BYTE $0x42 379 | BYTE $0x89 380 | BYTE $0x44 381 | BYTE $0x5c 382 | BYTE $0x78 383 | 384 | // nextHash = hash(load32(src, nextS), shift) 385 | MOVL 0(R13), R11 386 | IMULL $0x1e35a7bd, R11 387 | SHRL CX, R11 388 | 389 | // if load32(src, s) != load32(src, candidate) { continue } break 390 | MOVL 0(SI), AX 391 | MOVL (DX)(R15*1), BX 392 | CMPL AX, BX 393 | JNE inner0 394 | 395 | fourByteMatch: 396 | // As per the encode_other.go code: 397 | // 398 | // A 4-byte match has been found. We'll later see etc. 399 | 400 | // !!! Jump to a fast path for short (<= 16 byte) literals. 
See the comment 401 | // on inputMargin in encode.go. 402 | MOVQ SI, AX 403 | SUBQ R10, AX 404 | CMPQ AX, $16 405 | JLE emitLiteralFastPath 406 | 407 | // ---------------------------------------- 408 | // Begin inline of the emitLiteral call. 409 | // 410 | // d += emitLiteral(dst[d:], src[nextEmit:s]) 411 | 412 | MOVL AX, BX 413 | SUBL $1, BX 414 | 415 | CMPL BX, $60 416 | JLT inlineEmitLiteralOneByte 417 | CMPL BX, $256 418 | JLT inlineEmitLiteralTwoBytes 419 | 420 | inlineEmitLiteralThreeBytes: 421 | MOVB $0xf4, 0(DI) 422 | MOVW BX, 1(DI) 423 | ADDQ $3, DI 424 | JMP inlineEmitLiteralMemmove 425 | 426 | inlineEmitLiteralTwoBytes: 427 | MOVB $0xf0, 0(DI) 428 | MOVB BX, 1(DI) 429 | ADDQ $2, DI 430 | JMP inlineEmitLiteralMemmove 431 | 432 | inlineEmitLiteralOneByte: 433 | SHLB $2, BX 434 | MOVB BX, 0(DI) 435 | ADDQ $1, DI 436 | 437 | inlineEmitLiteralMemmove: 438 | // Spill local variables (registers) onto the stack; call; unspill. 439 | // 440 | // copy(dst[i:], lit) 441 | // 442 | // This means calling runtime·memmove(&dst[i], &lit[0], len(lit)), so we push 443 | // DI, R10 and AX as arguments. 444 | MOVQ DI, 0(SP) 445 | MOVQ R10, 8(SP) 446 | MOVQ AX, 16(SP) 447 | ADDQ AX, DI // Finish the "d +=" part of "d += emitLiteral(etc)". 448 | MOVQ SI, 72(SP) 449 | MOVQ DI, 80(SP) 450 | MOVQ R15, 112(SP) 451 | CALL runtime·memmove(SB) 452 | MOVQ 56(SP), CX 453 | MOVQ 64(SP), DX 454 | MOVQ 72(SP), SI 455 | MOVQ 80(SP), DI 456 | MOVQ 88(SP), R9 457 | MOVQ 112(SP), R15 458 | JMP inner1 459 | 460 | inlineEmitLiteralEnd: 461 | // End inline of the emitLiteral call. 462 | // ---------------------------------------- 463 | 464 | emitLiteralFastPath: 465 | // !!! Emit the 1-byte encoding "uint8(len(lit)-1)<<2". 466 | MOVB AX, BX 467 | SUBB $1, BX 468 | SHLB $2, BX 469 | MOVB BX, (DI) 470 | ADDQ $1, DI 471 | 472 | // !!! Implement the copy from lit to dst as a 16-byte load and store. 473 | // (Encode's documentation says that dst and src must not overlap.) 
474 | // 475 | // This always copies 16 bytes, instead of only len(lit) bytes, but that's 476 | // OK. Subsequent iterations will fix up the overrun. 477 | // 478 | // Note that on amd64, it is legal and cheap to issue unaligned 8-byte or 479 | // 16-byte loads and stores. This technique probably wouldn't be as 480 | // effective on architectures that are fussier about alignment. 481 | MOVOU 0(R10), X0 482 | MOVOU X0, 0(DI) 483 | ADDQ AX, DI 484 | 485 | inner1: 486 | // for { etc } 487 | 488 | // base := s 489 | MOVQ SI, R12 490 | 491 | // !!! offset := base - candidate 492 | MOVQ R12, R11 493 | SUBQ R15, R11 494 | SUBQ DX, R11 495 | 496 | // ---------------------------------------- 497 | // Begin inline of the extendMatch call. 498 | // 499 | // s = extendMatch(src, candidate+4, s+4) 500 | 501 | // !!! R14 = &src[len(src)] 502 | MOVQ src_len+32(FP), R14 503 | ADDQ DX, R14 504 | 505 | // !!! R13 = &src[len(src) - 8] 506 | MOVQ R14, R13 507 | SUBQ $8, R13 508 | 509 | // !!! R15 = &src[candidate + 4] 510 | ADDQ $4, R15 511 | ADDQ DX, R15 512 | 513 | // !!! s += 4 514 | ADDQ $4, SI 515 | 516 | inlineExtendMatchCmp8: 517 | // As long as we are 8 or more bytes before the end of src, we can load and 518 | // compare 8 bytes at a time. If those 8 bytes are equal, repeat. 519 | CMPQ SI, R13 520 | JA inlineExtendMatchCmp1 521 | MOVQ (R15), AX 522 | MOVQ (SI), BX 523 | CMPQ AX, BX 524 | JNE inlineExtendMatchBSF 525 | ADDQ $8, R15 526 | ADDQ $8, SI 527 | JMP inlineExtendMatchCmp8 528 | 529 | inlineExtendMatchBSF: 530 | // If those 8 bytes were not equal, XOR the two 8 byte values, and return 531 | // the index of the first byte that differs. The BSF instruction finds the 532 | // least significant 1 bit, the amd64 architecture is little-endian, and 533 | // the shift by 3 converts a bit index to a byte index. 
534 | XORQ AX, BX 535 | BSFQ BX, BX 536 | SHRQ $3, BX 537 | ADDQ BX, SI 538 | JMP inlineExtendMatchEnd 539 | 540 | inlineExtendMatchCmp1: 541 | // In src's tail, compare 1 byte at a time. 542 | CMPQ SI, R14 543 | JAE inlineExtendMatchEnd 544 | MOVB (R15), AX 545 | MOVB (SI), BX 546 | CMPB AX, BX 547 | JNE inlineExtendMatchEnd 548 | ADDQ $1, R15 549 | ADDQ $1, SI 550 | JMP inlineExtendMatchCmp1 551 | 552 | inlineExtendMatchEnd: 553 | // End inline of the extendMatch call. 554 | // ---------------------------------------- 555 | 556 | // ---------------------------------------- 557 | // Begin inline of the emitCopy call. 558 | // 559 | // d += emitCopy(dst[d:], base-candidate, s-base) 560 | 561 | // !!! length := s - base 562 | MOVQ SI, AX 563 | SUBQ R12, AX 564 | 565 | inlineEmitCopyLoop0: 566 | // for length >= 68 { etc } 567 | CMPL AX, $68 568 | JLT inlineEmitCopyStep1 569 | 570 | // Emit a length 64 copy, encoded as 3 bytes. 571 | MOVB $0xfe, 0(DI) 572 | MOVW R11, 1(DI) 573 | ADDQ $3, DI 574 | SUBL $64, AX 575 | JMP inlineEmitCopyLoop0 576 | 577 | inlineEmitCopyStep1: 578 | // if length > 64 { etc } 579 | CMPL AX, $64 580 | JLE inlineEmitCopyStep2 581 | 582 | // Emit a length 60 copy, encoded as 3 bytes. 583 | MOVB $0xee, 0(DI) 584 | MOVW R11, 1(DI) 585 | ADDQ $3, DI 586 | SUBL $60, AX 587 | 588 | inlineEmitCopyStep2: 589 | // if length >= 12 || offset >= 2048 { goto inlineEmitCopyStep3 } 590 | CMPL AX, $12 591 | JGE inlineEmitCopyStep3 592 | CMPL R11, $2048 593 | JGE inlineEmitCopyStep3 594 | 595 | // Emit the remaining copy, encoded as 2 bytes. 596 | MOVB R11, 1(DI) 597 | SHRL $8, R11 598 | SHLB $5, R11 599 | SUBB $4, AX 600 | SHLB $2, AX 601 | ORB AX, R11 602 | ORB $1, R11 603 | MOVB R11, 0(DI) 604 | ADDQ $2, DI 605 | JMP inlineEmitCopyEnd 606 | 607 | inlineEmitCopyStep3: 608 | // Emit the remaining copy, encoded as 3 bytes. 
609 | SUBL $1, AX 610 | SHLB $2, AX 611 | ORB $2, AX 612 | MOVB AX, 0(DI) 613 | MOVW R11, 1(DI) 614 | ADDQ $3, DI 615 | 616 | inlineEmitCopyEnd: 617 | // End inline of the emitCopy call. 618 | // ---------------------------------------- 619 | 620 | // nextEmit = s 621 | MOVQ SI, R10 622 | 623 | // if s >= sLimit { goto emitRemainder } 624 | MOVQ SI, AX 625 | SUBQ DX, AX 626 | CMPQ AX, R9 627 | JAE emitRemainder 628 | 629 | // As per the encode_other.go code: 630 | // 631 | // We could immediately etc. 632 | 633 | // x := load64(src, s-1) 634 | MOVQ -1(SI), R14 635 | 636 | // prevHash := hash(uint32(x>>0), shift) 637 | MOVL R14, R11 638 | IMULL $0x1e35a7bd, R11 639 | SHRL CX, R11 640 | 641 | // table[prevHash] = uint16(s-1) 642 | MOVQ SI, AX 643 | SUBQ DX, AX 644 | SUBQ $1, AX 645 | 646 | // XXX: MOVW AX, table-32768(SP)(R11*2) 647 | // XXX: 66 42 89 44 5c 78 mov %ax,0x78(%rsp,%r11,2) 648 | BYTE $0x66 649 | BYTE $0x42 650 | BYTE $0x89 651 | BYTE $0x44 652 | BYTE $0x5c 653 | BYTE $0x78 654 | 655 | // currHash := hash(uint32(x>>8), shift) 656 | SHRQ $8, R14 657 | MOVL R14, R11 658 | IMULL $0x1e35a7bd, R11 659 | SHRL CX, R11 660 | 661 | // candidate = int(table[currHash]) 662 | // XXX: MOVWQZX table-32768(SP)(R11*2), R15 663 | // XXX: 4e 0f b7 7c 5c 78 movzwq 0x78(%rsp,%r11,2),%r15 664 | BYTE $0x4e 665 | BYTE $0x0f 666 | BYTE $0xb7 667 | BYTE $0x7c 668 | BYTE $0x5c 669 | BYTE $0x78 670 | 671 | // table[currHash] = uint16(s) 672 | ADDQ $1, AX 673 | 674 | // XXX: MOVW AX, table-32768(SP)(R11*2) 675 | // XXX: 66 42 89 44 5c 78 mov %ax,0x78(%rsp,%r11,2) 676 | BYTE $0x66 677 | BYTE $0x42 678 | BYTE $0x89 679 | BYTE $0x44 680 | BYTE $0x5c 681 | BYTE $0x78 682 | 683 | // if uint32(x>>8) == load32(src, candidate) { continue } 684 | MOVL (DX)(R15*1), BX 685 | CMPL R14, BX 686 | JEQ inner1 687 | 688 | // nextHash = hash(uint32(x>>16), shift) 689 | SHRQ $8, R14 690 | MOVL R14, R11 691 | IMULL $0x1e35a7bd, R11 692 | SHRL CX, R11 693 | 694 | // s++ 695 | ADDQ $1, SI 696 | 697 | // 
break out of the inner1 for loop, i.e. continue the outer loop. 698 | JMP outer 699 | 700 | emitRemainder: 701 | // if nextEmit < len(src) { etc } 702 | MOVQ src_len+32(FP), AX 703 | ADDQ DX, AX 704 | CMPQ R10, AX 705 | JEQ encodeBlockEnd 706 | 707 | // d += emitLiteral(dst[d:], src[nextEmit:]) 708 | // 709 | // Push args. 710 | MOVQ DI, 0(SP) 711 | MOVQ $0, 8(SP) // Unnecessary, as the callee ignores it, but conservative. 712 | MOVQ $0, 16(SP) // Unnecessary, as the callee ignores it, but conservative. 713 | MOVQ R10, 24(SP) 714 | SUBQ R10, AX 715 | MOVQ AX, 32(SP) 716 | MOVQ AX, 40(SP) // Unnecessary, as the callee ignores it, but conservative. 717 | 718 | // Spill local variables (registers) onto the stack; call; unspill. 719 | MOVQ DI, 80(SP) 720 | CALL ·emitLiteral(SB) 721 | MOVQ 80(SP), DI 722 | 723 | // Finish the "d +=" part of "d += emitLiteral(etc)". 724 | ADDQ 48(SP), DI 725 | 726 | encodeBlockEnd: 727 | MOVQ dst_base+0(FP), AX 728 | SUBQ AX, DI 729 | MOVQ DI, d+48(FP) 730 | RET 731 | -------------------------------------------------------------------------------- /encode_arm64.s: -------------------------------------------------------------------------------- 1 | // Copyright 2020 The Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !appengine 6 | // +build gc 7 | // +build !noasm 8 | 9 | #include "textflag.h" 10 | 11 | // The asm code generally follows the pure Go code in encode_other.go, except 12 | // where marked with a "!!!". 13 | 14 | // ---------------------------------------------------------------------------- 15 | 16 | // func emitLiteral(dst, lit []byte) int 17 | // 18 | // All local variables fit into registers. The register allocation: 19 | // - R3 len(lit) 20 | // - R4 n 21 | // - R6 return value 22 | // - R8 &dst[i] 23 | // - R10 &lit[0] 24 | // 25 | // The 40 bytes of stack space is to call runtime·memmove; the 8 bytes beyond the 32 bytes of arguments keep the arm64 stack frame 16-byte aligned.
26 | // 27 | // The unusual register allocation of local variables, such as R10 for the 28 | // source pointer, matches the allocation used at the call site in encodeBlock, 29 | // which makes it easier to manually inline this function. 30 | TEXT ·emitLiteral(SB), NOSPLIT, $40-56 31 | MOVD dst_base+0(FP), R8 32 | MOVD lit_base+24(FP), R10 33 | MOVD lit_len+32(FP), R3 34 | MOVD R3, R6 35 | MOVW R3, R4 36 | SUBW $1, R4, R4 37 | 38 | CMPW $60, R4 39 | BLT oneByte 40 | CMPW $256, R4 41 | BLT twoBytes 42 | 43 | threeBytes: 44 | MOVD $0xf4, R2 45 | MOVB R2, 0(R8) 46 | MOVW R4, 1(R8) 47 | ADD $3, R8, R8 48 | ADD $3, R6, R6 49 | B memmove 50 | 51 | twoBytes: 52 | MOVD $0xf0, R2 53 | MOVB R2, 0(R8) 54 | MOVB R4, 1(R8) 55 | ADD $2, R8, R8 56 | ADD $2, R6, R6 57 | B memmove 58 | 59 | oneByte: 60 | LSLW $2, R4, R4 61 | MOVB R4, 0(R8) 62 | ADD $1, R8, R8 63 | ADD $1, R6, R6 64 | 65 | memmove: 66 | MOVD R6, ret+48(FP) 67 | 68 | // copy(dst[i:], lit) 69 | // 70 | // This means calling runtime·memmove(&dst[i], &lit[0], len(lit)), so we push 71 | // R8, R10 and R3 as arguments. 72 | MOVD R8, 8(RSP) 73 | MOVD R10, 16(RSP) 74 | MOVD R3, 24(RSP) 75 | CALL runtime·memmove(SB) 76 | RET 77 | 78 | // ---------------------------------------------------------------------------- 79 | 80 | // func emitCopy(dst []byte, offset, length int) int 81 | // 82 | // All local variables fit into registers. The register allocation: 83 | // - R3 length 84 | // - R7 &dst[0] 85 | // - R8 &dst[i] 86 | // - R11 offset 87 | // 88 | // The unusual register allocation of local variables, such as R11 for the 89 | // offset, matches the allocation used at the call site in encodeBlock, which 90 | // makes it easier to manually inline this function. 
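Before the assembly, a Go-level sketch of the copy encoding it implements may help; this mirrors the emitCopy shape that the comments below describe (the tagCopy1/tagCopy2 names follow Snappy's format conventions and are conveniences of this sketch):

```go
package main

// Snappy copy-element tags: tagCopy1 is the 2-byte form (tag plus 1 offset
// byte), tagCopy2 the 3-byte form (tag plus 2 little-endian offset bytes).
const (
	tagCopy1 = 0x01
	tagCopy2 = 0x02
)

// emitCopy writes one or more copy elements for (offset, length) to dst and
// returns the number of bytes written. dst is assumed large enough.
func emitCopy(dst []byte, offset, length int) int {
	i := 0
	// Emit length-64 copies while length >= 68; stopping at 68 rather than 64
	// guarantees the final element still has length >= 4.
	for length >= 68 {
		dst[i+0] = 63<<2 | tagCopy2 // 0xfe in the asm.
		dst[i+1] = uint8(offset)
		dst[i+2] = uint8(offset >> 8)
		i += 3
		length -= 64
	}
	if length > 64 {
		// Emit a length-60 copy, leaving a final length of 5..8.
		dst[i+0] = 59<<2 | tagCopy2 // 0xee in the asm.
		dst[i+1] = uint8(offset)
		dst[i+2] = uint8(offset >> 8)
		i += 3
		length -= 60
	}
	if length >= 12 || offset >= 2048 {
		// 3-byte form: length-1 in the tag's upper 6 bits.
		dst[i+0] = uint8(length-1)<<2 | tagCopy2
		dst[i+1] = uint8(offset)
		dst[i+2] = uint8(offset >> 8)
		return i + 3
	}
	// 2-byte form: offset's high 3 bits and length-4 packed into the tag,
	// which is what the shift-and-OR sequence in the asm computes.
	dst[i+0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1
	dst[i+1] = uint8(offset)
	return i + 2
}

func main() {
	dst := make([]byte, 8)
	if n := emitCopy(dst, 8, 8); n != 2 || dst[0] != 0x11 || dst[1] != 0x08 {
		panic("unexpected copy encoding")
	}
}
```

The loop0/step1/step2/step3 labels in the assembly correspond to the four branches above, in the same order.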
91 | TEXT ·emitCopy(SB), NOSPLIT, $0-48 92 | MOVD dst_base+0(FP), R8 93 | MOVD R8, R7 94 | MOVD offset+24(FP), R11 95 | MOVD length+32(FP), R3 96 | 97 | loop0: 98 | // for length >= 68 { etc } 99 | CMPW $68, R3 100 | BLT step1 101 | 102 | // Emit a length 64 copy, encoded as 3 bytes. 103 | MOVD $0xfe, R2 104 | MOVB R2, 0(R8) 105 | MOVW R11, 1(R8) 106 | ADD $3, R8, R8 107 | SUB $64, R3, R3 108 | B loop0 109 | 110 | step1: 111 | // if length > 64 { etc } 112 | CMP $64, R3 113 | BLE step2 114 | 115 | // Emit a length 60 copy, encoded as 3 bytes. 116 | MOVD $0xee, R2 117 | MOVB R2, 0(R8) 118 | MOVW R11, 1(R8) 119 | ADD $3, R8, R8 120 | SUB $60, R3, R3 121 | 122 | step2: 123 | // if length >= 12 || offset >= 2048 { goto step3 } 124 | CMP $12, R3 125 | BGE step3 126 | CMPW $2048, R11 127 | BGE step3 128 | 129 | // Emit the remaining copy, encoded as 2 bytes. 130 | MOVB R11, 1(R8) 131 | LSRW $3, R11, R11 132 | AND $0xe0, R11, R11 133 | SUB $4, R3, R3 134 | LSLW $2, R3 135 | AND $0xff, R3, R3 136 | ORRW R3, R11, R11 137 | ORRW $1, R11, R11 138 | MOVB R11, 0(R8) 139 | ADD $2, R8, R8 140 | 141 | // Return the number of bytes written. 142 | SUB R7, R8, R8 143 | MOVD R8, ret+40(FP) 144 | RET 145 | 146 | step3: 147 | // Emit the remaining copy, encoded as 3 bytes. 148 | SUB $1, R3, R3 149 | AND $0xff, R3, R3 150 | LSLW $2, R3, R3 151 | ORRW $2, R3, R3 152 | MOVB R3, 0(R8) 153 | MOVW R11, 1(R8) 154 | ADD $3, R8, R8 155 | 156 | // Return the number of bytes written. 157 | SUB R7, R8, R8 158 | MOVD R8, ret+40(FP) 159 | RET 160 | 161 | // ---------------------------------------------------------------------------- 162 | 163 | // func extendMatch(src []byte, i, j int) int 164 | // 165 | // All local variables fit into registers. 
The register allocation: 166 | // - R6 &src[0] 167 | // - R7 &src[j] 168 | // - R13 &src[len(src) - 8] 169 | // - R14 &src[len(src)] 170 | // - R15 &src[i] 171 | // 172 | // The unusual register allocation of local variables, such as R15 for a source 173 | // pointer, matches the allocation used at the call site in encodeBlock, which 174 | // makes it easier to manually inline this function. 175 | TEXT ·extendMatch(SB), NOSPLIT, $0-48 176 | MOVD src_base+0(FP), R6 177 | MOVD src_len+8(FP), R14 178 | MOVD i+24(FP), R15 179 | MOVD j+32(FP), R7 180 | ADD R6, R14, R14 181 | ADD R6, R15, R15 182 | ADD R6, R7, R7 183 | MOVD R14, R13 184 | SUB $8, R13, R13 185 | 186 | cmp8: 187 | // As long as we are 8 or more bytes before the end of src, we can load and 188 | // compare 8 bytes at a time. If those 8 bytes are equal, repeat. 189 | CMP R13, R7 190 | BHI cmp1 191 | MOVD (R15), R3 192 | MOVD (R7), R4 193 | CMP R4, R3 194 | BNE bsf 195 | ADD $8, R15, R15 196 | ADD $8, R7, R7 197 | B cmp8 198 | 199 | bsf: 200 | // If those 8 bytes were not equal, XOR the two 8 byte values, and return 201 | // the index of the first byte that differs. 202 | // RBIT reverses the bit order, then CLZ counts the leading zeros, the 203 | // combination of which finds the least significant bit which is set. 204 | // The arm64 architecture is little-endian, and the shift by 3 converts 205 | // a bit index to a byte index. 206 | EOR R3, R4, R4 207 | RBIT R4, R4 208 | CLZ R4, R4 209 | ADD R4>>3, R7, R7 210 | 211 | // Convert from &src[ret] to ret. 212 | SUB R6, R7, R7 213 | MOVD R7, ret+40(FP) 214 | RET 215 | 216 | cmp1: 217 | // In src's tail, compare 1 byte at a time. 218 | CMP R7, R14 219 | BLS extendMatchEnd 220 | MOVB (R15), R3 221 | MOVB (R7), R4 222 | CMP R4, R3 223 | BNE extendMatchEnd 224 | ADD $1, R15, R15 225 | ADD $1, R7, R7 226 | B cmp1 227 | 228 | extendMatchEnd: 229 | // Convert from &src[ret] to ret. 
230 | SUB R6, R7, R7 231 | MOVD R7, ret+40(FP) 232 | RET 233 | 234 | // ---------------------------------------------------------------------------- 235 | 236 | // func encodeBlock(dst, src []byte) (d int) 237 | // 238 | // All local variables fit into registers, other than "var table". The register 239 | // allocation: 240 | // - R3 . . 241 | // - R4 . . 242 | // - R5 64 shift 243 | // - R6 72 &src[0], tableSize 244 | // - R7 80 &src[s] 245 | // - R8 88 &dst[d] 246 | // - R9 96 sLimit 247 | // - R10 . &src[nextEmit] 248 | // - R11 104 prevHash, currHash, nextHash, offset 249 | // - R12 112 &src[base], skip 250 | // - R13 . &src[nextS], &src[len(src) - 8] 251 | // - R14 . len(src), bytesBetweenHashLookups, &src[len(src)], x 252 | // - R15 120 candidate 253 | // - R16 . hash constant, 0x1e35a7bd 254 | // - R17 . &table 255 | // - . 128 table 256 | // 257 | // The second column (64, 72, etc) is the stack offset to spill the registers 258 | // when calling other functions. We could pack this slightly tighter, but it's 259 | // simpler to have a dedicated spill map independent of the function called. 260 | // 261 | // "var table [maxTableSize]uint16" takes up 32768 bytes of stack space. An 262 | // extra 64 bytes, to call other functions, and an extra 64 bytes, to spill 263 | // local variables (registers) during calls gives 32768 + 64 + 64 = 32896; the TEXT directive below declares 32904, 8 bytes more, so the arm64 stack frame stays 16-byte aligned.
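Before the TEXT directive below, the prologue's calcShift loop and the MULW/LSRW hash sequence can be summarized in Go (a sketch of the logic the comments above describe; the function names are conveniences of this sketch):

```go
package main

// maxTableSize is the 16384 bound tested in calcShift; the uint16 table then
// occupies at most 32768 bytes, matching the stack-size arithmetic above.
const maxTableSize = 1 << 14

// hash mirrors the multiply-and-shift in the asm: MULW by 0x1e35a7bd (built
// in R16 via MOVW/MOVKW), then LSRW by shift, keeping the top 32-shift bits.
func hash(u, shift uint32) uint32 {
	return (u * 0x1e35a7bd) >> shift
}

// tableParams mirrors the calcShift loop: tableSize doubles from 256 until it
// reaches maxTableSize or covers the input, and shift shrinks in lockstep so
// that hash values index exactly tableSize buckets.
func tableParams(srcLen int) (shift uint32, tableSize int) {
	shift, tableSize = 32-8, 1<<8
	for ; tableSize < maxTableSize && tableSize < srcLen; tableSize *= 2 {
		shift--
	}
	return shift, tableSize
}

func main() {
	if shift, tableSize := tableParams(1000); shift != 22 || tableSize != 1024 {
		panic("unexpected table parameters")
	}
}
```

Sizing the table to the input keeps the memclr below cheap for small blocks, at the cost of more hash collisions than a full-size table would have.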
264 | TEXT ·encodeBlock(SB), 0, $32904-56 265 | MOVD dst_base+0(FP), R8 266 | MOVD src_base+24(FP), R7 267 | MOVD src_len+32(FP), R14 268 | 269 | // shift, tableSize := uint32(32-8), 1<<8 270 | MOVD $24, R5 271 | MOVD $256, R6 272 | MOVW $0xa7bd, R16 273 | MOVKW $(0x1e35<<16), R16 274 | 275 | calcShift: 276 | // for ; tableSize < maxTableSize && tableSize < len(src); tableSize *= 2 { 277 | // shift-- 278 | // } 279 | MOVD $16384, R2 280 | CMP R2, R6 281 | BGE varTable 282 | CMP R14, R6 283 | BGE varTable 284 | SUB $1, R5, R5 285 | LSL $1, R6, R6 286 | B calcShift 287 | 288 | varTable: 289 | // var table [maxTableSize]uint16 290 | // 291 | // In the asm code, unlike the Go code, we can zero-initialize only the 292 | // first tableSize elements. Each uint16 element is 2 bytes and each 293 | // iteration writes 64 bytes, so we can do only tableSize/32 writes 294 | // instead of the 2048 writes that would zero-initialize all of table's 295 | // 32768 bytes. This clear could overrun the first tableSize elements, but 296 | // it won't overrun the allocated stack size. 297 | ADD $128, RSP, R17 298 | MOVD R17, R4 299 | 300 | // !!! R6 = &table[tableSize] 301 | ADD R6<<1, R17, R6 302 | 303 | memclr: 304 | STP.P (ZR, ZR), 64(R4) 305 | STP (ZR, ZR), -48(R4) 306 | STP (ZR, ZR), -32(R4) 307 | STP (ZR, ZR), -16(R4) 308 | CMP R4, R6 309 | BHI memclr 310 | 311 | // !!! R6 = &src[0] 312 | MOVD R7, R6 313 | 314 | // sLimit := len(src) - inputMargin 315 | MOVD R14, R9 316 | SUB $15, R9, R9 317 | 318 | // !!! Pre-emptively spill R5, R6 and R9 to the stack. Their values don't 319 | // change for the rest of the function.
320 | MOVD R5, 64(RSP) 321 | MOVD R6, 72(RSP) 322 | MOVD R9, 96(RSP) 323 | 324 | // nextEmit := 0 325 | MOVD R6, R10 326 | 327 | // s := 1 328 | ADD $1, R7, R7 329 | 330 | // nextHash := hash(load32(src, s), shift) 331 | MOVW 0(R7), R11 332 | MULW R16, R11, R11 333 | LSRW R5, R11, R11 334 | 335 | outer: 336 | // for { etc } 337 | 338 | // skip := 32 339 | MOVD $32, R12 340 | 341 | // nextS := s 342 | MOVD R7, R13 343 | 344 | // candidate := 0 345 | MOVD $0, R15 346 | 347 | inner0: 348 | // for { etc } 349 | 350 | // s := nextS 351 | MOVD R13, R7 352 | 353 | // bytesBetweenHashLookups := skip >> 5 354 | MOVD R12, R14 355 | LSR $5, R14, R14 356 | 357 | // nextS = s + bytesBetweenHashLookups 358 | ADD R14, R13, R13 359 | 360 | // skip += bytesBetweenHashLookups 361 | ADD R14, R12, R12 362 | 363 | // if nextS > sLimit { goto emitRemainder } 364 | MOVD R13, R3 365 | SUB R6, R3, R3 366 | CMP R9, R3 367 | BHI emitRemainder 368 | 369 | // candidate = int(table[nextHash]) 370 | MOVHU 0(R17)(R11<<1), R15 371 | 372 | // table[nextHash] = uint16(s) 373 | MOVD R7, R3 374 | SUB R6, R3, R3 375 | 376 | MOVH R3, 0(R17)(R11<<1) 377 | 378 | // nextHash = hash(load32(src, nextS), shift) 379 | MOVW 0(R13), R11 380 | MULW R16, R11 381 | LSRW R5, R11, R11 382 | 383 | // if load32(src, s) != load32(src, candidate) { continue } break 384 | MOVW 0(R7), R3 385 | MOVW (R6)(R15), R4 386 | CMPW R4, R3 387 | BNE inner0 388 | 389 | fourByteMatch: 390 | // As per the encode_other.go code: 391 | // 392 | // A 4-byte match has been found. We'll later see etc. 393 | 394 | // !!! Jump to a fast path for short (<= 16 byte) literals. See the comment 395 | // on inputMargin in encode.go. 396 | MOVD R7, R3 397 | SUB R10, R3, R3 398 | CMP $16, R3 399 | BLE emitLiteralFastPath 400 | 401 | // ---------------------------------------- 402 | // Begin inline of the emitLiteral call. 
403 | // 404 | // d += emitLiteral(dst[d:], src[nextEmit:s]) 405 | 406 | MOVW R3, R4 407 | SUBW $1, R4, R4 408 | 409 | MOVW $60, R2 410 | CMPW R2, R4 411 | BLT inlineEmitLiteralOneByte 412 | MOVW $256, R2 413 | CMPW R2, R4 414 | BLT inlineEmitLiteralTwoBytes 415 | 416 | inlineEmitLiteralThreeBytes: 417 | MOVD $0xf4, R1 418 | MOVB R1, 0(R8) 419 | MOVW R4, 1(R8) 420 | ADD $3, R8, R8 421 | B inlineEmitLiteralMemmove 422 | 423 | inlineEmitLiteralTwoBytes: 424 | MOVD $0xf0, R1 425 | MOVB R1, 0(R8) 426 | MOVB R4, 1(R8) 427 | ADD $2, R8, R8 428 | B inlineEmitLiteralMemmove 429 | 430 | inlineEmitLiteralOneByte: 431 | LSLW $2, R4, R4 432 | MOVB R4, 0(R8) 433 | ADD $1, R8, R8 434 | 435 | inlineEmitLiteralMemmove: 436 | // Spill local variables (registers) onto the stack; call; unspill. 437 | // 438 | // copy(dst[i:], lit) 439 | // 440 | // This means calling runtime·memmove(&dst[i], &lit[0], len(lit)), so we push 441 | // R8, R10 and R3 as arguments. 442 | MOVD R8, 8(RSP) 443 | MOVD R10, 16(RSP) 444 | MOVD R3, 24(RSP) 445 | 446 | // Finish the "d +=" part of "d += emitLiteral(etc)". 447 | ADD R3, R8, R8 448 | MOVD R7, 80(RSP) 449 | MOVD R8, 88(RSP) 450 | MOVD R15, 120(RSP) 451 | CALL runtime·memmove(SB) 452 | MOVD 64(RSP), R5 453 | MOVD 72(RSP), R6 454 | MOVD 80(RSP), R7 455 | MOVD 88(RSP), R8 456 | MOVD 96(RSP), R9 457 | MOVD 120(RSP), R15 458 | ADD $128, RSP, R17 459 | MOVW $0xa7bd, R16 460 | MOVKW $(0x1e35<<16), R16 461 | B inner1 462 | 463 | inlineEmitLiteralEnd: 464 | // End inline of the emitLiteral call. 465 | // ---------------------------------------- 466 | 467 | emitLiteralFastPath: 468 | // !!! Emit the 1-byte encoding "uint8(len(lit)-1)<<2". 469 | MOVB R3, R4 470 | SUBW $1, R4, R4 471 | AND $0xff, R4, R4 472 | LSLW $2, R4, R4 473 | MOVB R4, (R8) 474 | ADD $1, R8, R8 475 | 476 | // !!! Implement the copy from lit to dst as a 16-byte load and store. 477 | // (Encode's documentation says that dst and src must not overlap.) 
478 | // 479 | // This always copies 16 bytes, instead of only len(lit) bytes, but that's 480 | // OK. Subsequent iterations will fix up the overrun. 481 | // 482 | // Note that on arm64, it is legal and cheap to issue unaligned 8-byte or 483 | // 16-byte loads and stores. This technique probably wouldn't be as 484 | // effective on architectures that are fussier about alignment. 485 | LDP 0(R10), (R0, R1) 486 | STP (R0, R1), 0(R8) 487 | ADD R3, R8, R8 488 | 489 | inner1: 490 | // for { etc } 491 | 492 | // base := s 493 | MOVD R7, R12 494 | 495 | // !!! offset := base - candidate 496 | MOVD R12, R11 497 | SUB R15, R11, R11 498 | SUB R6, R11, R11 499 | 500 | // ---------------------------------------- 501 | // Begin inline of the extendMatch call. 502 | // 503 | // s = extendMatch(src, candidate+4, s+4) 504 | 505 | // !!! R14 = &src[len(src)] 506 | MOVD src_len+32(FP), R14 507 | ADD R6, R14, R14 508 | 509 | // !!! R13 = &src[len(src) - 8] 510 | MOVD R14, R13 511 | SUB $8, R13, R13 512 | 513 | // !!! R15 = &src[candidate + 4] 514 | ADD $4, R15, R15 515 | ADD R6, R15, R15 516 | 517 | // !!! s += 4 518 | ADD $4, R7, R7 519 | 520 | inlineExtendMatchCmp8: 521 | // As long as we are 8 or more bytes before the end of src, we can load and 522 | // compare 8 bytes at a time. If those 8 bytes are equal, repeat. 523 | CMP R13, R7 524 | BHI inlineExtendMatchCmp1 525 | MOVD (R15), R3 526 | MOVD (R7), R4 527 | CMP R4, R3 528 | BNE inlineExtendMatchBSF 529 | ADD $8, R15, R15 530 | ADD $8, R7, R7 531 | B inlineExtendMatchCmp8 532 | 533 | inlineExtendMatchBSF: 534 | // If those 8 bytes were not equal, XOR the two 8 byte values, and return 535 | // the index of the first byte that differs. 536 | // RBIT reverses the bit order, then CLZ counts the leading zeros, the 537 | // combination of which finds the least significant bit which is set. 538 | // The arm64 architecture is little-endian, and the shift by 3 converts 539 | // a bit index to a byte index. 
540 | EOR R3, R4, R4 541 | RBIT R4, R4 542 | CLZ R4, R4 543 | ADD R4>>3, R7, R7 544 | B inlineExtendMatchEnd 545 | 546 | inlineExtendMatchCmp1: 547 | // In src's tail, compare 1 byte at a time. 548 | CMP R7, R14 549 | BLS inlineExtendMatchEnd 550 | MOVB (R15), R3 551 | MOVB (R7), R4 552 | CMP R4, R3 553 | BNE inlineExtendMatchEnd 554 | ADD $1, R15, R15 555 | ADD $1, R7, R7 556 | B inlineExtendMatchCmp1 557 | 558 | inlineExtendMatchEnd: 559 | // End inline of the extendMatch call. 560 | // ---------------------------------------- 561 | 562 | // ---------------------------------------- 563 | // Begin inline of the emitCopy call. 564 | // 565 | // d += emitCopy(dst[d:], base-candidate, s-base) 566 | 567 | // !!! length := s - base 568 | MOVD R7, R3 569 | SUB R12, R3, R3 570 | 571 | inlineEmitCopyLoop0: 572 | // for length >= 68 { etc } 573 | MOVW $68, R2 574 | CMPW R2, R3 575 | BLT inlineEmitCopyStep1 576 | 577 | // Emit a length 64 copy, encoded as 3 bytes. 578 | MOVD $0xfe, R1 579 | MOVB R1, 0(R8) 580 | MOVW R11, 1(R8) 581 | ADD $3, R8, R8 582 | SUBW $64, R3, R3 583 | B inlineEmitCopyLoop0 584 | 585 | inlineEmitCopyStep1: 586 | // if length > 64 { etc } 587 | MOVW $64, R2 588 | CMPW R2, R3 589 | BLE inlineEmitCopyStep2 590 | 591 | // Emit a length 60 copy, encoded as 3 bytes. 592 | MOVD $0xee, R1 593 | MOVB R1, 0(R8) 594 | MOVW R11, 1(R8) 595 | ADD $3, R8, R8 596 | SUBW $60, R3, R3 597 | 598 | inlineEmitCopyStep2: 599 | // if length >= 12 || offset >= 2048 { goto inlineEmitCopyStep3 } 600 | MOVW $12, R2 601 | CMPW R2, R3 602 | BGE inlineEmitCopyStep3 603 | MOVW $2048, R2 604 | CMPW R2, R11 605 | BGE inlineEmitCopyStep3 606 | 607 | // Emit the remaining copy, encoded as 2 bytes. 
608 | MOVB R11, 1(R8) 609 | LSRW $8, R11, R11 610 | LSLW $5, R11, R11 611 | SUBW $4, R3, R3 612 | AND $0xff, R3, R3 613 | LSLW $2, R3, R3 614 | ORRW R3, R11, R11 615 | ORRW $1, R11, R11 616 | MOVB R11, 0(R8) 617 | ADD $2, R8, R8 618 | B inlineEmitCopyEnd 619 | 620 | inlineEmitCopyStep3: 621 | // Emit the remaining copy, encoded as 3 bytes. 622 | SUBW $1, R3, R3 623 | LSLW $2, R3, R3 624 | ORRW $2, R3, R3 625 | MOVB R3, 0(R8) 626 | MOVW R11, 1(R8) 627 | ADD $3, R8, R8 628 | 629 | inlineEmitCopyEnd: 630 | // End inline of the emitCopy call. 631 | // ---------------------------------------- 632 | 633 | // nextEmit = s 634 | MOVD R7, R10 635 | 636 | // if s >= sLimit { goto emitRemainder } 637 | MOVD R7, R3 638 | SUB R6, R3, R3 639 | CMP R3, R9 640 | BLS emitRemainder 641 | 642 | // As per the encode_other.go code: 643 | // 644 | // We could immediately etc. 645 | 646 | // x := load64(src, s-1) 647 | MOVD -1(R7), R14 648 | 649 | // prevHash := hash(uint32(x>>0), shift) 650 | MOVW R14, R11 651 | MULW R16, R11, R11 652 | LSRW R5, R11, R11 653 | 654 | // table[prevHash] = uint16(s-1) 655 | MOVD R7, R3 656 | SUB R6, R3, R3 657 | SUB $1, R3, R3 658 | 659 | MOVHU R3, 0(R17)(R11<<1) 660 | 661 | // currHash := hash(uint32(x>>8), shift) 662 | LSR $8, R14, R14 663 | MOVW R14, R11 664 | MULW R16, R11, R11 665 | LSRW R5, R11, R11 666 | 667 | // candidate = int(table[currHash]) 668 | MOVHU 0(R17)(R11<<1), R15 669 | 670 | // table[currHash] = uint16(s) 671 | ADD $1, R3, R3 672 | MOVHU R3, 0(R17)(R11<<1) 673 | 674 | // if uint32(x>>8) == load32(src, candidate) { continue } 675 | MOVW (R6)(R15), R4 676 | CMPW R4, R14 677 | BEQ inner1 678 | 679 | // nextHash = hash(uint32(x>>16), shift) 680 | LSR $8, R14, R14 681 | MOVW R14, R11 682 | MULW R16, R11, R11 683 | LSRW R5, R11, R11 684 | 685 | // s++ 686 | ADD $1, R7, R7 687 | 688 | // break out of the inner1 for loop, i.e. continue the outer loop. 
689 | B outer 690 | 691 | emitRemainder: 692 | // if nextEmit < len(src) { etc } 693 | MOVD src_len+32(FP), R3 694 | ADD R6, R3, R3 695 | CMP R3, R10 696 | BEQ encodeBlockEnd 697 | 698 | // d += emitLiteral(dst[d:], src[nextEmit:]) 699 | // 700 | // Push args. 701 | MOVD R8, 8(RSP) 702 | MOVD $0, 16(RSP) // Unnecessary, as the callee ignores it, but conservative. 703 | MOVD $0, 24(RSP) // Unnecessary, as the callee ignores it, but conservative. 704 | MOVD R10, 32(RSP) 705 | SUB R10, R3, R3 706 | MOVD R3, 40(RSP) 707 | MOVD R3, 48(RSP) // Unnecessary, as the callee ignores it, but conservative. 708 | 709 | // Spill local variables (registers) onto the stack; call; unspill. 710 | MOVD R8, 88(RSP) 711 | CALL ·emitLiteral(SB) 712 | MOVD 88(RSP), R8 713 | 714 | // Finish the "d +=" part of "d += emitLiteral(etc)". 715 | MOVD 56(RSP), R1 716 | ADD R1, R8, R8 717 | 718 | encodeBlockEnd: 719 | MOVD dst_base+0(FP), R3 720 | SUB R3, R8, R8 721 | MOVD R8, d+48(FP) 722 | RET 723 | -------------------------------------------------------------------------------- /encode_asm.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !appengine 6 | // +build gc 7 | // +build !noasm 8 | // +build amd64 arm64 9 | 10 | package snappy 11 | 12 | // emitLiteral has the same semantics as in encode_other.go. 13 | // 14 | //go:noescape 15 | func emitLiteral(dst, lit []byte) int 16 | 17 | // emitCopy has the same semantics as in encode_other.go. 18 | // 19 | //go:noescape 20 | func emitCopy(dst []byte, offset, length int) int 21 | 22 | // extendMatch has the same semantics as in encode_other.go. 23 | // 24 | //go:noescape 25 | func extendMatch(src []byte, i, j int) int 26 | 27 | // encodeBlock has the same semantics as in encode_other.go. 
28 | // 29 | //go:noescape 30 | func encodeBlock(dst, src []byte) (d int) 31 | -------------------------------------------------------------------------------- /encode_other.go: -------------------------------------------------------------------------------- 1 | // Copyright 2016 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // +build !amd64,!arm64 appengine !gc noasm 6 | 7 | package snappy 8 | 9 | func load32(b []byte, i int) uint32 { 10 | b = b[i : i+4 : len(b)] // Help the compiler eliminate bounds checks on the next line. 11 | return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 12 | } 13 | 14 | func load64(b []byte, i int) uint64 { 15 | b = b[i : i+8 : len(b)] // Help the compiler eliminate bounds checks on the next line. 16 | return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 | 17 | uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56 18 | } 19 | 20 | // emitLiteral writes a literal chunk and returns the number of bytes written. 21 | // 22 | // It assumes that: 23 | // dst is long enough to hold the encoded bytes 24 | // 1 <= len(lit) && len(lit) <= 65536 25 | func emitLiteral(dst, lit []byte) int { 26 | i, n := 0, uint(len(lit)-1) 27 | switch { 28 | case n < 60: 29 | dst[0] = uint8(n)<<2 | tagLiteral 30 | i = 1 31 | case n < 1<<8: 32 | dst[0] = 60<<2 | tagLiteral 33 | dst[1] = uint8(n) 34 | i = 2 35 | default: 36 | dst[0] = 61<<2 | tagLiteral 37 | dst[1] = uint8(n) 38 | dst[2] = uint8(n >> 8) 39 | i = 3 40 | } 41 | return i + copy(dst[i:], lit) 42 | } 43 | 44 | // emitCopy writes a copy chunk and returns the number of bytes written. 
45 | // 46 | // It assumes that: 47 | // dst is long enough to hold the encoded bytes 48 | // 1 <= offset && offset <= 65535 49 | // 4 <= length && length <= 65535 50 | func emitCopy(dst []byte, offset, length int) int { 51 | i := 0 52 | // The maximum length for a single tagCopy1 or tagCopy2 op is 64 bytes. The 53 | // threshold for this loop is a little higher (at 68 = 64 + 4), and the 54 | // length emitted down below is a little lower (at 60 = 64 - 4), because 55 | // it's shorter to encode a length 67 copy as a length 60 tagCopy2 followed 56 | // by a length 7 tagCopy1 (which encodes as 3+2 bytes) than to encode it as 57 | // a length 64 tagCopy2 followed by a length 3 tagCopy2 (which encodes as 58 | // 3+3 bytes). The magic 4 in the 64±4 is because the minimum length for a 59 | // tagCopy1 op is 4 bytes, which is why a length 3 copy has to be an 60 | // encodes-as-3-bytes tagCopy2 instead of an encodes-as-2-bytes tagCopy1. 61 | for length >= 68 { 62 | // Emit a length 64 copy, encoded as 3 bytes. 63 | dst[i+0] = 63<<2 | tagCopy2 64 | dst[i+1] = uint8(offset) 65 | dst[i+2] = uint8(offset >> 8) 66 | i += 3 67 | length -= 64 68 | } 69 | if length > 64 { 70 | // Emit a length 60 copy, encoded as 3 bytes. 71 | dst[i+0] = 59<<2 | tagCopy2 72 | dst[i+1] = uint8(offset) 73 | dst[i+2] = uint8(offset >> 8) 74 | i += 3 75 | length -= 60 76 | } 77 | if length >= 12 || offset >= 2048 { 78 | // Emit the remaining copy, encoded as 3 bytes. 79 | dst[i+0] = uint8(length-1)<<2 | tagCopy2 80 | dst[i+1] = uint8(offset) 81 | dst[i+2] = uint8(offset >> 8) 82 | return i + 3 83 | } 84 | // Emit the remaining copy, encoded as 2 bytes. 85 | dst[i+0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1 86 | dst[i+1] = uint8(offset) 87 | return i + 2 88 | } 89 | 90 | // extendMatch returns the largest k such that k <= len(src) and that 91 | // src[i:i+k-j] and src[j:k] have the same contents.
92 | // 93 | // It assumes that: 94 | // 0 <= i && i < j && j <= len(src) 95 | func extendMatch(src []byte, i, j int) int { 96 | for ; j < len(src) && src[i] == src[j]; i, j = i+1, j+1 { 97 | } 98 | return j 99 | } 100 | 101 | func hash(u, shift uint32) uint32 { 102 | return (u * 0x1e35a7bd) >> shift 103 | } 104 | 105 | // encodeBlock encodes a non-empty src to a guaranteed-large-enough dst. It 106 | // assumes that the varint-encoded length of the decompressed bytes has already 107 | // been written. 108 | // 109 | // It also assumes that: 110 | // len(dst) >= MaxEncodedLen(len(src)) && 111 | // minNonLiteralBlockSize <= len(src) && len(src) <= maxBlockSize 112 | func encodeBlock(dst, src []byte) (d int) { 113 | // Initialize the hash table. Its size ranges from 1<<8 to 1<<14 inclusive. 114 | // The table element type is uint16, as s < sLimit and sLimit < len(src) 115 | // and len(src) <= maxBlockSize and maxBlockSize == 65536. 116 | const ( 117 | maxTableSize = 1 << 14 118 | // tableMask is redundant, but helps the compiler eliminate bounds 119 | // checks. 120 | tableMask = maxTableSize - 1 121 | ) 122 | shift := uint32(32 - 8) 123 | for tableSize := 1 << 8; tableSize < maxTableSize && tableSize < len(src); tableSize *= 2 { 124 | shift-- 125 | } 126 | // In Go, all array elements are zero-initialized, so there is no advantage 127 | // to a smaller tableSize per se. However, it matches the C++ algorithm, 128 | // and in the asm versions of this code, we can get away with zeroing only 129 | // the first tableSize elements. 130 | var table [maxTableSize]uint16 131 | 132 | // sLimit is when to stop looking for offset/length copies. The inputMargin 133 | // lets us use a fast path for emitLiteral in the main loop, while we are 134 | // looking for copies. 135 | sLimit := len(src) - inputMargin 136 | 137 | // nextEmit is where in src the next emitLiteral should start from. 
138 | nextEmit := 0 139 | 140 | // The encoded form must start with a literal, as there are no previous 141 | // bytes to copy, so we start looking for hash matches at s == 1. 142 | s := 1 143 | nextHash := hash(load32(src, s), shift) 144 | 145 | for { 146 | // Copied from the C++ snappy implementation: 147 | // 148 | // Heuristic match skipping: If 32 bytes are scanned with no matches 149 | // found, start looking only at every other byte. If 32 more bytes are 150 | // scanned (or skipped), look at every third byte, etc.. When a match 151 | // is found, immediately go back to looking at every byte. This is a 152 | // small loss (~5% performance, ~0.1% density) for compressible data 153 | // due to more bookkeeping, but for non-compressible data (such as 154 | // JPEG) it's a huge win since the compressor quickly "realizes" the 155 | // data is incompressible and doesn't bother looking for matches 156 | // everywhere. 157 | // 158 | // The "skip" variable keeps track of how many bytes there are since 159 | // the last match; dividing it by 32 (ie. right-shifting by five) gives 160 | // the number of bytes to move ahead for each iteration. 161 | skip := 32 162 | 163 | nextS := s 164 | candidate := 0 165 | for { 166 | s = nextS 167 | bytesBetweenHashLookups := skip >> 5 168 | nextS = s + bytesBetweenHashLookups 169 | skip += bytesBetweenHashLookups 170 | if nextS > sLimit { 171 | goto emitRemainder 172 | } 173 | candidate = int(table[nextHash&tableMask]) 174 | table[nextHash&tableMask] = uint16(s) 175 | nextHash = hash(load32(src, nextS), shift) 176 | if load32(src, s) == load32(src, candidate) { 177 | break 178 | } 179 | } 180 | 181 | // A 4-byte match has been found. We'll later see if more than 4 bytes 182 | // match. But, prior to the match, src[nextEmit:s] are unmatched. Emit 183 | // them as literal bytes. 184 | d += emitLiteral(dst[d:], src[nextEmit:s]) 185 | 186 | // Call emitCopy, and then see if another emitCopy could be our next 187 | // move. 
Repeat until we find no match for the input immediately after 188 | // what was consumed by the last emitCopy call. 189 | // 190 | // If we exit this loop normally then we need to call emitLiteral next, 191 | // though we don't yet know how big the literal will be. We handle that 192 | // by proceeding to the next iteration of the main loop. We also can 193 | // exit this loop via goto if we get close to exhausting the input. 194 | for { 195 | // Invariant: we have a 4-byte match at s, and no need to emit any 196 | // literal bytes prior to s. 197 | base := s 198 | 199 | // Extend the 4-byte match as long as possible. 200 | // 201 | // This is an inlined version of: 202 | // s = extendMatch(src, candidate+4, s+4) 203 | s += 4 204 | for i := candidate + 4; s < len(src) && src[i] == src[s]; i, s = i+1, s+1 { 205 | } 206 | 207 | d += emitCopy(dst[d:], base-candidate, s-base) 208 | nextEmit = s 209 | if s >= sLimit { 210 | goto emitRemainder 211 | } 212 | 213 | // We could immediately start working at s now, but to improve 214 | // compression we first update the hash table at s-1 and at s. If 215 | // another emitCopy is not our next move, also calculate nextHash 216 | // at s+1. At least on GOARCH=amd64, these three hash calculations 217 | // are faster as one load64 call (with some shifts) instead of 218 | // three load32 calls. 
219 | x := load64(src, s-1) 220 | prevHash := hash(uint32(x>>0), shift) 221 | table[prevHash&tableMask] = uint16(s - 1) 222 | currHash := hash(uint32(x>>8), shift) 223 | candidate = int(table[currHash&tableMask]) 224 | table[currHash&tableMask] = uint16(s) 225 | if uint32(x>>8) != load32(src, candidate) { 226 | nextHash = hash(uint32(x>>16), shift) 227 | s++ 228 | break 229 | } 230 | } 231 | } 232 | 233 | emitRemainder: 234 | if nextEmit < len(src) { 235 | d += emitLiteral(dst[d:], src[nextEmit:]) 236 | } 237 | return d 238 | } 239 | -------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/golang/snappy 2 | -------------------------------------------------------------------------------- /misc/main.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | This is a C version of the cmd/snappytool Go program. 3 | 4 | To build the snappytool binary: 5 | g++ main.cpp /usr/lib/libsnappy.a -o snappytool 6 | or, if you have built the C++ snappy library from source: 7 | g++ main.cpp /path/to/your/snappy/.libs/libsnappy.a -o snappytool 8 | after running "make" from your snappy checkout directory. 9 | */ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | 16 | #include "snappy.h" 17 | 18 | #define N 1000000 19 | 20 | char dst[N]; 21 | char src[N]; 22 | 23 | int main(int argc, char** argv) { 24 | // Parse args. 25 | if (argc != 2) { 26 | fprintf(stderr, "exactly one of -d or -e must be given\n"); 27 | return 1; 28 | } 29 | bool decode = strcmp(argv[1], "-d") == 0; 30 | bool encode = strcmp(argv[1], "-e") == 0; 31 | if (decode == encode) { 32 | fprintf(stderr, "exactly one of -d or -e must be given\n"); 33 | return 1; 34 | } 35 | 36 | // Read all of stdin into src[:s]. 
37 | size_t s = 0; 38 | while (1) { 39 | if (s == N) { 40 | fprintf(stderr, "input too large\n"); 41 | return 1; 42 | } 43 | ssize_t n = read(0, src+s, N-s); 44 | if (n == 0) { 45 | break; 46 | } 47 | if (n < 0) { 48 | fprintf(stderr, "read error: %s\n", strerror(errno)); 49 | // TODO: handle EAGAIN, EINTR? 50 | return 1; 51 | } 52 | s += n; 53 | } 54 | 55 | // Encode or decode src[:s] to dst[:d], and write to stdout. 56 | size_t d = 0; 57 | if (encode) { 58 | if (N < snappy::MaxCompressedLength(s)) { 59 | fprintf(stderr, "input too large after encoding\n"); 60 | return 1; 61 | } 62 | snappy::RawCompress(src, s, dst, &d); 63 | } else { 64 | if (!snappy::GetUncompressedLength(src, s, &d)) { 65 | fprintf(stderr, "could not get uncompressed length\n"); 66 | return 1; 67 | } 68 | if (N < d) { 69 | fprintf(stderr, "input too large after decoding\n"); 70 | return 1; 71 | } 72 | if (!snappy::RawUncompress(src, s, dst)) { 73 | fprintf(stderr, "input was not valid Snappy-compressed data\n"); 74 | return 1; 75 | } 76 | } 77 | write(1, dst, d); 78 | return 0; 79 | } 80 | -------------------------------------------------------------------------------- /snappy.go: -------------------------------------------------------------------------------- 1 | // Copyright 2011 The Snappy-Go Authors. All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | // Package snappy implements the Snappy compression format. It aims for very 6 | // high speeds and reasonable compression. 7 | // 8 | // There are actually two Snappy formats: block and stream. They are related, 9 | // but different: trying to decompress block-compressed data as a Snappy stream 10 | // will fail, and vice versa. The block format is the Decode and Encode 11 | // functions and the stream format is the Reader and Writer types. 
12 | // 13 | // The block format, the more common case, is used when the complete size (the 14 | // number of bytes) of the original data is known upfront, at the time 15 | // compression starts. The stream format, also known as the framing format, is 16 | // for when that isn't always true. 17 | // 18 | // The canonical, C++ implementation is at https://github.com/google/snappy and 19 | // it only implements the block format. 20 | package snappy // import "github.com/golang/snappy" 21 | 22 | import ( 23 | "hash/crc32" 24 | ) 25 | 26 | /* 27 | Each encoded block begins with the varint-encoded length of the decoded data, 28 | followed by a sequence of chunks. Chunks begin and end on byte boundaries. The 29 | first byte of each chunk is broken into its 2 least and 6 most significant bits 30 | called l and m: l ranges in [0, 4) and m ranges in [0, 64). l is the chunk tag. 31 | Zero means a literal tag. All other values mean a copy tag. 32 | 33 | For literal tags: 34 | - If m < 60, the next 1 + m bytes are literal bytes. 35 | - Otherwise, let n be the little-endian unsigned integer denoted by the next 36 | m - 59 bytes. The next 1 + n bytes after that are literal bytes. 37 | 38 | For copy tags, length bytes are copied from offset bytes ago, in the style of 39 | Lempel-Ziv compression algorithms. In particular: 40 | - For l == 1, the offset ranges in [0, 1<<11) and the length in [4, 12). 41 | The length is 4 + the low 3 bits of m. The high 3 bits of m form bits 8-10 42 | of the offset. The next byte is bits 0-7 of the offset. 43 | - For l == 2, the offset ranges in [0, 1<<16) and the length in [1, 65). 44 | The length is 1 + m. The offset is the little-endian unsigned integer 45 | denoted by the next 2 bytes. 46 | - For l == 3, this tag is a legacy format that is no longer issued by most 47 | encoders. Nonetheless, the offset ranges in [0, 1<<32) and the length in 48 | [1, 65). The length is 1 + m. 
The offset is the little-endian unsigned 49 | integer denoted by the next 4 bytes. 50 | */ 51 | const ( 52 | tagLiteral = 0x00 53 | tagCopy1 = 0x01 54 | tagCopy2 = 0x02 55 | tagCopy4 = 0x03 56 | ) 57 | 58 | const ( 59 | checksumSize = 4 60 | chunkHeaderSize = 4 61 | magicChunk = "\xff\x06\x00\x00" + magicBody 62 | magicBody = "sNaPpY" 63 | 64 | // maxBlockSize is the maximum size of the input to encodeBlock. It is not 65 | // part of the wire format per se, but some parts of the encoder assume 66 | // that an offset fits into a uint16. 67 | // 68 | // Also, for the framing format (Writer type instead of Encode function), 69 | // https://github.com/google/snappy/blob/master/framing_format.txt says 70 | // that "the uncompressed data in a chunk must be no longer than 65536 71 | // bytes". 72 | maxBlockSize = 65536 73 | 74 | // maxEncodedLenOfMaxBlockSize equals MaxEncodedLen(maxBlockSize), but is 75 | // hard coded to be a const instead of a variable, so that obufLen can also 76 | // be a const. Their equivalence is confirmed by 77 | // TestMaxEncodedLenOfMaxBlockSize. 78 | maxEncodedLenOfMaxBlockSize = 76490 79 | 80 | obufHeaderLen = len(magicChunk) + checksumSize + chunkHeaderSize 81 | obufLen = obufHeaderLen + maxEncodedLenOfMaxBlockSize 82 | ) 83 | 84 | const ( 85 | chunkTypeCompressedData = 0x00 86 | chunkTypeUncompressedData = 0x01 87 | chunkTypePadding = 0xfe 88 | chunkTypeStreamIdentifier = 0xff 89 | ) 90 | 91 | var crcTable = crc32.MakeTable(crc32.Castagnoli) 92 | 93 | // crc implements the checksum specified in section 3 of 94 | // https://github.com/google/snappy/blob/master/framing_format.txt 95 | func crc(b []byte) uint32 { 96 | c := crc32.Update(0, crcTable, b) 97 | return uint32(c>>15|c<<17) + 0xa282ead8 98 | } 99 | -------------------------------------------------------------------------------- /snappy_test.go: -------------------------------------------------------------------------------- 1 | // Copyright 2011 The Snappy-Go Authors. 
All rights reserved. 2 | // Use of this source code is governed by a BSD-style 3 | // license that can be found in the LICENSE file. 4 | 5 | package snappy 6 | 7 | import ( 8 | "bytes" 9 | "encoding/binary" 10 | "flag" 11 | "fmt" 12 | "io" 13 | "io/ioutil" 14 | "math/rand" 15 | "net/http" 16 | "os" 17 | "os/exec" 18 | "path/filepath" 19 | "runtime" 20 | "strings" 21 | "testing" 22 | ) 23 | 24 | var ( 25 | download = flag.Bool("download", false, "If true, download any missing files before running benchmarks") 26 | testdataDir = flag.String("testdataDir", "testdata", "Directory containing the test data") 27 | benchdataDir = flag.String("benchdataDir", "testdata/bench", "Directory containing the benchmark data") 28 | ) 29 | 30 | // goEncoderShouldMatchCppEncoder is whether to test that the algorithm used by 31 | // Go's encoder matches byte-for-byte what the C++ snappy encoder produces, on 32 | // this GOARCH. There is more than one valid encoding of any given input, and 33 | // there is more than one good algorithm along the frontier of trading off 34 | // throughput for output size. Nonetheless, we presume that the C++ encoder's 35 | // algorithm is a good one and has been tested on a wide range of inputs, so 36 | // matching that exactly should mean that the Go encoder's algorithm is also 37 | // good, without needing to gather our own corpus of test data. 38 | // 39 | // The exact algorithm used by the C++ code is potentially endian dependent, as 40 | // it puns a byte pointer to a uint32 pointer to load, hash and compare 4 bytes 41 | // at a time. The Go implementation is endian agnostic, in that its output is 42 | // the same (as little-endian C++ code), regardless of the CPU's endianness. 43 | // 44 | // Thus, when comparing Go's output to C++ output generated beforehand, such as 45 | // the "testdata/pi.txt.rawsnappy" file generated by C++ code on a little- 46 | // endian system, we can run that test regardless of the runtime.GOARCH value. 
47 | // 48 | // When comparing Go's output to dynamically generated C++ output, i.e. the 49 | // result of fork/exec'ing a C++ program, we can run that test only on 50 | // little-endian systems, because the C++ output might be different on 51 | // big-endian systems. The runtime package doesn't export endianness per se, 52 | // but we can restrict this match-C++ test to common little-endian systems. 53 | const goEncoderShouldMatchCppEncoder = runtime.GOARCH == "386" || runtime.GOARCH == "amd64" || runtime.GOARCH == "arm" 54 | 55 | func TestMaxEncodedLenOfMaxBlockSize(t *testing.T) { 56 | got := maxEncodedLenOfMaxBlockSize 57 | want := MaxEncodedLen(maxBlockSize) 58 | if got != want { 59 | t.Fatalf("got %d, want %d", got, want) 60 | } 61 | } 62 | 63 | func cmp(a, b []byte) error { 64 | if bytes.Equal(a, b) { 65 | return nil 66 | } 67 | if len(a) != len(b) { 68 | return fmt.Errorf("got %d bytes, want %d", len(a), len(b)) 69 | } 70 | for i := range a { 71 | if a[i] != b[i] { 72 | return fmt.Errorf("byte #%d: got 0x%02x, want 0x%02x", i, a[i], b[i]) 73 | } 74 | } 75 | return nil 76 | } 77 | 78 | func roundtrip(b, ebuf, dbuf []byte) error { 79 | d, err := Decode(dbuf, Encode(ebuf, b)) 80 | if err != nil { 81 | return fmt.Errorf("decoding error: %v", err) 82 | } 83 | if err := cmp(d, b); err != nil { 84 | return fmt.Errorf("roundtrip mismatch: %v", err) 85 | } 86 | return nil 87 | } 88 | 89 | func TestEmpty(t *testing.T) { 90 | if err := roundtrip(nil, nil, nil); err != nil { 91 | t.Fatal(err) 92 | } 93 | } 94 | 95 | func TestSmallCopy(t *testing.T) { 96 | for _, ebuf := range [][]byte{nil, make([]byte, 20), make([]byte, 64)} { 97 | for _, dbuf := range [][]byte{nil, make([]byte, 20), make([]byte, 64)} { 98 | for i := 0; i < 32; i++ { 99 | s := "aaaa" + strings.Repeat("b", i) + "aaaabbbb" 100 | if err := roundtrip([]byte(s), ebuf, dbuf); err != nil { 101 | t.Errorf("len(ebuf)=%d, len(dbuf)=%d, i=%d: %v", len(ebuf), len(dbuf), i, err) 102 | } 103 | } 104 | } 105 | } 106 
| } 107 | 108 | func TestSmallRand(t *testing.T) { 109 | rng := rand.New(rand.NewSource(1)) 110 | for n := 1; n < 20000; n += 23 { 111 | b := make([]byte, n) 112 | for i := range b { 113 | b[i] = uint8(rng.Intn(256)) 114 | } 115 | if err := roundtrip(b, nil, nil); err != nil { 116 | t.Fatal(err) 117 | } 118 | } 119 | } 120 | 121 | func TestSmallRegular(t *testing.T) { 122 | for n := 1; n < 20000; n += 23 { 123 | b := make([]byte, n) 124 | for i := range b { 125 | b[i] = uint8(i%10 + 'a') 126 | } 127 | if err := roundtrip(b, nil, nil); err != nil { 128 | t.Fatal(err) 129 | } 130 | } 131 | } 132 | 133 | func TestInvalidVarint(t *testing.T) { 134 | testCases := []struct { 135 | desc string 136 | input string 137 | }{{ 138 | "invalid varint, final byte has continuation bit set", 139 | "\xff", 140 | }, { 141 | "invalid varint, value overflows uint64", 142 | "\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00", 143 | }, { 144 | // https://github.com/google/snappy/blob/master/format_description.txt 145 | // says that "the stream starts with the uncompressed length [as a 146 | // varint] (up to a maximum of 2^32 - 1)". 
147 | "valid varint (as uint64), but value overflows uint32", 148 | "\x80\x80\x80\x80\x10", 149 | }} 150 | 151 | for _, tc := range testCases { 152 | input := []byte(tc.input) 153 | if _, err := DecodedLen(input); err != ErrCorrupt { 154 | t.Errorf("%s: DecodedLen: got %v, want ErrCorrupt", tc.desc, err) 155 | } 156 | if _, err := Decode(nil, input); err != ErrCorrupt { 157 | t.Errorf("%s: Decode: got %v, want ErrCorrupt", tc.desc, err) 158 | } 159 | } 160 | } 161 | 162 | func TestDecode(t *testing.T) { 163 | lit40Bytes := make([]byte, 40) 164 | for i := range lit40Bytes { 165 | lit40Bytes[i] = byte(i) 166 | } 167 | lit40 := string(lit40Bytes) 168 | 169 | testCases := []struct { 170 | desc string 171 | input string 172 | want string 173 | wantErr error 174 | }{{ 175 | `decodedLen=0; valid input`, 176 | "\x00", 177 | "", 178 | nil, 179 | }, { 180 | `decodedLen=3; tagLiteral, 0-byte length; length=3; valid input`, 181 | "\x03" + "\x08\xff\xff\xff", 182 | "\xff\xff\xff", 183 | nil, 184 | }, { 185 | `decodedLen=2; tagLiteral, 0-byte length; length=3; not enough dst bytes`, 186 | "\x02" + "\x08\xff\xff\xff", 187 | "", 188 | ErrCorrupt, 189 | }, { 190 | `decodedLen=3; tagLiteral, 0-byte length; length=3; not enough src bytes`, 191 | "\x03" + "\x08\xff\xff", 192 | "", 193 | ErrCorrupt, 194 | }, { 195 | `decodedLen=40; tagLiteral, 0-byte length; length=40; valid input`, 196 | "\x28" + "\x9c" + lit40, 197 | lit40, 198 | nil, 199 | }, { 200 | `decodedLen=1; tagLiteral, 1-byte length; not enough length bytes`, 201 | "\x01" + "\xf0", 202 | "", 203 | ErrCorrupt, 204 | }, { 205 | `decodedLen=3; tagLiteral, 1-byte length; length=3; valid input`, 206 | "\x03" + "\xf0\x02\xff\xff\xff", 207 | "\xff\xff\xff", 208 | nil, 209 | }, { 210 | `decodedLen=1; tagLiteral, 2-byte length; not enough length bytes`, 211 | "\x01" + "\xf4\x00", 212 | "", 213 | ErrCorrupt, 214 | }, { 215 | `decodedLen=3; tagLiteral, 2-byte length; length=3; valid input`, 216 | "\x03" + "\xf4\x02\x00\xff\xff\xff", 
217 | "\xff\xff\xff", 218 | nil, 219 | }, { 220 | `decodedLen=1; tagLiteral, 3-byte length; not enough length bytes`, 221 | "\x01" + "\xf8\x00\x00", 222 | "", 223 | ErrCorrupt, 224 | }, { 225 | `decodedLen=3; tagLiteral, 3-byte length; length=3; valid input`, 226 | "\x03" + "\xf8\x02\x00\x00\xff\xff\xff", 227 | "\xff\xff\xff", 228 | nil, 229 | }, { 230 | `decodedLen=1; tagLiteral, 4-byte length; not enough length bytes`, 231 | "\x01" + "\xfc\x00\x00\x00", 232 | "", 233 | ErrCorrupt, 234 | }, { 235 | `decodedLen=1; tagLiteral, 4-byte length; length=3; not enough dst bytes`, 236 | "\x01" + "\xfc\x02\x00\x00\x00\xff\xff\xff", 237 | "", 238 | ErrCorrupt, 239 | }, { 240 | `decodedLen=4; tagLiteral, 4-byte length; length=3; not enough src bytes`, 241 | "\x04" + "\xfc\x02\x00\x00\x00\xff", 242 | "", 243 | ErrCorrupt, 244 | }, { 245 | `decodedLen=3; tagLiteral, 4-byte length; length=3; valid input`, 246 | "\x03" + "\xfc\x02\x00\x00\x00\xff\xff\xff", 247 | "\xff\xff\xff", 248 | nil, 249 | }, { 250 | `decodedLen=4; tagCopy1, 1 extra length|offset byte; not enough extra bytes`, 251 | "\x04" + "\x01", 252 | "", 253 | ErrCorrupt, 254 | }, { 255 | `decodedLen=4; tagCopy2, 2 extra length|offset bytes; not enough extra bytes`, 256 | "\x04" + "\x02\x00", 257 | "", 258 | ErrCorrupt, 259 | }, { 260 | `decodedLen=4; tagCopy4, 4 extra length|offset bytes; not enough extra bytes`, 261 | "\x04" + "\x03\x00\x00\x00", 262 | "", 263 | ErrCorrupt, 264 | }, { 265 | `decodedLen=4; tagLiteral (4 bytes "abcd"); valid input`, 266 | "\x04" + "\x0cabcd", 267 | "abcd", 268 | nil, 269 | }, { 270 | `decodedLen=13; tagLiteral (4 bytes "abcd"); tagCopy1; length=9 offset=4; valid input`, 271 | "\x0d" + "\x0cabcd" + "\x15\x04", 272 | "abcdabcdabcda", 273 | nil, 274 | }, { 275 | `decodedLen=8; tagLiteral (4 bytes "abcd"); tagCopy1; length=4 offset=4; valid input`, 276 | "\x08" + "\x0cabcd" + "\x01\x04", 277 | "abcdabcd", 278 | nil, 279 | }, { 280 | `decodedLen=8; tagLiteral (4 bytes "abcd"); tagCopy1; 
length=4 offset=2; valid input`, 281 | "\x08" + "\x0cabcd" + "\x01\x02", 282 | "abcdcdcd", 283 | nil, 284 | }, { 285 | `decodedLen=8; tagLiteral (4 bytes "abcd"); tagCopy1; length=4 offset=1; valid input`, 286 | "\x08" + "\x0cabcd" + "\x01\x01", 287 | "abcddddd", 288 | nil, 289 | }, { 290 | `decodedLen=8; tagLiteral (4 bytes "abcd"); tagCopy1; length=4 offset=0; zero offset`, 291 | "\x08" + "\x0cabcd" + "\x01\x00", 292 | "", 293 | ErrCorrupt, 294 | }, { 295 | `decodedLen=9; tagLiteral (4 bytes "abcd"); tagCopy1; length=4 offset=4; inconsistent dLen`, 296 | "\x09" + "\x0cabcd" + "\x01\x04", 297 | "", 298 | ErrCorrupt, 299 | }, { 300 | `decodedLen=8; tagLiteral (4 bytes "abcd"); tagCopy1; length=4 offset=5; offset too large`, 301 | "\x08" + "\x0cabcd" + "\x01\x05", 302 | "", 303 | ErrCorrupt, 304 | }, { 305 | `decodedLen=7; tagLiteral (4 bytes "abcd"); tagCopy1; length=4 offset=4; length too large`, 306 | "\x07" + "\x0cabcd" + "\x01\x04", 307 | "", 308 | ErrCorrupt, 309 | }, { 310 | `decodedLen=6; tagLiteral (4 bytes "abcd"); tagCopy2; length=2 offset=3; valid input`, 311 | "\x06" + "\x0cabcd" + "\x06\x03\x00", 312 | "abcdbc", 313 | nil, 314 | }, { 315 | `decodedLen=6; tagLiteral (4 bytes "abcd"); tagCopy4; length=2 offset=3; valid input`, 316 | "\x06" + "\x0cabcd" + "\x07\x03\x00\x00\x00", 317 | "abcdbc", 318 | nil, 319 | }, { 320 | `decodedLen=0; tagCopy4, 4 extra length|offset bytes; with msb set (0x93); discovered by go-fuzz`, 321 | "\x00\xfc000\x93", 322 | "", 323 | ErrCorrupt, 324 | }} 325 | 326 | const ( 327 | // notPresentXxx defines a range of byte values [0xa0, 0xc5) that are 328 | // not present in either the input or the output. It is written to dBuf 329 | // to check that Decode does not write bytes past the end of 330 | // dBuf[:dLen]. 331 | // 332 | // The magic number 37 was chosen because it is prime. A more 'natural' 333 | // number like 32 might lead to a false negative if, for example, a 334 | // byte was incorrectly copied 4*8 bytes later. 
335 | notPresentBase = 0xa0 336 | notPresentLen = 37 337 | ) 338 | 339 | var dBuf [100]byte 340 | loop: 341 | for i, tc := range testCases { 342 | input := []byte(tc.input) 343 | for _, x := range input { 344 | if notPresentBase <= x && x < notPresentBase+notPresentLen { 345 | t.Errorf("#%d (%s): input shouldn't contain %#02x\ninput: % x", i, tc.desc, x, input) 346 | continue loop 347 | } 348 | } 349 | 350 | dLen, n := binary.Uvarint(input) 351 | if n <= 0 { 352 | t.Errorf("#%d (%s): invalid varint-encoded dLen", i, tc.desc) 353 | continue 354 | } 355 | if dLen > uint64(len(dBuf)) { 356 | t.Errorf("#%d (%s): dLen %d is too large", i, tc.desc, dLen) 357 | continue 358 | } 359 | 360 | for j := range dBuf { 361 | dBuf[j] = byte(notPresentBase + j%notPresentLen) 362 | } 363 | g, gotErr := Decode(dBuf[:], input) 364 | if got := string(g); got != tc.want || gotErr != tc.wantErr { 365 | t.Errorf("#%d (%s):\ngot %q, %v\nwant %q, %v", 366 | i, tc.desc, got, gotErr, tc.want, tc.wantErr) 367 | continue 368 | } 369 | for j, x := range dBuf { 370 | if uint64(j) < dLen { 371 | continue 372 | } 373 | if w := byte(notPresentBase + j%notPresentLen); x != w { 374 | t.Errorf("#%d (%s): Decode overrun: dBuf[%d] was modified: got %#02x, want %#02x\ndBuf: % x", 375 | i, tc.desc, j, x, w, dBuf) 376 | continue loop 377 | } 378 | } 379 | } 380 | } 381 | 382 | func TestDecodeCopy4(t *testing.T) { 383 | dots := strings.Repeat(".", 65536) 384 | 385 | input := strings.Join([]string{ 386 | "\x89\x80\x04", // decodedLen = 65545. 387 | "\x0cpqrs", // 4-byte literal "pqrs". 388 | "\xf4\xff\xff" + dots, // 65536-byte literal dots. 389 | "\x13\x04\x00\x01\x00", // tagCopy4; length=5 offset=65540. 390 | }, "") 391 | 392 | gotBytes, err := Decode(nil, []byte(input)) 393 | if err != nil { 394 | t.Fatal(err) 395 | } 396 | got := string(gotBytes) 397 | want := "pqrs" + dots + "pqrs." 
398 | if len(got) != len(want) { 399 | t.Fatalf("got %d bytes, want %d", len(got), len(want)) 400 | } 401 | if got != want { 402 | for i := 0; i < len(got); i++ { 403 | if g, w := got[i], want[i]; g != w { 404 | t.Fatalf("byte #%d: got %#02x, want %#02x", i, g, w) 405 | } 406 | } 407 | } 408 | } 409 | 410 | // TestDecodeLengthOffset tests decoding an encoding of the form literal + 411 | // copy-length-offset + literal. For example: "abcdefghijkl" + "efghij" + "AB". 412 | func TestDecodeLengthOffset(t *testing.T) { 413 | const ( 414 | prefix = "abcdefghijklmnopqr" 415 | suffix = "ABCDEFGHIJKLMNOPQR" 416 | 417 | // notPresentXxx defines a range of byte values [0xa0, 0xc5) that are 418 | // not present in either the input or the output. It is written to 419 | // gotBuf to check that Decode does not write bytes past the end of 420 | // gotBuf[:totalLen]. 421 | // 422 | // The magic number 37 was chosen because it is prime. A more 'natural' 423 | // number like 32 might lead to a false negative if, for example, a 424 | // byte was incorrectly copied 4*8 bytes later. 
425 | notPresentBase = 0xa0 426 | notPresentLen = 37 427 | ) 428 | var gotBuf, wantBuf, inputBuf [128]byte 429 | for length := 1; length <= 18; length++ { 430 | for offset := 1; offset <= 18; offset++ { 431 | loop: 432 | for suffixLen := 0; suffixLen <= 18; suffixLen++ { 433 | totalLen := len(prefix) + length + suffixLen 434 | 435 | inputLen := binary.PutUvarint(inputBuf[:], uint64(totalLen)) 436 | inputBuf[inputLen] = tagLiteral + 4*byte(len(prefix)-1) 437 | inputLen++ 438 | inputLen += copy(inputBuf[inputLen:], prefix) 439 | inputBuf[inputLen+0] = tagCopy2 + 4*byte(length-1) 440 | inputBuf[inputLen+1] = byte(offset) 441 | inputBuf[inputLen+2] = 0x00 442 | inputLen += 3 443 | if suffixLen > 0 { 444 | inputBuf[inputLen] = tagLiteral + 4*byte(suffixLen-1) 445 | inputLen++ 446 | inputLen += copy(inputBuf[inputLen:], suffix[:suffixLen]) 447 | } 448 | input := inputBuf[:inputLen] 449 | 450 | for i := range gotBuf { 451 | gotBuf[i] = byte(notPresentBase + i%notPresentLen) 452 | } 453 | got, err := Decode(gotBuf[:], input) 454 | if err != nil { 455 | t.Errorf("length=%d, offset=%d; suffixLen=%d: %v", length, offset, suffixLen, err) 456 | continue 457 | } 458 | 459 | wantLen := 0 460 | wantLen += copy(wantBuf[wantLen:], prefix) 461 | for i := 0; i < length; i++ { 462 | wantBuf[wantLen] = wantBuf[wantLen-offset] 463 | wantLen++ 464 | } 465 | wantLen += copy(wantBuf[wantLen:], suffix[:suffixLen]) 466 | want := wantBuf[:wantLen] 467 | 468 | for _, x := range input { 469 | if notPresentBase <= x && x < notPresentBase+notPresentLen { 470 | t.Errorf("length=%d, offset=%d; suffixLen=%d: input shouldn't contain %#02x\ninput: % x", 471 | length, offset, suffixLen, x, input) 472 | continue loop 473 | } 474 | } 475 | for i, x := range gotBuf { 476 | if i < totalLen { 477 | continue 478 | } 479 | if w := byte(notPresentBase + i%notPresentLen); x != w { 480 | t.Errorf("length=%d, offset=%d; suffixLen=%d; totalLen=%d: "+ 481 | "Decode overrun: gotBuf[%d] was modified: got %#02x, want 
%#02x\ngotBuf: % x", 482 | length, offset, suffixLen, totalLen, i, x, w, gotBuf) 483 | continue loop 484 | } 485 | } 486 | for _, x := range want { 487 | if notPresentBase <= x && x < notPresentBase+notPresentLen { 488 | t.Errorf("length=%d, offset=%d; suffixLen=%d: want shouldn't contain %#02x\nwant: % x", 489 | length, offset, suffixLen, x, want) 490 | continue loop 491 | } 492 | } 493 | 494 | if !bytes.Equal(got, want) { 495 | t.Errorf("length=%d, offset=%d; suffixLen=%d:\ninput % x\ngot % x\nwant % x", 496 | length, offset, suffixLen, input, got, want) 497 | continue 498 | } 499 | } 500 | } 501 | } 502 | } 503 | 504 | const ( 505 | goldenText = "Isaac.Newton-Opticks.txt" 506 | goldenCompressed = goldenText + ".rawsnappy" 507 | ) 508 | 509 | func TestDecodeGoldenInput(t *testing.T) { 510 | tDir := filepath.FromSlash(*testdataDir) 511 | src, err := ioutil.ReadFile(filepath.Join(tDir, goldenCompressed)) 512 | if err != nil { 513 | t.Fatalf("ReadFile: %v", err) 514 | } 515 | got, err := Decode(nil, src) 516 | if err != nil { 517 | t.Fatalf("Decode: %v", err) 518 | } 519 | want, err := ioutil.ReadFile(filepath.Join(tDir, goldenText)) 520 | if err != nil { 521 | t.Fatalf("ReadFile: %v", err) 522 | } 523 | if err := cmp(got, want); err != nil { 524 | t.Fatal(err) 525 | } 526 | } 527 | 528 | func TestEncodeGoldenInput(t *testing.T) { 529 | tDir := filepath.FromSlash(*testdataDir) 530 | src, err := ioutil.ReadFile(filepath.Join(tDir, goldenText)) 531 | if err != nil { 532 | t.Fatalf("ReadFile: %v", err) 533 | } 534 | got := Encode(nil, src) 535 | want, err := ioutil.ReadFile(filepath.Join(tDir, goldenCompressed)) 536 | if err != nil { 537 | t.Fatalf("ReadFile: %v", err) 538 | } 539 | if err := cmp(got, want); err != nil { 540 | t.Fatal(err) 541 | } 542 | } 543 | 544 | func TestExtendMatchGoldenInput(t *testing.T) { 545 | tDir := filepath.FromSlash(*testdataDir) 546 | src, err := ioutil.ReadFile(filepath.Join(tDir, goldenText)) 547 | if err != nil { 548 | 
t.Fatalf("ReadFile: %v", err) 549 | } 550 | for i, tc := range extendMatchGoldenTestCases { 551 | got := extendMatch(src, tc.i, tc.j) 552 | if got != tc.want { 553 | t.Errorf("test #%d: i, j = %5d, %5d: got %5d (= j + %6d), want %5d (= j + %6d)", 554 | i, tc.i, tc.j, got, got-tc.j, tc.want, tc.want-tc.j) 555 | } 556 | } 557 | } 558 | 559 | func TestExtendMatch(t *testing.T) { 560 | // ref is a simple, reference implementation of extendMatch. 561 | ref := func(src []byte, i, j int) int { 562 | for ; j < len(src) && src[i] == src[j]; i, j = i+1, j+1 { 563 | } 564 | return j 565 | } 566 | 567 | nums := []int{0, 1, 2, 7, 8, 9, 29, 30, 31, 32, 33, 34, 38, 39, 40} 568 | for yIndex := 40; yIndex > 30; yIndex-- { 569 | xxx := bytes.Repeat([]byte("x"), 40) 570 | if yIndex < len(xxx) { 571 | xxx[yIndex] = 'y' 572 | } 573 | for _, i := range nums { 574 | for _, j := range nums { 575 | if i >= j { 576 | continue 577 | } 578 | got := extendMatch(xxx, i, j) 579 | want := ref(xxx, i, j) 580 | if got != want { 581 | t.Errorf("yIndex=%d, i=%d, j=%d: got %d, want %d", yIndex, i, j, got, want) 582 | } 583 | } 584 | } 585 | } 586 | } 587 | 588 | const snappytoolCmdName = "cmd/snappytool/snappytool" 589 | 590 | func skipTestSameEncodingAsCpp() (msg string) { 591 | if !goEncoderShouldMatchCppEncoder { 592 | return fmt.Sprintf("skipping testing that the encoding is byte-for-byte identical to C++: GOARCH=%s", runtime.GOARCH) 593 | } 594 | if _, err := os.Stat(snappytoolCmdName); err != nil { 595 | return fmt.Sprintf("could not find snappytool: %v", err) 596 | } 597 | return "" 598 | } 599 | 600 | func runTestSameEncodingAsCpp(src []byte) error { 601 | got := Encode(nil, src) 602 | 603 | cmd := exec.Command(snappytoolCmdName, "-e") 604 | cmd.Stdin = bytes.NewReader(src) 605 | want, err := cmd.Output() 606 | if err != nil { 607 | return fmt.Errorf("could not run snappytool: %v", err) 608 | } 609 | return cmp(got, want) 610 | } 611 | 612 | func TestSameEncodingAsCppShortCopies(t *testing.T) 
{ 613 | if msg := skipTestSameEncodingAsCpp(); msg != "" { 614 | t.Skip(msg) 615 | } 616 | src := bytes.Repeat([]byte{'a'}, 20) 617 | for i := 0; i <= len(src); i++ { 618 | if err := runTestSameEncodingAsCpp(src[:i]); err != nil { 619 | t.Errorf("i=%d: %v", i, err) 620 | } 621 | } 622 | } 623 | 624 | func TestSameEncodingAsCppLongFiles(t *testing.T) { 625 | if msg := skipTestSameEncodingAsCpp(); msg != "" { 626 | t.Skip(msg) 627 | } 628 | bDir := filepath.FromSlash(*benchdataDir) 629 | failed := false 630 | for i, tf := range testFiles { 631 | if err := downloadBenchmarkFiles(t, tf.filename); err != nil { 632 | t.Fatalf("failed to download testdata: %s", err) 633 | } 634 | data := readFile(t, filepath.Join(bDir, tf.filename)) 635 | if n := tf.sizeLimit; 0 < n && n < len(data) { 636 | data = data[:n] 637 | } 638 | if err := runTestSameEncodingAsCpp(data); err != nil { 639 | t.Errorf("i=%d: %v", i, err) 640 | failed = true 641 | } 642 | } 643 | if failed { 644 | t.Errorf("was the snappytool program built against the C++ snappy library version " + 645 | "d53de187 or later, committed on 2016-04-05? See " + 646 | "https://github.com/google/snappy/commit/d53de18799418e113e44444252a39b12a0e4e0cc") 647 | } 648 | } 649 | 650 | // TestSlowForwardCopyOverrun tests the "expand the pattern" algorithm 651 | // described in decode_amd64.s and its claim of a 10 byte overrun worst case. 
652 | func TestSlowForwardCopyOverrun(t *testing.T) { 653 | const base = 100 654 | 655 | for length := 1; length < 18; length++ { 656 | for offset := 1; offset < 18; offset++ { 657 | highWaterMark := base 658 | d := base 659 | l := length 660 | o := offset 661 | 662 | // makeOffsetAtLeast8 663 | for o < 8 { 664 | if end := d + 8; highWaterMark < end { 665 | highWaterMark = end 666 | } 667 | l -= o 668 | d += o 669 | o += o 670 | } 671 | 672 | // fixUpSlowForwardCopy 673 | a := d 674 | d += l 675 | 676 | // finishSlowForwardCopy 677 | for l > 0 { 678 | if end := a + 8; highWaterMark < end { 679 | highWaterMark = end 680 | } 681 | a += 8 682 | l -= 8 683 | } 684 | 685 | dWant := base + length 686 | overrun := highWaterMark - dWant 687 | if d != dWant || overrun < 0 || 10 < overrun { 688 | t.Errorf("length=%d, offset=%d: d and overrun: got (%d, %d), want (%d, something in [0, 10])", 689 | length, offset, d, overrun, dWant) 690 | } 691 | } 692 | } 693 | } 694 | 695 | // TestEncodeNoiseThenRepeats encodes input for which the first half is very 696 | // incompressible and the second half is very compressible. The encoded form's 697 | // length should be closer to 50% of the original length than 100%. 
698 | func TestEncodeNoiseThenRepeats(t *testing.T) { 699 | for _, origLen := range []int{256 * 1024, 2048 * 1024} { 700 | src := make([]byte, origLen) 701 | rng := rand.New(rand.NewSource(1)) 702 | firstHalf, secondHalf := src[:origLen/2], src[origLen/2:] 703 | for i := range firstHalf { 704 | firstHalf[i] = uint8(rng.Intn(256)) 705 | } 706 | for i := range secondHalf { 707 | secondHalf[i] = uint8(i >> 8) 708 | } 709 | dst := Encode(nil, src) 710 | if got, want := len(dst), origLen*3/4; got >= want { 711 | t.Errorf("origLen=%d: got %d encoded bytes, want less than %d", origLen, got, want) 712 | } 713 | } 714 | } 715 | 716 | func TestFramingFormat(t *testing.T) { 717 | // src is comprised of alternating 1e5-sized sequences of random 718 | // (incompressible) bytes and repeated (compressible) bytes. 1e5 was chosen 719 | // because it is larger than maxBlockSize (64k). 720 | src := make([]byte, 1e6) 721 | rng := rand.New(rand.NewSource(1)) 722 | for i := 0; i < 10; i++ { 723 | if i%2 == 0 { 724 | for j := 0; j < 1e5; j++ { 725 | src[1e5*i+j] = uint8(rng.Intn(256)) 726 | } 727 | } else { 728 | for j := 0; j < 1e5; j++ { 729 | src[1e5*i+j] = uint8(i) 730 | } 731 | } 732 | } 733 | 734 | buf := new(bytes.Buffer) 735 | if _, err := NewWriter(buf).Write(src); err != nil { 736 | t.Fatalf("Write: encoding: %v", err) 737 | } 738 | dst, err := ioutil.ReadAll(NewReader(buf)) 739 | if err != nil { 740 | t.Fatalf("ReadAll: decoding: %v", err) 741 | } 742 | if err := cmp(dst, src); err != nil { 743 | t.Fatal(err) 744 | } 745 | } 746 | 747 | func TestWriterGoldenOutput(t *testing.T) { 748 | buf := new(bytes.Buffer) 749 | w := NewBufferedWriter(buf) 750 | defer w.Close() 751 | w.Write([]byte("abcd")) // Not compressible. 752 | w.Flush() 753 | w.Write(bytes.Repeat([]byte{'A'}, 150)) // Compressible. 
754 | w.Flush() 755 | // The next chunk is also compressible, but a naive, greedy encoding of the 756 | // overall length 67 copy as a length 64 copy (the longest expressible as a 757 | // tagCopy1 or tagCopy2) plus a length 3 remainder would be two 3-byte 758 | // tagCopy2 tags (6 bytes), since the minimum length for a tagCopy1 is 4 759 | // bytes. Instead, we could do it shorter, in 5 bytes: a 3-byte tagCopy2 760 | // (of length 60) and a 2-byte tagCopy1 (of length 7). 761 | w.Write(bytes.Repeat([]byte{'B'}, 68)) 762 | w.Write([]byte("efC")) // Not compressible. 763 | w.Write(bytes.Repeat([]byte{'C'}, 20)) // Compressible. 764 | w.Write(bytes.Repeat([]byte{'B'}, 20)) // Compressible. 765 | w.Write([]byte("g")) // Not compressible. 766 | w.Flush() 767 | 768 | got := buf.String() 769 | want := strings.Join([]string{ 770 | magicChunk, 771 | "\x01\x08\x00\x00", // Uncompressed chunk, 8 bytes long (including 4 byte checksum). 772 | "\x68\x10\xe6\xb6", // Checksum. 773 | "\x61\x62\x63\x64", // Uncompressed payload: "abcd". 774 | "\x00\x11\x00\x00", // Compressed chunk, 17 bytes long (including 4 byte checksum). 775 | "\x5f\xeb\xf2\x10", // Checksum. 776 | "\x96\x01", // Compressed payload: Uncompressed length (varint encoded): 150. 777 | "\x00\x41", // Compressed payload: tagLiteral, length=1, "A". 778 | "\xfe\x01\x00", // Compressed payload: tagCopy2, length=64, offset=1. 779 | "\xfe\x01\x00", // Compressed payload: tagCopy2, length=64, offset=1. 780 | "\x52\x01\x00", // Compressed payload: tagCopy2, length=21, offset=1. 781 | "\x00\x18\x00\x00", // Compressed chunk, 24 bytes long (including 4 byte checksum). 782 | "\x30\x85\x69\xeb", // Checksum. 783 | "\x70", // Compressed payload: Uncompressed length (varint encoded): 112. 784 | "\x00\x42", // Compressed payload: tagLiteral, length=1, "B". 785 | "\xee\x01\x00", // Compressed payload: tagCopy2, length=60, offset=1. 786 | "\x0d\x01", // Compressed payload: tagCopy1, length=7, offset=1. 
787 | "\x08\x65\x66\x43", // Compressed payload: tagLiteral, length=3, "efC". 788 | "\x4e\x01\x00", // Compressed payload: tagCopy2, length=20, offset=1. 789 | "\x4e\x5a\x00", // Compressed payload: tagCopy2, length=20, offset=90. 790 | "\x00\x67", // Compressed payload: tagLiteral, length=1, "g". 791 | }, "") 792 | if got != want { 793 | t.Fatalf("\ngot: % x\nwant: % x", got, want) 794 | } 795 | } 796 | 797 | func TestEmitLiteral(t *testing.T) { 798 | testCases := []struct { 799 | length int 800 | want string 801 | }{ 802 | {1, "\x00"}, 803 | {2, "\x04"}, 804 | {59, "\xe8"}, 805 | {60, "\xec"}, 806 | {61, "\xf0\x3c"}, 807 | {62, "\xf0\x3d"}, 808 | {254, "\xf0\xfd"}, 809 | {255, "\xf0\xfe"}, 810 | {256, "\xf0\xff"}, 811 | {257, "\xf4\x00\x01"}, 812 | {65534, "\xf4\xfd\xff"}, 813 | {65535, "\xf4\xfe\xff"}, 814 | {65536, "\xf4\xff\xff"}, 815 | } 816 | 817 | dst := make([]byte, 70000) 818 | nines := bytes.Repeat([]byte{0x99}, 65536) 819 | for _, tc := range testCases { 820 | lit := nines[:tc.length] 821 | n := emitLiteral(dst, lit) 822 | if !bytes.HasSuffix(dst[:n], lit) { 823 | t.Errorf("length=%d: did not end with that many literal bytes", tc.length) 824 | continue 825 | } 826 | got := string(dst[:n-tc.length]) 827 | if got != tc.want { 828 | t.Errorf("length=%d:\ngot % x\nwant % x", tc.length, got, tc.want) 829 | continue 830 | } 831 | } 832 | } 833 | 834 | func TestEmitCopy(t *testing.T) { 835 | testCases := []struct { 836 | offset int 837 | length int 838 | want string 839 | }{ 840 | {8, 04, "\x01\x08"}, 841 | {8, 11, "\x1d\x08"}, 842 | {8, 12, "\x2e\x08\x00"}, 843 | {8, 13, "\x32\x08\x00"}, 844 | {8, 59, "\xea\x08\x00"}, 845 | {8, 60, "\xee\x08\x00"}, 846 | {8, 61, "\xf2\x08\x00"}, 847 | {8, 62, "\xf6\x08\x00"}, 848 | {8, 63, "\xfa\x08\x00"}, 849 | {8, 64, "\xfe\x08\x00"}, 850 | {8, 65, "\xee\x08\x00\x05\x08"}, 851 | {8, 66, "\xee\x08\x00\x09\x08"}, 852 | {8, 67, "\xee\x08\x00\x0d\x08"}, 853 | {8, 68, "\xfe\x08\x00\x01\x08"}, 854 | {8, 69, 
"\xfe\x08\x00\x05\x08"}, 855 | {8, 80, "\xfe\x08\x00\x3e\x08\x00"}, 856 | 857 | {256, 04, "\x21\x00"}, 858 | {256, 11, "\x3d\x00"}, 859 | {256, 12, "\x2e\x00\x01"}, 860 | {256, 13, "\x32\x00\x01"}, 861 | {256, 59, "\xea\x00\x01"}, 862 | {256, 60, "\xee\x00\x01"}, 863 | {256, 61, "\xf2\x00\x01"}, 864 | {256, 62, "\xf6\x00\x01"}, 865 | {256, 63, "\xfa\x00\x01"}, 866 | {256, 64, "\xfe\x00\x01"}, 867 | {256, 65, "\xee\x00\x01\x25\x00"}, 868 | {256, 66, "\xee\x00\x01\x29\x00"}, 869 | {256, 67, "\xee\x00\x01\x2d\x00"}, 870 | {256, 68, "\xfe\x00\x01\x21\x00"}, 871 | {256, 69, "\xfe\x00\x01\x25\x00"}, 872 | {256, 80, "\xfe\x00\x01\x3e\x00\x01"}, 873 | 874 | {2048, 04, "\x0e\x00\x08"}, 875 | {2048, 11, "\x2a\x00\x08"}, 876 | {2048, 12, "\x2e\x00\x08"}, 877 | {2048, 13, "\x32\x00\x08"}, 878 | {2048, 59, "\xea\x00\x08"}, 879 | {2048, 60, "\xee\x00\x08"}, 880 | {2048, 61, "\xf2\x00\x08"}, 881 | {2048, 62, "\xf6\x00\x08"}, 882 | {2048, 63, "\xfa\x00\x08"}, 883 | {2048, 64, "\xfe\x00\x08"}, 884 | {2048, 65, "\xee\x00\x08\x12\x00\x08"}, 885 | {2048, 66, "\xee\x00\x08\x16\x00\x08"}, 886 | {2048, 67, "\xee\x00\x08\x1a\x00\x08"}, 887 | {2048, 68, "\xfe\x00\x08\x0e\x00\x08"}, 888 | {2048, 69, "\xfe\x00\x08\x12\x00\x08"}, 889 | {2048, 80, "\xfe\x00\x08\x3e\x00\x08"}, 890 | } 891 | 892 | dst := make([]byte, 1024) 893 | for _, tc := range testCases { 894 | n := emitCopy(dst, tc.offset, tc.length) 895 | got := string(dst[:n]) 896 | if got != tc.want { 897 | t.Errorf("offset=%d, length=%d:\ngot % x\nwant % x", tc.offset, tc.length, got, tc.want) 898 | } 899 | } 900 | } 901 | 902 | func TestNewBufferedWriter(t *testing.T) { 903 | // Test all 32 possible sub-sequences of these 5 input slices. 904 | // 905 | // Their lengths sum to 400,000, which is over 6 times the Writer ibuf 906 | // capacity: 6 * maxBlockSize is 393,216. 
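The capacity arithmetic in the comment above is easy to double-check: with maxBlockSize at 65536 (the 64k noted earlier in this file), six blocks come to 393,216 bytes, and the five input lengths used by the test sum to 400,000, just over that. A standalone check of those numbers:

```go
package main

import "fmt"

func main() {
	const maxBlockSize = 65536 // snappy's maximum block size (64 KiB)
	// The five input slice lengths used by TestNewBufferedWriter.
	lengths := []int{40000, 150000, 60000, 120000, 30000}
	total := 0
	for _, n := range lengths {
		total += n
	}
	fmt.Println(total, 6*maxBlockSize, total > 6*maxBlockSize) // 400000 393216 true
}
```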
907 | inputs := [][]byte{ 908 | bytes.Repeat([]byte{'a'}, 40000), 909 | bytes.Repeat([]byte{'b'}, 150000), 910 | bytes.Repeat([]byte{'c'}, 60000), 911 | bytes.Repeat([]byte{'d'}, 120000), 912 | bytes.Repeat([]byte{'e'}, 30000), 913 | } 914 | loop: 915 | for i := 0; i < 1<<uint(len(inputs)); i++ { 1258 | func expand(src []byte, n int) []byte { 1259 | dst := make([]byte, n) 1260 | for x := dst; len(x) > 0; { 1261 | i := copy(x, src) 1262 | x = x[i:] 1263 | } 1264 | return dst 1265 | } 1266 | 1267 | func benchWords(b *testing.B, n int, decode bool) { 1268 | // Note: the file is OS-language dependent so the resulting values are not 1269 | // directly comparable for non-US-English OS installations. 1270 | data := expand(readFile(b, "/usr/share/dict/words"), n) 1271 | if decode { 1272 | benchDecode(b, data) 1273 | } else { 1274 | benchEncode(b, data) 1275 | } 1276 | } 1277 | 1278 | func BenchmarkWordsDecode1e1(b *testing.B) { benchWords(b, 1e1, true) } 1279 | func BenchmarkWordsDecode1e2(b *testing.B) { benchWords(b, 1e2, true) } 1280 | func BenchmarkWordsDecode1e3(b *testing.B) { benchWords(b, 1e3, true) } 1281 | func BenchmarkWordsDecode1e4(b *testing.B) { benchWords(b, 1e4, true) } 1282 | func BenchmarkWordsDecode1e5(b *testing.B) { benchWords(b, 1e5, true) } 1283 | func BenchmarkWordsDecode1e6(b *testing.B) { benchWords(b, 1e6, true) } 1284 | func BenchmarkWordsEncode1e1(b *testing.B) { benchWords(b, 1e1, false) } 1285 | func BenchmarkWordsEncode1e2(b *testing.B) { benchWords(b, 1e2, false) } 1286 | func BenchmarkWordsEncode1e3(b *testing.B) { benchWords(b, 1e3, false) } 1287 | func BenchmarkWordsEncode1e4(b *testing.B) { benchWords(b, 1e4, false) } 1288 | func BenchmarkWordsEncode1e5(b *testing.B) { benchWords(b, 1e5, false) } 1289 | func BenchmarkWordsEncode1e6(b *testing.B) { benchWords(b, 1e6, false) } 1290 | 1291 | func BenchmarkRandomEncode(b *testing.B) { 1292 | rng := rand.New(rand.NewSource(1)) 1293 | data := make([]byte, 1<<20) 1294 | for i := range data { 1295 | data[i] = uint8(rng.Intn(256)) 1296 | } 1297 | benchEncode(b, data) 1298 | } 1299 | 1300 | // testFiles' values are copied
directly from 1301 | // https://raw.githubusercontent.com/google/snappy/master/snappy_unittest.cc 1302 | // The label field is unused in snappy-go. 1303 | var testFiles = []struct { 1304 | label string 1305 | filename string 1306 | sizeLimit int 1307 | }{ 1308 | {"html", "html", 0}, 1309 | {"urls", "urls.10K", 0}, 1310 | {"jpg", "fireworks.jpeg", 0}, 1311 | {"jpg_200", "fireworks.jpeg", 200}, 1312 | {"pdf", "paper-100k.pdf", 0}, 1313 | {"html4", "html_x_4", 0}, 1314 | {"txt1", "alice29.txt", 0}, 1315 | {"txt2", "asyoulik.txt", 0}, 1316 | {"txt3", "lcet10.txt", 0}, 1317 | {"txt4", "plrabn12.txt", 0}, 1318 | {"pb", "geo.protodata", 0}, 1319 | {"gaviota", "kppkn.gtb", 0}, 1320 | } 1321 | 1322 | const ( 1323 | // The benchmark data files are at this canonical URL. 1324 | benchURL = "https://raw.githubusercontent.com/google/snappy/master/testdata/" 1325 | ) 1326 | 1327 | func downloadBenchmarkFiles(b testing.TB, basename string) (errRet error) { 1328 | bDir := filepath.FromSlash(*benchdataDir) 1329 | filename := filepath.Join(bDir, basename) 1330 | if stat, err := os.Stat(filename); err == nil && stat.Size() != 0 { 1331 | return nil 1332 | } 1333 | 1334 | if !*download { 1335 | b.Skipf("test data not found; skipping %s without the -download flag", testOrBenchmark(b)) 1336 | } 1337 | // Download the official snappy C++ implementation reference test data 1338 | // files for benchmarking. 
1339 | if err := os.MkdirAll(bDir, 0777); err != nil && !os.IsExist(err) { 1340 | return fmt.Errorf("failed to create %s: %s", bDir, err) 1341 | } 1342 | 1343 | f, err := os.Create(filename) 1344 | if err != nil { 1345 | return fmt.Errorf("failed to create %s: %s", filename, err) 1346 | } 1347 | defer f.Close() 1348 | defer func() { 1349 | if errRet != nil { 1350 | os.Remove(filename) 1351 | } 1352 | }() 1353 | url := benchURL + basename 1354 | resp, err := http.Get(url) 1355 | if err != nil { 1356 | return fmt.Errorf("failed to download %s: %s", url, err) 1357 | } 1358 | defer resp.Body.Close() 1359 | if s := resp.StatusCode; s != http.StatusOK { 1360 | return fmt.Errorf("downloading %s: HTTP status code %d (%s)", url, s, http.StatusText(s)) 1361 | } 1362 | _, err = io.Copy(f, resp.Body) 1363 | if err != nil { 1364 | return fmt.Errorf("failed to download %s to %s: %s", url, filename, err) 1365 | } 1366 | return nil 1367 | } 1368 | 1369 | func benchFile(b *testing.B, i int, decode bool) { 1370 | if err := downloadBenchmarkFiles(b, testFiles[i].filename); err != nil { 1371 | b.Fatalf("failed to download testdata: %s", err) 1372 | } 1373 | bDir := filepath.FromSlash(*benchdataDir) 1374 | data := readFile(b, filepath.Join(bDir, testFiles[i].filename)) 1375 | if n := testFiles[i].sizeLimit; 0 < n && n < len(data) { 1376 | data = data[:n] 1377 | } 1378 | if decode { 1379 | benchDecode(b, data) 1380 | } else { 1381 | benchEncode(b, data) 1382 | } 1383 | } 1384 | 1385 | // Naming convention is kept similar to what snappy's C++ implementation uses. 
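Looking back at the TestEmitCopy table, the three-byte entries follow a fixed layout: the tag byte is tagCopy2 (0x02) OR'd with (length-1)<<2, followed by the 16-bit offset in little-endian order. A standalone sketch reproducing one table entry (the helper name is illustrative, not part of this package):

```go
package main

import "fmt"

// encodeTagCopy2 builds a 3-byte tagCopy2 tag for 1 <= length <= 64 and
// offset < 65536: tag byte 0x02 | (length-1)<<2, then the little-endian offset.
func encodeTagCopy2(length, offset int) [3]byte {
	return [3]byte{0x02 | byte(length-1)<<2, byte(offset), byte(offset >> 8)}
}

func main() {
	// Matches the TestEmitCopy entry {2048, 04, "\x0e\x00\x08"}.
	tag := encodeTagCopy2(4, 2048)
	fmt.Printf("% x\n", tag[:]) // 0e 00 08
}
```

The longer entries in that table, such as {8, 68, "\xfe\x08\x00\x01\x08"}, are just a length-64 tagCopy2 followed by a second copy tag for the remainder.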
1386 | func Benchmark_UFlat0(b *testing.B) { benchFile(b, 0, true) } 1387 | func Benchmark_UFlat1(b *testing.B) { benchFile(b, 1, true) } 1388 | func Benchmark_UFlat2(b *testing.B) { benchFile(b, 2, true) } 1389 | func Benchmark_UFlat3(b *testing.B) { benchFile(b, 3, true) } 1390 | func Benchmark_UFlat4(b *testing.B) { benchFile(b, 4, true) } 1391 | func Benchmark_UFlat5(b *testing.B) { benchFile(b, 5, true) } 1392 | func Benchmark_UFlat6(b *testing.B) { benchFile(b, 6, true) } 1393 | func Benchmark_UFlat7(b *testing.B) { benchFile(b, 7, true) } 1394 | func Benchmark_UFlat8(b *testing.B) { benchFile(b, 8, true) } 1395 | func Benchmark_UFlat9(b *testing.B) { benchFile(b, 9, true) } 1396 | func Benchmark_UFlat10(b *testing.B) { benchFile(b, 10, true) } 1397 | func Benchmark_UFlat11(b *testing.B) { benchFile(b, 11, true) } 1398 | func Benchmark_ZFlat0(b *testing.B) { benchFile(b, 0, false) } 1399 | func Benchmark_ZFlat1(b *testing.B) { benchFile(b, 1, false) } 1400 | func Benchmark_ZFlat2(b *testing.B) { benchFile(b, 2, false) } 1401 | func Benchmark_ZFlat3(b *testing.B) { benchFile(b, 3, false) } 1402 | func Benchmark_ZFlat4(b *testing.B) { benchFile(b, 4, false) } 1403 | func Benchmark_ZFlat5(b *testing.B) { benchFile(b, 5, false) } 1404 | func Benchmark_ZFlat6(b *testing.B) { benchFile(b, 6, false) } 1405 | func Benchmark_ZFlat7(b *testing.B) { benchFile(b, 7, false) } 1406 | func Benchmark_ZFlat8(b *testing.B) { benchFile(b, 8, false) } 1407 | func Benchmark_ZFlat9(b *testing.B) { benchFile(b, 9, false) } 1408 | func Benchmark_ZFlat10(b *testing.B) { benchFile(b, 10, false) } 1409 | func Benchmark_ZFlat11(b *testing.B) { benchFile(b, 11, false) } 1410 | 1411 | func BenchmarkExtendMatch(b *testing.B) { 1412 | tDir := filepath.FromSlash(*testdataDir) 1413 | src, err := ioutil.ReadFile(filepath.Join(tDir, goldenText)) 1414 | if err != nil { 1415 | b.Fatalf("ReadFile: %v", err) 1416 | } 1417 | b.ResetTimer() 1418 | for i := 0; i < b.N; i++ { 1419 | for _, tc := range 
extendMatchGoldenTestCases { 1420 | extendMatch(src, tc.i, tc.j) 1421 | } 1422 | } 1423 | } 1424 | -------------------------------------------------------------------------------- /testdata/Isaac.Newton-Opticks.txt: -------------------------------------------------------------------------------- 1 | Produced by Suzanne Lybarger, steve harris, Josephine 2 | Paolucci and the Online Distributed Proofreading Team at 3 | http://www.pgdp.net. 4 | 5 | 6 | 7 | 8 | 9 | 10 | OPTICKS: 11 | 12 | OR, A 13 | 14 | TREATISE 15 | 16 | OF THE 17 | 18 | _Reflections_, _Refractions_, 19 | _Inflections_ and _Colours_ 20 | 21 | OF 22 | 23 | LIGHT. 24 | 25 | _The_ FOURTH EDITION, _corrected_. 26 | 27 | By Sir _ISAAC NEWTON_, Knt. 28 | 29 | LONDON: 30 | 31 | Printed for WILLIAM INNYS at the West-End of St. _Paul's_. MDCCXXX. 32 | 33 | TITLE PAGE OF THE 1730 EDITION 34 | 35 | 36 | 37 | 38 | SIR ISAAC NEWTON'S ADVERTISEMENTS 39 | 40 | 41 | 42 | 43 | Advertisement I 44 | 45 | 46 | _Part of the ensuing Discourse about Light was written at the Desire of 47 | some Gentlemen of the_ Royal-Society, _in the Year 1675, and then sent 48 | to their Secretary, and read at their Meetings, and the rest was added 49 | about twelve Years after to complete the Theory; except the third Book, 50 | and the last Proposition of the Second, which were since put together 51 | out of scatter'd Papers. To avoid being engaged in Disputes about these 52 | Matters, I have hitherto delayed the printing, and should still have 53 | delayed it, had not the Importunity of Friends prevailed upon me. If any 54 | other Papers writ on this Subject are got out of my Hands they are 55 | imperfect, and were perhaps written before I had tried all the 56 | Experiments here set down, and fully satisfied my self about the Laws of 57 | Refractions and Composition of Colours. 
I have here publish'd what I 58 | think proper to come abroad, wishing that it may not be translated into 59 | another Language without my Consent._ 60 | 61 | _The Crowns of Colours, which sometimes appear about the Sun and Moon, I 62 | have endeavoured to give an Account of; but for want of sufficient 63 | Observations leave that Matter to be farther examined. The Subject of 64 | the Third Book I have also left imperfect, not having tried all the 65 | Experiments which I intended when I was about these Matters, nor 66 | repeated some of those which I did try, until I had satisfied my self 67 | about all their Circumstances. To communicate what I have tried, and 68 | leave the rest to others for farther Enquiry, is all my Design in 69 | publishing these Papers._ 70 | 71 | _In a Letter written to Mr._ Leibnitz _in the year 1679, and published 72 | by Dr._ Wallis, _I mention'd a Method by which I had found some general 73 | Theorems about squaring Curvilinear Figures, or comparing them with the 74 | Conic Sections, or other the simplest Figures with which they may be 75 | compared. And some Years ago I lent out a Manuscript containing such 76 | Theorems, and having since met with some Things copied out of it, I have 77 | on this Occasion made it publick, prefixing to it an_ Introduction, _and 78 | subjoining a_ Scholium _concerning that Method. And I have joined with 79 | it another small Tract concerning the Curvilinear Figures of the Second 80 | Kind, which was also written many Years ago, and made known to some 81 | Friends, who have solicited the making it publick._ 82 | 83 | _I. N._ 84 | 85 | April 1, 1704. 86 | 87 | 88 | Advertisement II 89 | 90 | _In this Second Edition of these Opticks I have omitted the Mathematical 91 | Tracts publish'd at the End of the former Edition, as not belonging to 92 | the Subject. And at the End of the Third Book I have added some 93 | Questions. 
And to shew that I do not take Gravity for an essential 94 | Property of Bodies, I have added one Question concerning its Cause, 95 | chusing to propose it by way of a Question, because I am not yet 96 | satisfied about it for want of Experiments._ 97 | 98 | _I. N._ 99 | 100 | July 16, 1717. 101 | 102 | 103 | Advertisement to this Fourth Edition 104 | 105 | _This new Edition of Sir_ Isaac Newton's Opticks _is carefully printed 106 | from the Third Edition, as it was corrected by the Author's own Hand, 107 | and left before his Death with the Bookseller. Since Sir_ Isaac's 108 | Lectiones Opticæ, _which he publickly read in the University of_ 109 | Cambridge _in the Years 1669, 1670, and 1671, are lately printed, it has 110 | been thought proper to make at the bottom of the Pages several Citations 111 | from thence, where may be found the Demonstrations, which the Author 112 | omitted in these_ Opticks. 113 | 114 | * * * * * 115 | 116 | Transcriber's Note: There are several greek letters used in the 117 | descriptions of the illustrations. They are signified by [Greek: 118 | letter]. Square roots are noted by the letters sqrt before the equation. 119 | 120 | * * * * * 121 | 122 | THE FIRST BOOK OF OPTICKS 123 | 124 | 125 | 126 | 127 | _PART I._ 128 | 129 | 130 | My Design in this Book is not to explain the Properties of Light by 131 | Hypotheses, but to propose and prove them by Reason and Experiments: In 132 | order to which I shall premise the following Definitions and Axioms. 133 | 134 | 135 | 136 | 137 | _DEFINITIONS_ 138 | 139 | 140 | DEFIN. I. 
141 | 142 | _By the Rays of Light I understand its least Parts, and those as well 143 | Successive in the same Lines, as Contemporary in several Lines._ For it 144 | is manifest that Light consists of Parts, both Successive and 145 | Contemporary; because in the same place you may stop that which comes 146 | one moment, and let pass that which comes presently after; and in the 147 | same time you may stop it in any one place, and let it pass in any 148 | other. For that part of Light which is stopp'd cannot be the same with 149 | that which is let pass. The least Light or part of Light, which may be 150 | stopp'd alone without the rest of the Light, or propagated alone, or do 151 | or suffer any thing alone, which the rest of the Light doth not or 152 | suffers not, I call a Ray of Light. 153 | 154 | 155 | DEFIN. II. 156 | 157 | _Refrangibility of the Rays of Light, is their Disposition to be 158 | refracted or turned out of their Way in passing out of one transparent 159 | Body or Medium into another. And a greater or less Refrangibility of 160 | Rays, is their Disposition to be turned more or less out of their Way in 161 | like Incidences on the same Medium._ Mathematicians usually consider the 162 | Rays of Light to be Lines reaching from the luminous Body to the Body 163 | illuminated, and the refraction of those Rays to be the bending or 164 | breaking of those lines in their passing out of one Medium into another. 165 | And thus may Rays and Refractions be considered, if Light be propagated 166 | in an instant. But by an Argument taken from the Æquations of the times 167 | of the Eclipses of _Jupiter's Satellites_, it seems that Light is 168 | propagated in time, spending in its passage from the Sun to us about 169 | seven Minutes of time: And therefore I have chosen to define Rays and 170 | Refractions in such general terms as may agree to Light in both cases. 171 | 172 | 173 | DEFIN. III. 
174 | 175 | _Reflexibility of Rays, is their Disposition to be reflected or turned 176 | back into the same Medium from any other Medium upon whose Surface they 177 | fall. And Rays are more or less reflexible, which are turned back more 178 | or less easily._ As if Light pass out of a Glass into Air, and by being 179 | inclined more and more to the common Surface of the Glass and Air, 180 | begins at length to be totally reflected by that Surface; those sorts of 181 | Rays which at like Incidences are reflected most copiously, or by 182 | inclining the Rays begin soonest to be totally reflected, are most 183 | reflexible. 184 | 185 | 186 | DEFIN. IV. 187 | 188 | _The Angle of Incidence is that Angle, which the Line described by the 189 | incident Ray contains with the Perpendicular to the reflecting or 190 | refracting Surface at the Point of Incidence._ 191 | 192 | 193 | DEFIN. V. 194 | 195 | _The Angle of Reflexion or Refraction, is the Angle which the line 196 | described by the reflected or refracted Ray containeth with the 197 | Perpendicular to the reflecting or refracting Surface at the Point of 198 | Incidence._ 199 | 200 | 201 | DEFIN. VI. 202 | 203 | _The Sines of Incidence, Reflexion, and Refraction, are the Sines of the 204 | Angles of Incidence, Reflexion, and Refraction._ 205 | 206 | 207 | DEFIN. VII 208 | 209 | _The Light whose Rays are all alike Refrangible, I call Simple, 210 | Homogeneal and Similar; and that whose Rays are some more Refrangible 211 | than others, I call Compound, Heterogeneal and Dissimilar._ The former 212 | Light I call Homogeneal, not because I would affirm it so in all 213 | respects, but because the Rays which agree in Refrangibility, agree at 214 | least in all those their other Properties which I consider in the 215 | following Discourse. 216 | 217 | 218 | DEFIN. VIII. 
219 | 220 | _The Colours of Homogeneal Lights, I call Primary, Homogeneal and 221 | Simple; and those of Heterogeneal Lights, Heterogeneal and Compound._ 222 | For these are always compounded of the colours of Homogeneal Lights; as 223 | will appear in the following Discourse. 224 | 225 | 226 | 227 | 228 | _AXIOMS._ 229 | 230 | 231 | AX. I. 232 | 233 | _The Angles of Reflexion and Refraction, lie in one and the same Plane 234 | with the Angle of Incidence._ 235 | 236 | 237 | AX. II. 238 | 239 | _The Angle of Reflexion is equal to the Angle of Incidence._ 240 | 241 | 242 | AX. III. 243 | 244 | _If the refracted Ray be returned directly back to the Point of 245 | Incidence, it shall be refracted into the Line before described by the 246 | incident Ray._ 247 | 248 | 249 | AX. IV. 250 | 251 | _Refraction out of the rarer Medium into the denser, is made towards the 252 | Perpendicular; that is, so that the Angle of Refraction be less than the 253 | Angle of Incidence._ 254 | 255 | 256 | AX. V. 257 | 258 | _The Sine of Incidence is either accurately or very nearly in a given 259 | Ratio to the Sine of Refraction._ 260 | 261 | Whence if that Proportion be known in any one Inclination of the 262 | incident Ray, 'tis known in all the Inclinations, and thereby the 263 | Refraction in all cases of Incidence on the same refracting Body may be 264 | determined. Thus if the Refraction be made out of Air into Water, the 265 | Sine of Incidence of the red Light is to the Sine of its Refraction as 4 266 | to 3. If out of Air into Glass, the Sines are as 17 to 11. In Light of 267 | other Colours the Sines have other Proportions: but the difference is so 268 | little that it need seldom be considered. 269 | 270 | [Illustration: FIG. 1] 271 | 272 | Suppose therefore, that RS [in _Fig._ 1.] 
represents the Surface of 273 | stagnating Water, and that C is the point of Incidence in which any Ray 274 | coming in the Air from A in the Line AC is reflected or refracted, and I 275 | would know whither this Ray shall go after Reflexion or Refraction: I 276 | erect upon the Surface of the Water from the point of Incidence the 277 | Perpendicular CP and produce it downwards to Q, and conclude by the 278 | first Axiom, that the Ray after Reflexion and Refraction, shall be 279 | found somewhere in the Plane of the Angle of Incidence ACP produced. I 280 | let fall therefore upon the Perpendicular CP the Sine of Incidence AD; 281 | and if the reflected Ray be desired, I produce AD to B so that DB be 282 | equal to AD, and draw CB. For this Line CB shall be the reflected Ray; 283 | the Angle of Reflexion BCP and its Sine BD being equal to the Angle and 284 | Sine of Incidence, as they ought to be by the second Axiom, But if the 285 | refracted Ray be desired, I produce AD to H, so that DH may be to AD as 286 | the Sine of Refraction to the Sine of Incidence, that is, (if the Light 287 | be red) as 3 to 4; and about the Center C and in the Plane ACP with the 288 | Radius CA describing a Circle ABE, I draw a parallel to the 289 | Perpendicular CPQ, the Line HE cutting the Circumference in E, and 290 | joining CE, this Line CE shall be the Line of the refracted Ray. For if 291 | EF be let fall perpendicularly on the Line PQ, this Line EF shall be the 292 | Sine of Refraction of the Ray CE, the Angle of Refraction being ECQ; and 293 | this Sine EF is equal to DH, and consequently in Proportion to the Sine 294 | of Incidence AD as 3 to 4. 
295 | 296 | In like manner, if there be a Prism of Glass (that is, a Glass bounded 297 | with two Equal and Parallel Triangular ends, and three plain and well 298 | polished Sides, which meet in three Parallel Lines running from the 299 | three Angles of one end to the three Angles of the other end) and if the 300 | Refraction of the Light in passing cross this Prism be desired: Let ACB 301 | [in _Fig._ 2.] represent a Plane cutting this Prism transversly to its 302 | three Parallel lines or edges there where the Light passeth through it, 303 | and let DE be the Ray incident upon the first side of the Prism AC where 304 | the Light goes into the Glass; and by putting the Proportion of the Sine 305 | of Incidence to the Sine of Refraction as 17 to 11 find EF the first 306 | refracted Ray. Then taking this Ray for the Incident Ray upon the second 307 | side of the Glass BC where the Light goes out, find the next refracted 308 | Ray FG by putting the Proportion of the Sine of Incidence to the Sine of 309 | Refraction as 11 to 17. For if the Sine of Incidence out of Air into 310 | Glass be to the Sine of Refraction as 17 to 11, the Sine of Incidence 311 | out of Glass into Air must on the contrary be to the Sine of Refraction 312 | as 11 to 17, by the third Axiom. 313 | 314 | [Illustration: FIG. 2.] 315 | 316 | Much after the same manner, if ACBD [in _Fig._ 3.] represent a Glass 317 | spherically convex on both sides (usually called a _Lens_, such as is a 318 | Burning-glass, or Spectacle-glass, or an Object-glass of a Telescope) 319 | and it be required to know how Light falling upon it from any lucid 320 | point Q shall be refracted, let QM represent a Ray falling upon any 321 | point M of its first spherical Surface ACB, and by erecting a 322 | Perpendicular to the Glass at the point M, find the first refracted Ray 323 | MN by the Proportion of the Sines 17 to 11. 
Let that Ray in going out of 324 | the Glass be incident upon N, and then find the second refracted Ray 325 | N_q_ by the Proportion of the Sines 11 to 17. And after the same manner 326 | may the Refraction be found when the Lens is convex on one side and 327 | plane or concave on the other, or concave on both sides. 328 | 329 | [Illustration: FIG. 3.] 330 | 331 | 332 | AX. VI. 333 | 334 | _Homogeneal Rays which flow from several Points of any Object, and fall 335 | perpendicularly or almost perpendicularly on any reflecting or 336 | refracting Plane or spherical Surface, shall afterwards diverge from so 337 | many other Points, or be parallel to so many other Lines, or converge to 338 | so many other Points, either accurately or without any sensible Error. 339 | And the same thing will happen, if the Rays be reflected or refracted 340 | successively by two or three or more Plane or Spherical Surfaces._ 341 | 342 | The Point from which Rays diverge or to which they converge may be 343 | called their _Focus_. And the Focus of the incident Rays being given, 344 | that of the reflected or refracted ones may be found by finding the 345 | Refraction of any two Rays, as above; or more readily thus. 346 | 347 | _Cas._ 1. Let ACB [in _Fig._ 4.] be a reflecting or refracting Plane, 348 | and Q the Focus of the incident Rays, and Q_q_C a Perpendicular to that 349 | Plane. And if this Perpendicular be produced to _q_, so that _q_C be 350 | equal to QC, the Point _q_ shall be the Focus of the reflected Rays: Or 351 | if _q_C be taken on the same side of the Plane with QC, and in 352 | proportion to QC as the Sine of Incidence to the Sine of Refraction, the 353 | Point _q_ shall be the Focus of the refracted Rays. 354 | 355 | [Illustration: FIG. 4.] 356 | 357 | _Cas._ 2. Let ACB [in _Fig._ 5.] be the reflecting Surface of any Sphere 358 | whose Centre is E. 
Bisect any Radius thereof, (suppose EC) in T, and if 359 | in that Radius on the same side the Point T you take the Points Q and 360 | _q_, so that TQ, TE, and T_q_, be continual Proportionals, and the Point 361 | Q be the Focus of the incident Rays, the Point _q_ shall be the Focus of 362 | the reflected ones. 363 | 364 | [Illustration: FIG. 5.] 365 | 366 | _Cas._ 3. Let ACB [in _Fig._ 6.] be the refracting Surface of any Sphere 367 | whose Centre is E. In any Radius thereof EC produced both ways take ET 368 | and C_t_ equal to one another and severally in such Proportion to that 369 | Radius as the lesser of the Sines of Incidence and Refraction hath to 370 | the difference of those Sines. And then if in the same Line you find any 371 | two Points Q and _q_, so that TQ be to ET as E_t_ to _tq_, taking _tq_ 372 | the contrary way from _t_ which TQ lieth from T, and if the Point Q be 373 | the Focus of any incident Rays, the Point _q_ shall be the Focus of the 374 | refracted ones. 375 | 376 | [Illustration: FIG. 6.] 377 | 378 | And by the same means the Focus of the Rays after two or more Reflexions 379 | or Refractions may be found. 380 | 381 | [Illustration: FIG. 7.] 382 | 383 | _Cas._ 4. Let ACBD [in _Fig._ 7.] be any refracting Lens, spherically 384 | Convex or Concave or Plane on either side, and let CD be its Axis (that 385 | is, the Line which cuts both its Surfaces perpendicularly, and passes 386 | through the Centres of the Spheres,) and in this Axis produced let F and 387 | _f_ be the Foci of the refracted Rays found as above, when the incident 388 | Rays on both sides the Lens are parallel to the same Axis; and upon the 389 | Diameter F_f_ bisected in E, describe a Circle. Suppose now that any 390 | Point Q be the Focus of any incident Rays. Draw QE cutting the said 391 | Circle in T and _t_, and therein take _tq_ in such proportion to _t_E as 392 | _t_E or TE hath to TQ. 
Let _tq_ lie the contrary way from _t_ which TQ 393 | doth from T, and _q_ shall be the Focus of the refracted Rays without 394 | any sensible Error, provided the Point Q be not so remote from the Axis, 395 | nor the Lens so broad as to make any of the Rays fall too obliquely on 396 | the refracting Surfaces.[A] 397 | 398 | And by the like Operations may the reflecting or refracting Surfaces be 399 | found when the two Foci are given, and thereby a Lens be formed, which 400 | shall make the Rays flow towards or from what Place you please.[B] 401 | -------------------------------------------------------------------------------- /testdata/Isaac.Newton-Opticks.txt.rawsnappy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/golang/snappy/43d5d4cd4e0e3390b0b645d5c3ef1187642403d8/testdata/Isaac.Newton-Opticks.txt.rawsnappy --------------------------------------------------------------------------------