├── .github └── FUNDING.yml ├── CHANGELOG.md ├── LICENSE ├── README.md ├── composer.json └── src └── TarDecoder.php /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | github: clue 2 | custom: https://clue.engineering/support 3 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | ## 0.2.0 (2019-09-09) 4 | 5 | * Feature / BC break: Support latest ReactPHP stream version and strictly follow stream semantics. 6 | (#11 by @clue) 7 | 8 | * Feature: Add backpressure support and support throttling. 9 | (#12 by @clue) 10 | 11 | * Improve test suite by adding PHPUnit to `require-dev`, support PHPUnit 7 - legacy PHPUnit 4, 12 | test against legacy PHP 5.3 through PHP 7.3 and update project homepage. 13 | (#9 and #10 by @clue) 14 | 15 | ## 0.1.0 (2015-06-17) 16 | 17 | * First tagged release 18 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Christian Lück 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is furnished 10 | to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # clue/reactphp-tar 2 | 3 | [![CI status](https://github.com/clue/reactphp-tar/actions/workflows/ci.yml/badge.svg)](https://github.com/clue/reactphp-tar/actions) 4 | [![installs on Packagist](https://img.shields.io/packagist/dt/clue/tar-react?color=blue&label=installs%20on%20Packagist)](https://packagist.org/packages/clue/tar-react) 5 | 6 | Streaming parser to extract tarballs with [ReactPHP](https://reactphp.org/). 7 | 8 | The [TAR file format](https://en.wikipedia.org/wiki/Tar_%28computing%29) is a 9 | common archive format to store several files in a single archive file (commonly 10 | referred to as "tarball" with a `.tar` extension). This lightweight library 11 | provides an efficient implementation to extract tarballs in a streaming fashion, 12 | processing one chunk at a time in memory without having to rely on disk I/O. 13 | 14 | **Table of Contents** 15 | 16 | * [Quickstart example](#quickstart-example) 17 | * [Install](#install) 18 | * [Tests](#tests) 19 | * [License](#license) 20 | * [More](#more) 21 | 22 | > Note: This project is in beta stage! Feel free to report any issues you encounter. 
23 | 24 | ## Quickstart example 25 | 26 | Once [installed](#install), you can use the following code to pipe a readable 27 | tar stream into the `TarDecoder` which emits "entry" events for each individual file: 28 | 29 | ```php 30 | <?php 31 | 32 | require __DIR__ . '/vendor/autoload.php'; 33 | 34 | $stream = new React\Stream\ReadableResourceStream(fopen('archive.tar', 'r')); // example input path 35 | 36 | $decoder = new Clue\React\Tar\TarDecoder(); 37 | 38 | $decoder->on('entry', function (array $header, React\Stream\ReadableStreamInterface $file) { 39 | echo 'File ' . $header['filename']; 40 | echo ' (' . $header['size'] . ' bytes):' . PHP_EOL; 41 | 42 | $file->on('data', function ($chunk) { 43 | echo $chunk; 44 | }); 45 | }); 46 | 47 | $stream->pipe($decoder); 48 | ``` 49 | 50 | See also the [examples](examples/). 51 | 52 | ## Install 53 | 54 | The recommended way to install this library is [through Composer](https://getcomposer.org/). 55 | [New to Composer?](https://getcomposer.org/doc/00-intro.md) 56 | 57 | While in beta, this project does not currently follow [SemVer](https://semver.org/). 58 | This will install the latest supported version: 59 | 60 | ```bash 61 | composer require clue/tar-react:^0.2 62 | ``` 63 | 64 | See also the [CHANGELOG](CHANGELOG.md) for details about version upgrades. 65 | 66 | This project aims to run on any platform and thus does not require any PHP 67 | extensions and supports running on legacy PHP 5.3 through current PHP 8+. 68 | It's *highly recommended to use the latest supported PHP version* for this project. 69 | 70 | ## Tests 71 | 72 | To run the test suite, you first need to clone this repo and then install all 73 | dependencies [through Composer](https://getcomposer.org/): 74 | 75 | ```bash 76 | composer install 77 | ``` 78 | 79 | To run the test suite, go to the project root and run: 80 | 81 | ```bash 82 | vendor/bin/phpunit 83 | ``` 84 | 85 | ## License 86 | 87 | This project is released under the permissive [MIT license](LICENSE). 88 | 89 | > Did you know that I offer custom development services and issuing invoices for 90 | sponsorships of releases and for contributions? Contact me (@clue) for details. 
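91 | 92 | ## More 93 | 94 | * If you want to learn more about processing streams of data, refer to the documentation of 95 | the underlying [react/stream](https://github.com/reactphp/stream) component. 96 | 97 | * If you want to process compressed tarballs (`.tar.gz` and `.tgz` file extension), you may 98 | want to use [clue/reactphp-zlib](https://github.com/clue/reactphp-zlib) on the compressed 99 | input stream before passing the decompressed stream to the tar decoder. 100 | -------------------------------------------------------------------------------- /composer.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "clue/tar-react", 3 | "description": "Streaming parser to extract tarballs with ReactPHP.", 4 | "keywords": ["tar", "archive", "tarball", "untar", "parser", "decoder", "extract", "unpack", "ReactPHP", "async"], 5 | "homepage": "https://github.com/clue/reactphp-tar", 6 | "license": "MIT", 7 | "authors": [ 8 | { 9 | "name": "Christian Lück", 10 | "email": "christian@clue.engineering" 11 | } 12 | ], 13 | "require": { 14 | "php": ">=5.3", 15 | "react/stream": "^1.2" 16 | }, 17 | "require-dev": { 18 | "clue/hexdump": "~0.2.0", 19 | "react/event-loop": "^1.2", 20 | "phpunit/phpunit": "^9.6 || ^5.7 || ^4.8.36" 21 | }, 22 | "autoload": { 23 | "psr-4": { 24 | "Clue\\React\\Tar\\": "src/" 25 | } 26 | }, 27 | "autoload-dev": { 28 | "psr-4": { 29 | "Clue\\Tests\\React\\Tar\\": "tests/" 30 | } 31 | } 32 | } 33 | -------------------------------------------------------------------------------- /src/TarDecoder.php: -------------------------------------------------------------------------------- 1 | <?php 2 | 3 | namespace Clue\React\Tar; 4 | 5 | use Evenement\EventEmitter; 6 | use React\Stream\ThroughStream; 7 | use React\Stream\WritableStreamInterface; 8 | use RuntimeException; 9 | 10 | class TarDecoder extends EventEmitter implements WritableStreamInterface 11 | { 12 | // NOTE: property declarations and defaults are inferred from their use in the methods below 13 | // bytes received but not yet parsed as a complete header block 14 | private $buffer = ''; 15 | // false once the decoder has ended, closed or hit an invalid header 16 | private $writable = true; 17 | private $closing = false; 18 | // true while the current entry stream signals backpressure 19 | private $paused = false; 20 | 21 | // stream of the entry currently being emitted, or null between entries 22 | private $streaming = null; 23 | // payload bytes still expected for the current entry 24 | private $remaining = 0; 25 | // NUL padding bytes to skip before the next header 26 | private $padding = 0; 27 | // unpack() format describing one header block 28 | private $format; 29 | 30 | // TAR archives are organized in 512-byte blocks 31 | const BLOCK_SIZE = 512; 32 | 33 | public function __construct() 34 | { 35 | // ustar header fields: "Z<width><name>" reads <width> bytes as a NUL-padded string 36 | $this->format = "Z100name/Z8mode/Z8uid/Z8gid/Z12size/Z12mtime/Z8checksum/Z1type/Z100symlink/Z6magic/Z2version/Z32owner/Z32group/Z8deviceMajor/Z8deviceMinor/Z155prefix/Z12unpacked"; 37 | 38 | if (PHP_VERSION < 5.5) { 39 | // PHP 5.5 replaced 'a' with 'Z' (reads X bytes and removes trailing NUL bytes) 40 | $this->format = 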
str_replace('Z', 'a', $this->format); // @codeCoverageIgnore 41 | } 42 | } 43 | 44 | public function write($data) 45 | { 46 | if (!$this->writable) { 47 | return false; 48 | } 49 | 50 | // incomplete entry => read until end of entry before expecting next header 51 | if ($this->streaming !== null) { 52 | $data = $this->consumeEntry($data); 53 | 54 | // entry still incomplete => wait for next chunk 55 | if ($this->streaming !== null) { 56 | return !$this->paused; 57 | } 58 | } 59 | 60 | // trailing padding remaining => skip padding before expecting next header 61 | if ($this->padding !== 0) { 62 | $data = $this->consumePadding($data); 63 | 64 | // padding still remaining => wait for next chunk 65 | if ($this->padding !== 0) { 66 | return true; 67 | } 68 | } 69 | 70 | $this->buffer .= $data; 71 | 72 | while (isset($this->buffer[self::BLOCK_SIZE - 1])) { 73 | $header = substr($this->buffer, 0, self::BLOCK_SIZE); 74 | $this->buffer = (string)substr($this->buffer, self::BLOCK_SIZE); 75 | 76 | if (rtrim($header, "\0") === '') { 77 | // skip if whole header consists of null bytes 78 | // trailing nulls indicate end of archive, but continue reading next block anyway 79 | continue; 80 | } 81 | try { 82 | $header = $this->readHeader($header); 83 | } catch (RuntimeException $e) { 84 | // clean up before throwing 85 | $this->buffer = ''; 86 | $this->writable = false; 87 | 88 | $this->emit('error', array($e)); 89 | $this->close(); 90 | return false; 91 | } 92 | 93 | $this->streaming = new ThroughStream(); 94 | $this->remaining = $header['size']; 95 | $this->padding = $header['padding']; 96 | 97 | // entry stream is not paused by default - unless explicitly paused 98 | // emit "drain" even when entry stream is ready again to support backpressure 99 | $that = $this; 100 | $paused =& $this->paused; 101 | $paused = false; 102 | $this->streaming->on('drain', function () use (&$paused, $that) { 103 | $paused = false; 104 | $that->emit('drain'); 105 | }); 106 | 
$this->streaming->on('close', function () use (&$paused, $that) { 107 | if ($paused) { 108 | $paused = false; 109 | $that->emit('drain'); 110 | } 111 | }); 112 | 113 | $this->emit('entry', array($header, $this->streaming)); 114 | 115 | if ($this->remaining === 0) { 116 | $this->streaming->end(); 117 | $this->streaming = null; 118 | } else { 119 | $this->buffer = $this->consumeEntry($this->buffer); 120 | } 121 | 122 | // incomplete entry => do not read next header 123 | if ($this->streaming !== null) { 124 | return !$this->paused; 125 | } 126 | 127 | if ($this->padding !== 0) { 128 | $this->buffer = $this->consumePadding($this->buffer); 129 | } 130 | 131 | // incomplete padding => do not read next header 132 | if ($this->padding !== 0) { 133 | return true; 134 | } 135 | } 136 | 137 | return true; 138 | } 139 | 140 | public function end($data = null) 141 | { 142 | if ($data !== null) { 143 | $this->write($data); 144 | } 145 | 146 | if ($this->streaming !== null) { 147 | // input stream ended but we were still streaming an entry => emit error about incomplete entry 148 | $this->streaming->emit('error', array(new \RuntimeException('TAR input stream ended unexpectedly'))); 149 | $this->streaming->close(); 150 | $this->streaming = null; 151 | 152 | // add some dummy data to also trigger error on decoder stream 153 | $this->buffer = '.'; 154 | } 155 | 156 | if ($this->buffer !== '') { 157 | // incomplete entry in buffer 158 | $this->emit('error', array(new \RuntimeException('Stream ended with incomplete entry'))); 159 | $this->buffer = ''; 160 | } 161 | 162 | $this->writable = false; 163 | $this->close(); 164 | } 165 | 166 | public function close() 167 | { 168 | if ($this->closing) { 169 | return; 170 | } 171 | 172 | $this->closing = true; 173 | $this->writable = false; 174 | $this->buffer = ''; 175 | 176 | if ($this->streaming !== null) { 177 | // input stream ended but we were still streaming an entry => forcefully close without error 178 | $this->streaming->close(); 
179 | $this->streaming = null; 180 | } 181 | 182 | // ignore whether we're still expecting NUL-padding 183 | 184 | $this->emit('close'); 185 | $this->removeAllListeners(); 186 | } 187 | 188 | public function isWritable() 189 | { 190 | return $this->writable; 191 | } 192 | 193 | private function consumeEntry($buffer) 194 | { 195 | // try to read up to [remaining] bytes from buffer 196 | $data = substr($buffer, 0, $this->remaining); 197 | $len = strlen($data); 198 | 199 | // reduce remaining buffer by number of bytes actually read 200 | $buffer = substr($buffer, $len); 201 | $this->remaining -= $len; 202 | 203 | // emit chunk of data 204 | $ret = $this->streaming->write($data); 205 | 206 | // nothing remaining => entry stream finished 207 | if ($this->remaining === 0) { 208 | $this->streaming->end(); 209 | $this->streaming = null; 210 | } 211 | 212 | // throttle input when streaming entry is still writable but returns false (backpressure) 213 | if ($ret === false && $this->streaming !== null && $this->streaming->isWritable()) { 214 | $this->paused = true; 215 | } 216 | 217 | return $buffer; 218 | } 219 | 220 | private function consumePadding($buffer) 221 | { 222 | if (strlen($buffer) > $this->padding) { 223 | // data exceeds padding => skip padding and continue 224 | $buffer = (string)substr($buffer, $this->padding); 225 | $this->padding = 0; 226 | 227 | return $buffer; 228 | } 229 | 230 | // less data than padding, skip only a bit of the padding and wait for next chunk 231 | $this->padding -= strlen($buffer); 232 | return ''; 233 | } 234 | 235 | // https://github.com/mishak87/archive-tar/blob/master/Reader.php#L155 236 | private function readHeader($header) 237 | { 238 | $record = unpack($this->format, $header); 239 | 240 | // we only support "ustar" format (for now?) 241 | if ($record['magic'] !== 'ustar') { 242 | throw new RuntimeException('Unsupported archive type, expected "ustar", but found "' . $record['magic'] . 
'"'); 243 | } 244 | 245 | // convert to decimal values 246 | foreach (array('uid', 'gid', 'size', 'mtime', 'checksum') as $key) { 247 | $record[$key] = octdec($record[$key]); 248 | } 249 | 250 | // calculate and compare header checksum 251 | $checksum = 0; 252 | for ($i = 0; $i < self::BLOCK_SIZE; $i++) { 253 | $checksum += 148 <= $i && $i < 156 ? 32 : ord($header[$i]); 254 | } 255 | if ($record['checksum'] != $checksum) { 256 | throw new RuntimeException('Invalid header checksum, expected "' . $record['checksum'] . '", but calculated "' . $checksum . '" (looks like the archive is corrupted)'); 257 | } 258 | 259 | // padding consists of X NULL bytes after record entry until next BLOCK_SIZE boundary 260 | $record['padding'] = (self::BLOCK_SIZE - ($record['size'] % self::BLOCK_SIZE)) % self::BLOCK_SIZE; 261 | 262 | // filename consists of prefix and name 263 | $record['filename'] = $record['prefix'] . $record['name']; 264 | 265 | return $record; 266 | } 267 | } 268 | --------------------------------------------------------------------------------
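The checksum and padding arithmetic inside `readHeader()` can be tried in isolation. The following is a minimal sketch, not part of the library: the function names `tarPadding()` and `tarChecksum()` are hypothetical, and the all-NUL header block is used only to make the arithmetic easy to follow.

```php
<?php

// Hypothetical helpers that mirror the arithmetic in readHeader();
// they are illustrative only and not part of clue/tar-react.
const BLOCK_SIZE = 512;

// Number of NUL bytes that follow an entry of $size bytes
// so that the next header starts on a 512-byte boundary.
function tarPadding($size)
{
    return (BLOCK_SIZE - ($size % BLOCK_SIZE)) % BLOCK_SIZE;
}

// Sum of all header bytes, where the 8-byte checksum field
// (offsets 148..155) is counted as ASCII spaces (32).
function tarChecksum($header)
{
    $sum = 0;
    for ($i = 0; $i < BLOCK_SIZE; $i++) {
        $sum += (148 <= $i && $i < 156) ? 32 : ord($header[$i]);
    }

    return $sum;
}

echo tarPadding(0), PHP_EOL;   // 0 (empty entry needs no padding)
echo tarPadding(1), PHP_EOL;   // 511
echo tarPadding(512), PHP_EOL; // 0 (already aligned)
echo tarPadding(700), PHP_EOL; // 324 (700 + 324 = 1024)

// An all-NUL block sums to 8 * 32 = 256, because only the
// checksum field contributes (as spaces).
echo tarChecksum(str_repeat("\0", BLOCK_SIZE)), PHP_EOL; // 256
```

The outer `% BLOCK_SIZE` in the padding formula is what maps an already aligned size to 0 instead of 512, which is why `write()` can compare `$this->padding !== 0` to decide whether any bytes need skipping.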