├── .github
│   └── FUNDING.yml
├── CHANGELOG.md
├── LICENSE
├── README.md
├── composer.json
└── src
    └── StreamingJsonParser.php

/.github/FUNDING.yml:
--------------------------------------------------------------------------------
github: clue
custom: https://clue.engineering/support

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
# Changelog

## 0.1.1 (2020-12-19)

* Improve documentation and add references to NDJSON.
  (#12 and #13 by @clue)

* Improve test suite and add `.gitattributes` to exclude dev files from exports.
  Add PHP 8 support, update to PHPUnit 9 and simplify test setup.
  (#9, #10 and #11 by @SimonFrings)

## 0.1.0 (2015-03-10)

* First tagged release

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2015 Christian Lück

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# clue/json-stream

[![CI status](https://github.com/clue/json-stream/workflows/CI/badge.svg)](https://github.com/clue/json-stream/actions)
[![installs on Packagist](https://img.shields.io/packagist/dt/clue/json-stream?color=blue&label=installs%20on%20Packagist)](https://packagist.org/packages/clue/json-stream)

A really simple, lightweight, incremental parser for [JSON streaming](https://en.wikipedia.org/wiki/JSON_Streaming)
(concatenated JSON and [newline-delimited JSON](http://ndjson.org/)) in PHP.
You can use this library to process a stream of data that consists of multiple JSON documents.

**Table of contents**

* [Support us](#support-us)
* [JSON streaming](#json-streaming)
* [Quickstart example](#quickstart-example)
* [Description](#description)
* [Install](#install)
* [Tests](#tests)
* [License](#license)
* [More](#more)

## Support us

We invest a lot of time developing, maintaining and updating our awesome
open-source projects. You can help us sustain the high quality of our work by
[becoming a sponsor on GitHub](https://github.com/sponsors/clue). Sponsors get
numerous benefits in return, see our [sponsoring page](https://github.com/sponsors/clue)
for details.

Let's take these projects to the next level together! 🚀

## JSON streaming

A newline-delimited JSON (NDJSON) example stream consisting of 3 individual JSON documents could look like this:

```json
{ "id": 1, "name": "first" }
{ "id": 3, "name": "third" }
{ "id": 6, "name": "sixth" }
```

> Less commonly, the same format is referred to as JSON lines (JSONL) or
  line-delimited JSON (LDJSON), which is not to be confused with JSON-LD.
  To avoid confusion, we consistently refer to this as newline-delimited JSON (NDJSON).
  If you control the generating side, we highly recommend going for NDJSON
  instead of using concatenated JSON as discussed below.
  See also [clue/reactphp-ndjson](https://github.com/clue/reactphp-ndjson).

For this project, the whitespace between the individual JSON documents is entirely optional.
Instead of newlines, you can use any amount of whitespace or none at all.

A concatenated JSON example stream consisting of 3 individual JSON documents could look like this:

```json
{ "id": 1, "name": "first" }{ "id": 3, "name": "third" }{ "id": 6, "name": "sixth" }
```

The input stream can be of arbitrary size and can be split into chunks at any position,
even in the middle of a document.
This is often useful for processing network streams, where the chunk/buffer size is
not under your control and you may receive as little as a single byte at a time.

Please note that this library is about processing a stream that can contain any number of
JSON documents.
It is assumed that each document has a reasonable size and fits into memory.
This is not to be confused with a streaming parser for processing a single, huge JSON document
that is too big to fit into memory.
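
To illustrate that chunk boundaries do not matter, the following sketch feeds the concatenated example stream above into a parser one byte at a time and still gets exactly three documents back. So that it runs on its own without Composer, it defines a hypothetical, simplified class `ConcatenatedJsonSketch` that mimics the buffering heuristic described under [Description](#description). It is an illustration under those assumptions, not this library's `StreamingJsonParser` API (see the [quickstart example](#quickstart-example) for that).

```php
<?php

// Hypothetical, simplified stand-in (NOT the library class) so that this
// example runs without Composer: buffer all input and, whenever a possible
// closing bracket shows up, let json_decode() decide if a document is complete.
final class ConcatenatedJsonSketch
{
    private $buffer = '';

    public function push($chunk)
    {
        $this->buffer .= $chunk;
        $documents = array();

        while (true) {
            // whitespace between documents is optional and skipped
            $this->buffer = ltrim($this->buffer);
            if ($this->buffer === '') {
                break;
            }

            // only arrays and objects are accepted as top-level documents
            if ($this->buffer[0] === '[') {
                $end = ']';
            } elseif ($this->buffer[0] === '{') {
                $end = '}';
            } else {
                throw new UnexpectedValueException('Invalid start');
            }

            // try each candidate closing bracket until the prefix is valid JSON
            $complete = false;
            $offset = 0;
            while (($pos = strpos($this->buffer, $end, $offset)) !== false) {
                $json = json_decode(substr($this->buffer, 0, $pos + 1), true);
                if ($json !== null) {
                    $documents[] = $json;
                    // keep whatever follows this document for the next round
                    $this->buffer = (string) substr($this->buffer, $pos + 1);
                    $complete = true;
                    break;
                }
                $offset = $pos + 1;
            }

            if (!$complete) {
                // incomplete document: wait for more input
                break;
            }
        }

        return $documents;
    }
}

// Feed the concatenated example stream one byte at a time.
$stream = '{ "id": 1, "name": "first" }{ "id": 3, "name": "third" }{ "id": 6, "name": "sixth" }';
$parser = new ConcatenatedJsonSketch();
$documents = array();
foreach (str_split($stream) as $byte) {
    foreach ($parser->push($byte) as $document) {
        $documents[] = $document;
    }
}

var_dump(count($documents)); // int(3)
```

The sketch re-decodes the whole buffered prefix at every candidate bracket, so its cost grows with document size. That is acceptable for the rather small documents this approach targets.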

## Quickstart example

Once [installed](#install), you can use the following sample code to parse a stream of JSON chunks:

```php
use Clue\JsonStream\StreamingJsonParser;

require __DIR__ . '/vendor/autoload.php';

$parser = new StreamingJsonParser();

assert($parser->push('[ 1, 2') === array()); // incomplete document is buffered
assert($parser->push(', 3 ]') === array(array(1, 2, 3)));
assert($parser->push('{} {}') === array(array(), array())); // two documents in one chunk
```

## Description

This is actually only a really simple hack that calls PHP's normal, document-based parser
(`json_decode()`) whenever it *thinks* a full document has been found in the input stream:
each time a closing `]` or `}` matching the document's opening bracket type arrives, it tries
to decode everything buffered so far and simply keeps buffering if that fails (for example
because the bracket was part of a string).
Because the normal parser is implemented as an extension (instead of userland PHP),
this turns out to be pretty fast for streams that contain common, rather small
objects.
Each document must have an array or object at its top level; any other leading
non-whitespace character throws an `UnexpectedValueException`.

You might want to use this if

* you have to deal with a stream of multiple JSON documents
* you have to handle chunks of incomplete JSON documents
* you prefer a lightweight parser

You probably don't want to use this if

* you deal with complete JSON documents
* you have a proper delimiter (such as newlines) between your individual JSON documents
* your JSON documents are too big to fit into RAM
* you have a CS background and/or are in love with actual incremental, recursive parsers

## Install

The recommended way to install this library is [through Composer](https://getcomposer.org/).
[New to Composer?](https://getcomposer.org/doc/00-intro.md)

This project does not currently follow [SemVer](https://semver.org/).
This will install the latest supported version:

```bash
$ composer require clue/json-stream:^0.1.1
```

See also the [CHANGELOG](CHANGELOG.md) for details about version upgrades.

This project aims to run on any platform and thus does not require any PHP
extensions. It supports running on legacy PHP 5.3 through current PHP 8+ and
HHVM.
It's *highly recommended to use PHP 7+* for this project.

## Tests

To run the test suite, you first need to clone this repo and then install all
dependencies [through Composer](https://getcomposer.org/):

```bash
$ composer install
```

To run the test suite, go to the project root and run:

```bash
$ php vendor/bin/phpunit
```

## License

This project is released under the permissive [MIT license](LICENSE).

> Did you know that I offer custom development services and can issue invoices for
  sponsorships of releases and for contributions? Contact me (@clue) for details.

## More

* If you want to efficiently process (possibly infinite) streams of data,
  you may want to use [clue/reactphp-ndjson](https://github.com/clue/reactphp-ndjson)
  to process newline-delimited JSON (NDJSON) files (`.ndjson` file extension).

--------------------------------------------------------------------------------
/composer.json:
--------------------------------------------------------------------------------
{
    "name": "clue/json-stream",
    "description": "Lightweight incremental streaming JSON parser",
    "keywords": ["JSON", "streaming parser", "incremental parser"],
    "homepage": "https://github.com/clue/json-stream",
    "license": "MIT",
    "authors": [
        {
            "name": "Christian Lück",
            "email": "christian@clue.engineering"
        }
    ],
    "autoload": {
        "psr-4": { "Clue\\JsonStream\\": "src/" }
    },
    "require": {
        "php": ">=5.3"
    },
    "require-dev": {
        "phpunit/phpunit": "^9.3 || ^5.7 || ^4.8.35"
    }
}

--------------------------------------------------------------------------------
/src/StreamingJsonParser.php:
--------------------------------------------------------------------------------
            endCharacter === null) {
                // trim leading whitespace
                $chunk = ltrim($chunk);

                if ($chunk === '') {
                    // only whitespace => skip chunk
                    break;
                } elseif ($chunk[0] === '[') {
                    // array/list delimiter
                    $this->endCharacter = ']';
                } elseif ($chunk[0] === '{') {
                    // object/hash delimiter
                    $this->endCharacter = '}';
                } else {
                    throw new UnexpectedValueException('Invalid start');
                }
            }

            $pos = strpos($chunk, $this->endCharacter);

            // no end found in chunk => must be part of segment, wait for next chunk
            if ($pos === false) {
                $this->buffer .= $chunk;
                break;
            }

            // possible end found in chunk => append candidate segment to buffer, keep remaining chunk
            $this->buffer .= substr($chunk, 0, $pos + 1);
            $chunk = substr($chunk, $pos + 1);

            // try to parse (json_decode() returns null if the buffer is not valid JSON yet)
            $json = json_decode($this->buffer, $this->assoc);

            // successfully parsed; otherwise keep buffering and check the remaining chunk
            if ($json !== null) {
                $objects[] = $json;

                // clear parsed buffer and continue checking remaining chunk
                $this->buffer = '';
                $this->endCharacter = null;
            }
        }

        return $objects;
    }

    public function isEmpty()
    {
        return ($this->buffer === '');
    }
}

--------------------------------------------------------------------------------