├── .editorconfig ├── .gitattributes ├── .github └── workflows │ └── main.yml ├── .gitignore ├── AUTHORS.md ├── CHANGELOG.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── composer.json └── src ├── Configuration.php ├── Nodes ├── DOM │ ├── DOMAttr.php │ ├── DOMCdataSection.php │ ├── DOMCharacterData.php │ ├── DOMComment.php │ ├── DOMDocument.php │ ├── DOMDocumentFragment.php │ ├── DOMDocumentType.php │ ├── DOMElement.php │ ├── DOMEntity.php │ ├── DOMEntityReference.php │ ├── DOMNode.php │ ├── DOMNodeList.php │ ├── DOMNotation.php │ ├── DOMProcessingInstruction.php │ └── DOMText.php ├── NodeTrait.php └── NodeUtility.php ├── ParseException.php └── Readability.php /.editorconfig: -------------------------------------------------------------------------------- 1 | root = true 2 | 3 | [*] 4 | indent_style = space 5 | indent_size = 4 6 | end_of_line = lf 7 | charset = utf-8 8 | trim_trailing_whitespace = true 9 | insert_final_newline = false -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | # Ignore test-related files 2 | /test/ export-ignore 3 | /phpunit.xml export-ignore 4 | /Makefile export-ignore 5 | /docker export-ignore 6 | /docker-compose.yml export-ignore 7 | 8 | test/* linguist-language=PHP 9 | * text=auto eol=lf -------------------------------------------------------------------------------- /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | # This is a basic workflow to help you get started with Actions 2 | 3 | name: CI 4 | 5 | # Controls when the workflow will run 6 | on: 7 | # Triggers the workflow on push or pull request events but only for the master branch 8 | push: 9 | branches: [master] 10 | pull_request: 11 | branches: [master] 12 | 13 | # Allows you to run this workflow manually from the Actions tab 14 | workflow_dispatch: 15 | 16 | # A workflow run is 
made up of one or more jobs that can run sequentially or in parallel 17 | jobs: 18 | # This workflow contains a single job called "build" 19 | build: 20 | # The type of runner that the job will run on 21 | runs-on: ubuntu-latest 22 | 23 | strategy: 24 | matrix: 25 | php: ['8.1', '8.2', '8.3', '8.4'] 26 | libxml: ['2.9.14'] 27 | 28 | # Steps represent a sequence of tasks that will be executed as part of the job 29 | steps: 30 | # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it 31 | - uses: actions/checkout@v3 32 | 33 | - name: Set up PHP 34 | uses: shivammathur/setup-php@v2 35 | with: 36 | php-version: ${{matrix.php}} 37 | tools: composer:v2 38 | 39 | - name: Install dependencies 40 | run: composer install 41 | 42 | # Runs a set of commands using the runner's shell 43 | - name: Run tests 44 | run: | 45 | docker build --build-arg PHP_VERSION=${{matrix.php}} --build-arg LIBXML_VERSION=${{matrix.libxml}} -t gh-action - < ./docker/php/Dockerfile 46 | docker run --volume $PWD:/app --workdir="/app" --env XDEBUG_MODE=coverage gh-action php ./vendor/bin/phpunit --coverage-clover /app/test/clover.xml 47 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | vendor 3 | composer.lock 4 | /test.* 5 | /test/changed/ -------------------------------------------------------------------------------- /AUTHORS.md: -------------------------------------------------------------------------------- 1 | # Authors 2 | 3 | Readability.php developed by **Andres Rey**. 4 | 5 | Based on Arc90's readability.js (1.7.1) script available at: http://code.google.com/p/arc90labs-readability. 
6 | Copyright (c) 2010 Arc90 Inc 7 | 8 | The AUTHORS/Contributors are (and/or have been): 9 | 10 | * Andres Rey 11 | * Sergiy Lavryk 12 | * Pedro Amorim 13 | * Malu Decks 14 | * Keyvan Minoukadeh 15 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Change Log 2 | All notable changes to this project will be documented in this file. 3 | 4 | ## [v3.3.3](https://github.com/fivefilters/readability.php/releases/tag/v3.3.3) 5 | - Fix type error - extends type support to add DOMProcessingInstruction in more method signatures (reported by @reinierkors) 6 | 7 | ## [v3.3.2](https://github.com/fivefilters/readability.php/releases/tag/v3.3.2) 8 | - Fix type error - extends type support to include DOMCdataSection and DOMProcessingInstruction in various method signatures (reported by @mikiescolarmrf and @Grotax) 9 | 10 | ## [v3.3.1](https://github.com/fivefilters/readability.php/releases/tag/v3.3.1) 11 | - Fix DOMProcessingInstruction errors 12 | 13 | ## [v3.3.0](https://github.com/fivefilters/readability.php/releases/tag/v3.3.0) 14 | - Fixed PHP 8.4 deprecation warning (reported by @pich) 15 | - Migrated type declarations from PHPDoc blocks to native PHP 8 property and method types 16 | - Empty class attributes now removed when `keepClasses` is disabled 17 | - Replaced legacy DOM operations with native PHP 8 methods: 18 | - `isWhitespaceInElementContent()` for whitespace detection 19 | - `firstElementChild` and `previousElementSibling` for DOM traversal 20 | - Updated Docker test environment to support PHP 8.1-8.4 21 | 22 | ## [v3.2.0](https://github.com/fivefilters/readability.php/releases/tag/v3.2.0) 23 | - Update dependencies to newer versions (League/URI version 7), to make it compatible with projects already relying on those versions 24 | - Minimum PHP version set to 8.1 (required by League/URI 7) 25 | - Update Docker tests to use PHP 8.1, 
8.2 and 8.3 26 | 27 | ## [v3.1.7](https://github.com/fivefilters/readability.php/releases/tag/v3.1.7) 28 | - Fixes URL syntax errors when bad URLs are encountered when rewriting relative URLs - reported by @marcelklehr 29 | - Fixes PHP 8 deprecation notice when base URLs (used for rewriting relative URLs) don't have a path component - thanks to @blat and @Markus-GS 30 | 31 | ## [v3.1.6](https://github.com/fivefilters/readability.php/releases/tag/v3.1.6) 32 | - Avoid re-parsing source HTML when making multiple attempts to identify content in parse() 33 | 34 | ## [v3.1.5](https://github.com/fivefilters/readability.php/releases/tag/v3.1.5) 35 | - Allow psr/log version 2.x and 3.x - thanks to @piotrek-r and @ArondeParon 36 | 37 | ## [v3.1.4](https://github.com/fivefilters/readability.php/releases/tag/v3.1.4) 38 | - Fixes improper use of null coalescing operator - reported by @thedf 39 | 40 | ## [v3.1.3](https://github.com/fivefilters/readability.php/releases/tag/v3.1.3) 41 | - Fixes issue where exception was thrown when resolving an invalid relative URL (when setFixRelativeURLs(true)) - reported by @jeffbotw 42 | 43 | ## [v3.1.2](https://github.com/fivefilters/readability.php/releases/tag/v3.1.2) 44 | - Fixes issue "Warning: Undefined array key 2" reported by @castroCrea 45 | - Fixes issue "Notice: Trying to get property '' of non-object" reported by @thedf 46 | 47 | ## [v3.1.1](https://github.com/fivefilters/readability.php/releases/tag/v3.1.1) 48 | - Exclude tests folder when using composer 49 | 50 | ## [v3.1.0](https://github.com/fivefilters/readability.php/releases/tag/v3.1.0) 51 | - Minimum PHP version 7.4 (composer.json updated) 52 | - Updated the Docker file to support versions of PHP from 7.4 to 8.1 53 | - Updated the Docker file to allow you to run PHP with libxml 2.9.10, 2.9.13, 2.9.14 54 | - Test with PHP 8.1 55 | 56 | ## [v3.0.0](https://github.com/fivefilters/readability.php/releases/tag/v3.0.0) 57 | - Implemented changes made to Readability.js up to 26 
August 2021, with the exception of a [piece of code](https://github.com/fivefilters/readability.php/commit/1c662465bded2ab3acf3b975a1315c8c45f0bf73#diff-b9b31807b1a39caec18ddc293e9c52931ba8b55191c61e6b77a623d699a599ffR1899) which doesn't produce the same results in PHP for us compared to the JS version. 58 | - Default parser is now HTML5-PHP, which handles HTML better than libxml 59 | - Replaced the expected HTML files in the tests folder to reflect HTML5-PHP's serialisation 60 | - Updated the Docker file to support versions of PHP from 7.3 to 8.0 (previously it was 7.0 to 7.3) 61 | - Updated the Docker file to allow you to run PHP with libxml 2.9.4, 2.9.5, 2.9.10, and 2.9.12 62 | - Fatal error bug fix (thanks Balazsp) 63 | 64 | ## [v2.1.0](https://github.com/andreskrey/readability.php/releases/tag/v2.1.0) 65 | - Avoid overwriting extracted metadata with similarly named keys (like `og:image` and `og:image:width`) 66 | - Imported new `getSiteName()` feature from JS version as of [21 Dec 2018](https://github.com/mozilla/readability/pull/504) 67 | - Added getFirstElementChild function to NodeTrait + test case (Issue #83) 68 | - Reworked the test suite to use TestPage objects and give more hints about what failed 69 | - Removed getWordThreshold and setWordThreshold configuration functions 70 | - Added NodeUtility::filterTextNodes and deprecated NodeTrait getChildren() 71 | - Added new DOMNodeList fake class that mimics the original DOMNodeList class but allows adding new nodes to the list 72 | - Added new Dockerfiles that pull different versions of PHP and libxml. Now we are supporting 4 versions of PHP and 6 versions of libxml! 
73 | 74 | ## [v2.0.1](https://github.com/andreskrey/readability.php/releases/tag/v2.0.1) 75 | - Fixed small issue that prevented the main image from showing up in the results 76 | 77 | ## [v2.0.0](https://github.com/andreskrey/readability.php/releases/tag/v2.0.0) 78 | 79 | - [BREAKING CHANGE] Bumped the minimum supported version of PHP to 7.0 80 | - Clean `