├── LICENSE-MIT ├── Readme.md ├── unittest.php └── truncateHTML.php /LICENSE-MIT: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | Copyright (c) 2018 Jean-Louis Grall, contributors 3 | 4 | Permission is hereby granted, free of charge, to any person 5 | obtaining a copy of this software and associated documentation 6 | files (the "Software"), to deal in the Software without 7 | restriction, including without limitation the rights to use, 8 | copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the 10 | Software is furnished to do so, subject to the following 11 | conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 18 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 20 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 21 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 22 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 23 | OTHER DEALINGS IN THE SOFTWARE. 24 | -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | # truncateHTML 2 | 3 | A PHP function that truncates (shortens) a given HTML5 string to a max number of characters. 4 | 5 | __Example:__ truncate after 6 characters including the ellipsis: 6 | `

A red ball.

` __=>__ `

A red…

` 7 | 8 | Compatible with PHP 5.6 and 7+ 9 | Uses the _mbstring_ PHP extension for UTF-8. 10 | More than 240 unit tests (see or run: [unittest.php](unittest.php)) 11 | 12 | _The function is in [truncateHTML.php](truncateHTML.php), you can just copy/paste it to your project._ 13 | 14 | 15 | ## Features: 16 | 17 | - Quickly truncate most common HTML5 sources without using a full HTML parser (which is ~100x slower). 18 | - Configurable ellipsis: `…`, `...`, `More`, etc. 19 | - Can include the length of the ellipsis in the truncated result. 20 | - Supports self-closing tags like: ``, ``, `` 21 | - Collapsing spaces: sequences of multiple spaces are counted only once (including `
`, ` ` and a few others) 22 | - Don't count characters in invisible elements like: ``, `
Hi
More text."); 53 | // => "
Hi…
" 54 | 55 | // Collapsing multiple spaces: 56 | truncateHTML(6, "A
  \n\t long space!"); 57 | // => "A
  \n\t long…" 58 | 59 | // Tag mismatch: truncates before the error: 60 | truncateHTML(99, "Clickhere"); 61 | // => "Click…" 62 | ``` 63 | 64 | 65 | ## API: 66 | 67 | __`string truncateHTML(int $maxLength, string $html, array $options = [])`__ 68 | 69 | - `$maxLength`: the returned HTML will contain at most $maxLength countable characters. 70 | If negative, remove $maxLength countable characters from the end of the $html. 71 | - `$html`: the input HTML string that will be truncated. 72 | - `$options`: (optional) an array of options: 73 | 74 | |Options (with default value)|Descriptions| 75 | |---|---| 76 | |`'ellipsis'=>'…'`
(or: `'ellipsis'=>'...'`)|The ellipsis that will be included. Can be an empty string, can contain HTML tags.
(`'…'` is the horizontal ellipsis character, i.e. `'...'` as a single Unicode character)
(If not using UTF-8 mode, the default value will be `'...'` instead of `'…'`)| 77 | |`'includeEllipsisLength'=>true`|Whether to include the length of the ellipsis in the length of the truncated result.| 78 | |`'wholeWord'=>true`|When truncating, don't cut in the middle of a word. Instead cut at the end of the last word.| 79 | |`'cutWord'=>18`|When `wholeWord` is enabled, allows to cut long words after `cutWord` characters (Set to `0` or `false` to disable)| 80 | |`'utf8'=>true`|Use UTF-8 mode. You should always use [UTF-8](https://en.wikipedia.org/wiki/UTF-8) though.
If `utf8` is `false`, only ASCII-compatible single-byte encodings (such as [Latin-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1)) are supported. For other encodings, use [mb_convert_encoding](https://secure.php.net/manual/en/function.mb-convert-encoding.php) to convert to UTF-8 and back.
(If UTF-8 is disabled, the default ellipsis will be `'...'` instead of `'…'`)|


## Limitations:

XHTML: probably works in most cases, but is untested.

__Not supported:__
- __Malformed HTML__, badly nested tags, missing closing tags: it doesn't try to guess the correct fix (for this you would need a full HTML parser).
  _Note: when meeting an unexpected closing tag: it always truncates before the closing tag (see the examples)._
- Uncommon HTML code like:
  - [HTML tags inside an HTML Tag attribute](https://stackoverflow.com/questions/4699276/can-data-attribute-contain-html-tags): ``
  - The string __`</script>`__ inside `<script>`. For this you would need a full HTML parser, or a JavaScript parser. (Other tags are ok, but don't have a closing tag `</script>` in a JavaScript string or comment)
  - The string __`</style>`__ inside `<style>`. For this you would need a full HTML parser, or a CSS parser. (Other tags are ok, but don't have a closing tag `</style>` in a CSS comment)
- XML
- CDATA ([deprecated in HTML5](https://developer.mozilla.org/en-US/docs/Web/API/CDATASection))

_If you find more, please open an [issue](https://github.com/jlgrall/truncateHTML/issues)._

## History (changelog)

- __v1.0.1__ _(9 Feb. 2018)_:
  - Fix multibyte characters in regex
  - Add parameter types verifications
- __v1.0__ _(5 Feb. 2018)_:
  - Initial version
  - _Inspired by:_
    - _[StackOverflow: Truncate text containing HTML, ignoring tags](https://stackoverflow.com/questions/1193500/truncate-text-containing-html-ignoring-tags/1193598#1193598)_
    - _[truncate() from CakePHP](https://github.com/cakephp/cakephp/blob/master/src/Utility/Text.php)_

--------------------------------------------------------------------------------
/unittest.php:
--------------------------------------------------------------------------------
1 | ..."): sets the input html string. 
7 | - input(['option' => 'value']): sets or modifies an option that is used for the $options parameter. 8 | - t($maxLength, "Expected output"): tests the output of truncateHTML() with the expected output, using the given $maxLength and the previously set input html and the $options parameter. 9 | - sub(['option' => 'value'], function() {...}): sets the given options for inside the function. 10 | */ 11 | 12 | sub([ 13 | 'ellipsis' => "…", 14 | 'includeEllipsisLength' => false, 15 | 'wholeWord' => false, 16 | 'cutWord' => 3, 17 | 'utf8' => true, 18 | ], function() { 19 | 20 | /* TEST: empty html */ 21 | input(""); 22 | t(-1, ""); 23 | t( 0, ""); 24 | t( 1, ""); 25 | 26 | 27 | /* TEST: basic html */ 28 | input("a"); 29 | t(-1, "…"); 30 | t( 0, "…"); 31 | t( 1, "a"); 32 | t( 2, "a"); 33 | 34 | input("12 456789"); 35 | t(-2, "12 4567…"); 36 | t(-1, "12 45678…"); 37 | t( 0, "…"); 38 | t( 1, "1…"); 39 | t( 2, "12…"); 40 | t( 3, "12 …"); 41 | t( 8, "12 45678…"); 42 | t( 9, "12 456789"); 43 | t(10, "12 456789"); 44 | 45 | /* TEST: HTML entities */ 46 | input("1& +56789"); 47 | t(-2, "1& +567…"); 48 | t(-1, "1& +5678…"); 49 | t( 0, "…"); 50 | t( 1, "1…"); 51 | t( 2, "1&…"); 52 | t( 3, "1& …"); 53 | t( 4, "1& +…"); 54 | t( 8, "1& +5678…"); 55 | t( 9, "1& +56789"); 56 | t(10, "1& +56789"); 57 | 58 | /* TEST: Multiple spaces */ 59 | input("___"); 60 | t(2, "__…"); 61 | input("  \t\t
"); 62 | t(-1, "…"); 63 | t( 0, "…"); 64 | t( 1, "  \t\t
"); 65 | t( 2, "  \t\t
"); 66 | 67 | input("1   3456"); 68 | t(-1, "1   345…"); 69 | t( 0, "…"); 70 | t( 1, "1…"); 71 | t( 2, "1   …"); 72 | t( 3, "1   3…"); 73 | t( 4, "1   34…"); 74 | 75 | input("1 "); 76 | t(-1, "1…"); 77 | t( 0, "…"); 78 | t( 1, "1…"); 79 | t( 2, "1 "); 80 | 81 | input(" 2"); 82 | t(-1, " …"); 83 | t( 0, "…"); 84 | t( 1, " …"); 85 | t( 2, " 2"); 86 | 87 | 88 | /* TEST: Include ellipsis length */ 89 | sub(['includeEllipsisLength' => true], function() { 90 | 91 | input(['ellipsis' => "…"]); 92 | 93 | input(""); 94 | t( 0, ""); 95 | 96 | input("12"); 97 | t( 0, "…"); 98 | t( 1, "…"); 99 | t( 2, "12"); 100 | 101 | 102 | input(['ellipsis' => "..."]); 103 | 104 | input(""); 105 | t( 0, ""); 106 | 107 | input("12"); 108 | t( 0, "..."); 109 | t( 1, "..."); 110 | t( 2, "12"); 111 | 112 | input("123"); 113 | t( 0, "..."); 114 | t( 1, "..."); 115 | t( 2, "..."); 116 | t( 3, "123"); 117 | }); 118 | 119 | 120 | /* TEST: Empty ellipsis */ 121 | sub(['ellipsis' => ""], function() { 122 | 123 | input(['includeEllipsisLength' => false]); 124 | 125 | input(""); 126 | t( 0, ""); 127 | t( 1, ""); 128 | 129 | input("abc"); 130 | t( 0, ""); 131 | t( 1, "a"); 132 | t( 2, "ab"); 133 | t( 3, "abc"); 134 | 135 | input(['includeEllipsisLength' => true]); 136 | 137 | input(""); 138 | t( 0, ""); 139 | t( 1, ""); 140 | 141 | input("abc"); 142 | t( 0, ""); 143 | t( 1, "a"); 144 | t( 2, "ab"); 145 | t( 3, "abc"); 146 | }); 147 | 148 | 149 | /* TEST: Whole word */ 150 | sub(['wholeWord' => true], function() { 151 | 152 | input("12 45678"); 153 | t( 0, "…"); 154 | t( 1, "…"); 155 | t( 2, "12…"); 156 | t( 3, "12…"); 157 | t( 4, "12…"); 158 | t( 5, "12…"); 159 | t( 6, "12 456…"); 160 | t( 7, "12 4567…"); 161 | t( 7, "12…", ['cutWord' => 0]); 162 | t( 7, "12…", ['cutWord' => false]); 163 | 164 | input("12  45678"); 165 | t( 0, "…"); 166 | t( 1, "…"); 167 | t( 2, "12…"); 168 | t( 3, "12…"); 169 | t( 4, "12…"); 170 | t( 5, "12…"); 171 | t( 6, "12  456…"); 172 | t( 7, "12  4567…"); 173 | 174 | input("12
 
 45678"); 175 | t( 0, "…"); 176 | t( 1, "…"); 177 | t( 2, "12…"); 178 | t( 3, "12…"); 179 | t( 4, "12…"); 180 | t( 5, "12…"); 181 | t( 6, "12
 
 456…"); 182 | t( 7, "12
 
 4567…"); 183 | 184 | 185 | /* TEST: Include ellipsis length */ 186 | sub(['includeEllipsisLength' => true], function() { 187 | input("12 45678"); 188 | t( 0, "…"); 189 | t( 1, "…"); 190 | t( 2, "…"); 191 | t( 3, "12…"); 192 | t( 4, "12…"); 193 | t( 5, "12…"); 194 | t( 6, "12…"); 195 | t( 7, "12 456…"); 196 | 197 | /* TEST: HTML entities */ 198 | input("1& +56789"); 199 | t( 0, "…"); 200 | t( 1, "…"); 201 | t( 2, "…"); 202 | t( 3, "1&…"); 203 | t( 4, "1&…"); 204 | t( 5, "1&…"); 205 | t( 6, "1&…"); 206 | t( 7, "1& +56…"); 207 | t( 8, "1& +567…"); 208 | t( 9, "1& +56789"); 209 | t(10, "1& +56789"); 210 | }); 211 | }); 212 | 213 | 214 | 215 | 216 | /* NOW WITH TAGS */ 217 | 218 | input(""); 219 | t( 0, "…"); 220 | t( 1, ""); 221 | t( 2, ""); 222 | 223 | input(""); 224 | t( 0, "…"); 225 | t( 1, ""); 226 | t( 2, ""); 227 | 228 | input(""); 229 | t( 0, "…"); 230 | t( 1, ""); 231 | t( 2, ""); 232 | 233 | input("1"); // closing tag mismatch 234 | t(-1, "…"); 235 | t( 0, "…"); 236 | t( 1, "1…"); 237 | t( 2, "1…"); 238 | 239 | input("1
2
"); // closing tag mismatch 240 | t(-1, "1…"); 241 | t( 0, "…"); 242 | t( 1, "1…"); 243 | t( 2, "1
2…
"); 244 | t( 3, "1
2…
"); 245 | 246 | input("123456"); 247 | t( 0, "…"); 248 | t( 1, "1…"); 249 | t( 2, "12…"); 250 | t( 3, "123…"); 251 | t( 4, "1234…"); 252 | t( 5, "12345…"); 253 | t( 6, "123456"); 254 | 255 | 256 | /* TEST: Nested tags */ 257 | input("1
2
34

56

78"); 258 | t( 0, "…"); 259 | t( 1, "1…"); 260 | t( 2, "1
2…
"); 261 | t( 3, "1
2
3…"); 262 | t( 4, "1
2
34…"); 263 | t( 5, "1
2
34

5…

"); 264 | t( 6, "1
2
34

56…

"); 265 | t( 7, "1
2
34

56

7…"); 266 | t( 8, "1
2
34

56

78"); 267 | 268 | 269 | /* TEST: Self-closing tags */ 270 | input("
12"); 271 | t( 0, "…"); 272 | t( 1, "
1…"); 273 | t( 2, "
12"); 274 | 275 | 276 | /* TEST: Tags AND Include ellipsis length */ 277 | sub(['includeEllipsisLength' => true], function() { 278 | 279 | input(['ellipsis' => "…"]); 280 | 281 | input("1234567"); 282 | t( 0, "…"); 283 | t( 1, "…"); 284 | t( 2, "1…"); 285 | t( 3, "12…"); 286 | t( 4, "123…"); 287 | t( 5, "1234…"); 288 | t( 6, "12345…"); 289 | t( 7, "1234567"); 290 | 291 | 292 | input(['ellipsis' => "..."]); 293 | 294 | input("123456789"); 295 | t( 0, "..."); 296 | t( 1, "..."); 297 | t( 2, "..."); 298 | t( 3, "..."); 299 | t( 4, "1..."); 300 | t( 5, "12..."); 301 | t( 6, "123..."); 302 | t( 7, "1234..."); 303 | t( 8, "12345..."); 304 | t( 9, "123456789"); 305 | }); 306 | 307 | 308 | /* TEST: Don't count spaces separating tags */ 309 | input(" 2 4 7 "); 310 | t( 0, "… "); 311 | t( 1, " "); 312 | t( 2, " 2…"); 313 | t( 3, " 2 …"); 314 | t( 4, " 2 4…"); 315 | t( 5, " 2 4 … "); 316 | t( 6, " 2 4 "); 317 | t( 7, " 2 4 7…"); 318 | t( 8, " 2 4 7 "); 319 | 320 | 321 | /* TEST: Tags that don't count */ 322 | input("ZZZZZ1"); 323 | t( 0, "…"); 324 | t( 1, "ZZZZZ1"); 325 | 326 | /* TEST: Tags that don't count before a tag mismatch */ 327 | sub(function() { 328 | 329 | input("1
2
", ['wholeWord' => true]); // closing tag mismatch 330 | t(-1, "1…"); 331 | t( 0, "…"); 332 | t( 1, "1…"); 333 | t( 2, "1
2…
"); 334 | t( 3, "1
2…
"); 335 | t( 4, "1
2…
"); 336 | 337 | input("1
2
", ['wholeWord' => false]); // closing tag mismatch 338 | t(-1, "1…"); 339 | t( 0, "…"); 340 | t( 1, "1…"); 341 | t( 2, "1
2…
"); 342 | t( 3, "1
2…
"); 343 | t( 4, "1
2…
"); 344 | }); 345 | 346 | /* TEST: Tag mismatch inside a tag that don't count */ 347 | input("12"); // closing tag mismatch 348 | t(-1, "…"); 349 | t( 0, "…"); 350 | t( 1, "1…"); 351 | t( 2, "1…"); 352 | t( 3, "1…"); 353 | 354 | /* TEST: Style tag */ 355 | input("1"); 356 | t( 0, "…"); 357 | t( 1, "1"); 358 | 359 | /* TEST: Script tag */ 360 | input("1"); 361 | t( 0, "…"); 362 | t( 1, "1"); 363 | 364 | /* TEST: HTML comment */ 365 | input("12"); 366 | t( 0, "…"); 367 | t( 1, "1…"); 368 | t( 2, "12"); 369 | 370 | 371 | /* TEST: Tag AND Whole word */ 372 | sub(['wholeWord' => true], function() { 373 | input("12345678901"); 374 | t( 0, "…"); 375 | t( 1, "…"); 376 | t( 2, "12…"); 377 | t( 3, "12…"); 378 | t( 4, "12…"); 379 | t( 5, "12345…"); 380 | t( 6, "123456…"); 381 | t( 7, "123456…"); 382 | t( 8, "123456…"); 383 | t( 9, "123456789…"); 384 | t(10, "1234567890…"); 385 | t(11, "12345678901"); 386 | 387 | 388 | /* TEST: Include ellipsis length */ 389 | sub(['includeEllipsisLength' => true], function() { 390 | input("12345678901"); 391 | t( 0, "…"); 392 | t( 1, "…"); 393 | t( 2, "…"); 394 | t( 3, "12…"); 395 | t( 4, "12…"); 396 | t( 5, "12…"); 397 | t( 6, "12345…"); 398 | t( 7, "123456…"); 399 | t( 8, "123456…"); 400 | t( 9, "123456…"); 401 | t(10, "123456789…"); 402 | t(11, "12345678901"); 403 | 404 | /* TEST: HTML entities */ 405 | input("1& +567890"); 406 | t( 0, "…"); 407 | t( 1, "…"); 408 | t( 2, "1…"); 409 | t( 3, "1&…"); 410 | t( 4, "1&…"); 411 | t( 5, "1& +…"); 412 | t( 6, "1& +…"); 413 | t( 7, "1& +…"); 414 | t( 8, "1& +567…"); 415 | t( 9, "1& +5678…"); 416 | t(10, "1& +567890"); 417 | }); 418 | }); 419 | 420 | 421 | 422 | /* NOW CHECK FOR PREVIOUS FIXED ERRORS AND BUGS */ 423 | 424 | // TEST fix: multibyte character bug in regexes in $finalizeEllipsisData(): 425 | sub([ 426 | 'includeEllipsisLength' => true, 427 | 'wholeWord' => true, 428 | ], function() { 429 | input("

éa a

"); 430 | t( 0, "…"); 431 | t( 1, "…"); 432 | t( 2, "…"); 433 | t( 3, "

éa…

"); 434 | t( 4, "

éa a

"); 435 | }); 436 | }); 437 | 438 | 439 | 440 | /* TEST: Readme.md examples */ 441 | 442 | input("

A red ball.

"); 443 | t( 6, "

A red…

"); 444 | 445 | input("
A lumberjack
"); 446 | t( 5, "
A…
"); 447 | 448 | input("
A lumberjack
"); 449 | t( 5, "
A lum…
", ['wholeWord' => false, 'includeEllipsisLength' => false]); 450 | 451 | input("https://php.net/docs.php"); 452 | t( 5, "…"); 453 | input("https://php.net/docs.php"); 454 | t(20, "https://php.net/doc…"); 455 | 456 | input("
Hi
More text."); 457 | t( 3, "
Hi…
"); 458 | 459 | input("A
  \n\t long space!"); 460 | t( 7, "A
  \n\t long…"); 461 | 462 | input("Clickhere"); 463 | t(99, "Click…"); 464 | 465 | 466 | /* TEST: StackOverflow examples (https://stackoverflow.com/questions/1193500/truncate-text-containing-html-ignoring-tags/48671866#48671866) */ 467 | 468 | input("

A red ball.

", ['wholeWord' => false]); 469 | t( 9, "

A red ba…

"); 470 | 471 | 472 | 473 | finish(); 474 | 475 | 476 | 477 | 478 | 479 | 480 | /*############################################################*/ 481 | /*############################################################*/ 482 | /*############################################################*/ 483 | /*###### UTILITY FUNCTIONS ######*/ 484 | 485 | function init() { 486 | global $unittest, $paramsStack, $params; 487 | 488 | ini_set('display_errors', 1); 489 | ini_set('display_startup_errors', 1); 490 | error_reporting(E_ALL); 491 | ini_set('assert.exception', 1); // Assertion failure will throw an exception 492 | 493 | if (!function_exists('truncateHTML')) { 494 | require_once('truncateHTML.php'); 495 | } 496 | 497 | // SETUP some globals: 498 | $unittest = []; // Data related to the executed tests, see definition in init(). 499 | $paramsStack = []; // Used by sub() to manage $params when changing contexts. 500 | $params; // Contains the current parameters for truncateHTML(), see definition in resetParams(). 501 | 502 | resetParams(); 503 | 504 | $unittest = [ 505 | 'startTime' => microtime(true), 506 | 'succeededTests' => 0, 507 | 'failedTests' => 0, 508 | 'executedTests' => 0, 509 | ]; 510 | } 511 | 512 | function finish() { 513 | global $unittest; 514 | 515 | $unittest['endTime'] = microtime(true); 516 | 517 | echo "\033[01;32mSuccess ({$unittest['succeededTests']}/{$unittest['executedTests']})\033[0m\n"; 518 | echo "Run time: " . round(($unittest['endTime'] - $unittest['startTime']) * 1000, 2) . 
" ms\n"; 519 | } 520 | 521 | function resetParams() { 522 | global $params; 523 | $params = [ 524 | 'html' => '', 525 | 'options' => [], 526 | ]; 527 | } 528 | 529 | 530 | function input($html) { 531 | global $params, $paramsStack; 532 | 533 | $args = func_get_args(); 534 | foreach ($args as $arg) { 535 | if (is_string($arg)) { 536 | $params['html'] = $arg; 537 | } 538 | else if (is_array($arg)) { 539 | $params['options'] = $arg + $params['options']; 540 | } 541 | } 542 | } 543 | 544 | function sub() { 545 | global $params, $paramsStack; 546 | 547 | $paramsStack[] = $params; 548 | 549 | $args = func_get_args(); 550 | foreach ($args as $arg) { 551 | if (is_string($arg)) { 552 | $params['html'] = $arg; 553 | } 554 | else if (is_array($arg)) { 555 | $params['options'] = $arg + $params['options']; 556 | } 557 | else if (is_callable($arg)) { 558 | $arg(); 559 | } 560 | } 561 | 562 | $params = array_pop($paramsStack); 563 | } 564 | 565 | function t($maxLength, $expect, array $options = []) { 566 | global $unittest, $params; 567 | 568 | $html = $params['html']; 569 | $options = $options + $params['options']; 570 | $out = truncateHTML($maxLength, $html, $options); 571 | $unittest['executedTests']++; 572 | if ($out !== $expect) { 573 | $unittest['failedTests']++; 574 | $trace = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2); 575 | $line = $trace[0]['line']; 576 | echo "\033[01;31mFailed test {$unittest['executedTests']} (line $line):\033[0m\n"; 577 | echo "maxLength: $maxLength\n"; 578 | echo "html: '$html'\n"; 579 | echo "output: '$out'\n"; 580 | echo "expected: '$expect'\n"; 581 | echo "options: ".var_export($options, true)."\n"; 582 | exit(); 583 | } 584 | else { 585 | $unittest['succeededTests']++; 586 | } 587 | } -------------------------------------------------------------------------------- /truncateHTML.php: -------------------------------------------------------------------------------- 1 | (string) Ellipsis. Default: utf8 ? '…' : '...' 
14 | * 'includeEllipsisLength' => (bool) Does $maxLength include the length of ellipsis ? Default: true 15 | * 'wholeWord' => (bool) Truncate at end of last whole word. Default: true 16 | * 'cutWord' => (int>=0|false) Default: 18 17 | * 'utf8' => (bool) Default: true 18 | * ] 19 | * @return string $truncated_html 20 | */ 21 | function truncateHTML($maxLength, $html, array $options = []) { 22 | assert(is_int($maxLength), "Parameter \$maxLength must be an int"); 23 | assert(is_string($html), "Parameter \$html must be a string"); 24 | 25 | $_isUtf8 = !isset($options['utf8']) || $options['utf8'] === true; 26 | $default = [ 27 | // If utf8, ellipsis defaults to HORIZONTAL ELLIPSIS ('…' ie. '...' as a single unicode character): 28 | 'ellipsis' => $_isUtf8 ? "\xe2\x80\xa6" : '...', 29 | 'includeEllipsisLength' => true, 30 | 'wholeWord' => true, 31 | 'cutWord' => 18, // Set to 0 or false to disable 32 | 'utf8' => true, 33 | 34 | // Internal use: 35 | 'forceBacktrack' => false, 36 | 'debug' => false, 37 | ]; 38 | $options += $default; 39 | 40 | assert(is_int($options['cutWord']) || $options['cutWord'] === false, "Option \$options['cutWord'] must be an integer or FALSE"); 41 | 42 | // THE function that does all the work of finding the position for the ellipsis, 43 | // the position for the truncation, and keeping track of opened tags: 44 | $analyze = function($maxLength, $html, array $options = []) use (&$analyze) { 45 | // For UTF-8 input: 46 | $utf8_mod = $options['utf8'] ? 'u' : ''; 47 | $strlen = $options['utf8'] ? 'mb_strlen' : 'strlen'; 48 | $substr = $options['utf8'] ? 'mb_substr' : 'substr'; 49 | 50 | if ($maxLength === -1) { 51 | // Internal use only: in this case, we are only interested in the length of $html, not in really truncating it. 
52 | $maxLength = strlen($html); 53 | $options = ['ellipsis' => '', 'includeEllipsisLength' => false, 'wholeWord' => false] + $options; 54 | } 55 | 56 | $pos = 0; // Current position in $html 57 | $length = 0; // Length of $html at $pos (number of countable characters) 58 | $openedTags = []; // Stack of opened tags at $pos 59 | $isCounting = true; // Are we currently counting the characters we meet ? (false in HTML comments, ') { // End script: 335 | $re_nextTag = $re_inHTML; 336 | $isCounting = true; 337 | } 338 | elseif ($tag === '') { // End script: 339 | $re_nextTag = $re_inHTML; 340 | $isCounting = true; 341 | } 342 | else { // Other tag: 343 | $tagName = strtolower($tagMatches[1][0]); 344 | 345 | // Opening tag: 346 | if ($tag[1] !== '/') { 347 | $isCountingTag = $isCounting && !in_array($tagName, $noCountingTags, true); 348 | if (!$reachedOpenTag()) break; 349 | 350 | // If not self-closing tag: 351 | if ($tag[strlen($tag) - 2] !== '/' && !in_array($tagName, $selfClosingTags, true)) { 352 | if ($tagName === '!--') { // Start HTML comment: 353 | $re_nextTag = $re_inComment; 354 | } 355 | elseif ($tagName === 'script') { // Start script: 356 | $re_nextTag = $re_inScript; 357 | } 358 | elseif ($tagName === 'style') { // Start style: 359 | $re_nextTag = $re_inStyle; 360 | } 361 | else { 362 | // Stack opened tag: 363 | $openedTags[] = ['name' => $tagName, 'wasCounting' => $isCounting]; 364 | } 365 | $isCounting = $isCountingTag; 366 | } 367 | } 368 | // Closing tag: 369 | else { 370 | $prevTag = array_pop($openedTags); 371 | 372 | if ($tagName === $prevTag['name']) { 373 | $isCounting = $prevTag['wasCounting']; 374 | } 375 | else { // Un-paired closing tag (Malformed HTML ? Mismatched or badly nested tag ?) 
376 | if ($prevTag !== null) $openedTags[] = $prevTag; 377 | if ($options['debug'] === true) throw new \Exception("Unmatched closing tag '$tag' (\$tagPos=$tagPos, \$pos=$pos, \$length=$length)"); 378 | else { 379 | // We backtrack: 380 | if ($endData_lastCountedChar['ellipsisPos'] !== -2) { 381 | $endData_maxLength = $endData_lastCountedChar; 382 | break; 383 | } 384 | // If we cannot backtrack directly, we rerun analyze() and force backtracking: 385 | else { 386 | $maxLength = ($endData_ellipsisIncluded['ellipsisPos'] === -1) ? $ellipsis_maxLength : $maxLength; 387 | return $analyze($maxLength, $html, ['forceBacktrack' => true] + $options); 388 | } 389 | } 390 | } 391 | } 392 | } 393 | 394 | // Continue after the tag: 395 | $pos += strlen($tag); 396 | } 397 | 398 | // Complete endDatas if needed with the current $pos: 399 | foreach ([&$endData_maxLength, &$endData_ellipsisIncluded] as &$endData) { 400 | if ($endData['ellipsisPos'] === -1) { // ie. we didn't reach $maxLength 401 | // So we can include all the length to $pos: 402 | $endData['ellipsisPos'] = $pos; 403 | $endData['length'] = $length; 404 | } 405 | if ($endData['truncatePos'] === -1) { // ie. we didn't reach a countable character after $maxLength 406 | // So we can include all the bytes to $pos: 407 | $endData['truncatePos'] = $pos; 408 | $endData['openedTags'] = $openedTags; 409 | } 410 | } 411 | 412 | // Should we return $endData_maxLength or $endData_ellipsisIncluded ? 
        // In case we must include the ellipsis length:
        // - if we could reach the end of $html, it means that without the added length of the ellipsis, the length of $html is less than $maxLength
        // - otherwise we return the end with the ellipsis length included
        $endData_selected = $endData_maxLength;
        if ($options['includeEllipsisLength'] && $endData_maxLength['truncatePos'] !== strlen($html)) {
            $endData_selected = $endData_ellipsisIncluded;
        }

        return $endData_selected;
    }; // End of analyze()


    // If $maxLength is negative, remove $maxLength countable characters from the end of the $html:
    if ($maxLength < 0) {
        $maxLength = $analyze(-1, $html, $options)['length'] + $maxLength;
        if ($maxLength < 0) $maxLength = 0;
    }

    // Analyze $html:
    $r = $analyze($maxLength, $html, $options);
    $ellipsisPos = $r['ellipsisPos'];
    $truncatePos = $r['truncatePos'];
    $openedTags = $r['openedTags'];

    assert(!($ellipsisPos < 0), "Not counted: \$ellipsisPos=$ellipsisPos");
    assert(!($truncatePos < 0), "Not processed: \$truncatePos=$truncatePos");
    assert(!($truncatePos > strlen($html)), "Read too far: \$truncatePos=$truncatePos is greater than strlen(\$html)=".strlen($html));

    // If $html is shorter than $maxLength:
    if ($truncatePos === strlen($html)) return $html;

    // Close all remaining opened tags, innermost first.
    // (The closing-tag markup was lost in extraction; it is rebuilt here from the 'name' key of each stacked tag.)
    $closingTags = '';
    while (!empty($openedTags)) $closingTags .= '</' . array_pop($openedTags)['name'] . '>';

    // Return truncated $html with insertion of ellipsis and appended closing tags:
    return substr($html, 0, $ellipsisPos)
        . $options['ellipsis']
        . substr($html, $ellipsisPos, $truncatePos - $ellipsisPos)
        . $closingTags;
}
--------------------------------------------------------------------------------
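
The last steps of `truncateHTML()` (resolving a negative `$maxLength`, closing the tags that are still open, and splicing the ellipsis into the output) can be sketched in isolation. This is a minimal sketch, not the library's code: the helper names `sketchResolveMaxLength`, `sketchClosingTags` and `sketchAssemble` are hypothetical, the byte positions are supplied by hand instead of being computed by `$analyze()`, and the closing-tag format `</name>` is an assumption based on the `'name'` key stored in the `$openedTags` stack.

```php
<?php
// Hypothetical helpers illustrating the final steps of truncateHTML().
// They are NOT part of the library; positions are passed in by the caller.

/**
 * A negative $maxLength means "remove that many countable characters
 * from the end"; the result is clamped at 0.
 */
function sketchResolveMaxLength($countableLength, $maxLength) {
    if ($maxLength >= 0) {
        return $maxLength;
    }
    return max(0, $countableLength + $maxLength);
}

/**
 * Build closing tags for a stack of still-open elements.
 * Assumption: each entry has a 'name' key and the innermost element is last.
 */
function sketchClosingTags(array $openedTags) {
    $closingTags = '';
    while (!empty($openedTags)) {
        $tag = array_pop($openedTags); // innermost element first
        $closingTags .= '</' . $tag['name'] . '>';
    }
    return $closingTags;
}

/**
 * Insert the ellipsis at $ellipsisPos, keep the bytes between $ellipsisPos
 * and $truncatePos, then close whatever is still open.
 * Positions are byte offsets, hence plain substr().
 */
function sketchAssemble($html, $ellipsisPos, $truncatePos, array $openedTags, $ellipsis) {
    return substr($html, 0, $ellipsisPos)
        . $ellipsis
        . substr($html, $ellipsisPos, $truncatePos - $ellipsisPos)
        . sketchClosingTags($openedTags);
}

// Illustrative input (an assumption, not taken from the test suite):
$html = '<p><b>A</b> red ball.</p>';
// Byte offset 15 is just after "red"; only <p> is still open there.
echo sketchAssemble($html, 15, 15, [['name' => 'p']], "\xe2\x80\xa6"), "\n";
// A 9-character text with $maxLength = -2 keeps 7 countable characters.
echo sketchResolveMaxLength(9, -2), "\n";
```

Popping from the end of the stack closes the innermost element first, which keeps the appended tags properly nested. Keeping `$ellipsisPos` and `$truncatePos` separate lets the ellipsis sit right after the last counted character while any markup between the two positions is still emitted after it.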