├── README.md
├── bootstrap.php
├── composer.json
├── index.php
└── src
    └── CoreNLP
        └── CorenlpAdapter.php


/README.md:
--------------------------------------------------------------------------------

# PHP Stanford CoreNLP adapter

[![Version](https://img.shields.io/packagist/v/dennis-de-swart/php-stanford-corenlp-adapter.svg?style=flat-square)](https://packagist.org/packages/dennis-de-swart/php-stanford-corenlp-adapter)
[![Total Downloads](https://img.shields.io/packagist/dt/dennis-de-swart/php-stanford-corenlp-adapter.svg?style=flat-square)](https://packagist.org/packages/dennis-de-swart/php-stanford-corenlp-adapter)
[![Maintenance](https://img.shields.io/maintenance/yes/2020.svg?style=flat-square)](https://github.com/DennisDeSwart/php-stanford-corenlp-adapter)
[![Minimum PHP Version](https://img.shields.io/badge/php-%3E%3D%205.6-4AC51C.svg?style=flat-square)](http://php.net/)
[![License](https://img.shields.io/packagist/l/dennis-de-swart/php-stanford-corenlp-adapter.svg?style=flat-square)](https://opensource.org/licenses/MIT)

PHP adapter for use with Stanford CoreNLP

## Features
- Connect to the Stanford University CoreNLP API online
- Connect to a Stanford CoreNLP 3.7.0 server
- Annotators available: tokenize, ssplit, pos, parse, depparse, ner, regexner, lemma, mention, natlog, coref, openie, kbp
- The package creates Part-Of-Speech trees with depth, parent ID and child ID



## Requirements
- PHP 5.5 or higher: it also works on PHP 7
- Windows or Linux 64-bit, 8 GB of memory or more recommended
- Either Guzzle HTTP Client (installed by default) or only cURL.
- Composer for PHP
```
https://getcomposer.org/
```


## Update 24th February 2018
PHP 7 type hinting was removed, because it was causing issues for some users.

## Update 28th January 2019
Fixed an issue with PHP 7.1 and upwards.

## Installation using ZIP files

- Install Stanford CoreNLP Server. See the installation walkthrough below.
- Download and unpack the files from this package.
- Copy the files to your webserver directory. Usually "htdocs" or "var/www".
- Run a Composer update



## Installation using Composer

- Insert the following line into the "require" of your "composer.json" file.

```
{
    "require": {
        "dennis-de-swart/php-stanford-corenlp-adapter": "*"
    }
}
```

- Run a composer update



## Using the Stanford CoreNLP online API service


The adapter by default uses Stanford's online API service. This should work right after the composer update.
Note that the online API is a public service. If you want to analyze large volumes of text or sensitive data,
please install the Java server version.



## OpenIE

OpenIE creates "subject-relation-object" tuples. This is similar to (but not the same as) the "Subject-Verb-Object" concept of the English language.

Notes:
- OpenIE is only available on the Java offline version, not with the "online" mode. See the installation walkthrough below.
- OpenIE data is not always available. Sometimes the result array is empty; this is not an error.
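
A minimal sketch of collecting these tuples, assuming each sentence array carries an optional `openie` key with `subject`, `relation` and `object` entries, as in Diagram B. The function name `extractTriples` and the sample data are hypothetical and not part of the adapter.

```php
<?php
// Hypothetical helper (not part of the adapter): collects OpenIE tuples
// from one sentence array, tolerating a missing or empty "openie" key.
function extractTriples(array $sentence)
{
    $triples = array();

    if (empty($sentence['openie'])) {
        return $triples; // no OpenIE data: this is not an error
    }

    foreach ($sentence['openie'] as $tuple) {
        $triples[] = array($tuple['subject'], $tuple['relation'], $tuple['object']);
    }

    return $triples;
}

// Sample data in the shape of the server output (abridged, for illustration).
$sentence = array(
    'openie' => array(
        array('subject' => 'Mary', 'relation' => 'is in', 'object' => 'New York'),
    ),
);

foreach (extractTriples($sentence) as $triple) {
    echo implode(' | ', $triple), "\n";
}
```

Returning an empty array for a sentence without OpenIE data keeps the calling code free of special cases.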

```
http://nlp.stanford.edu/software/openie.html
https://en.wikipedia.org/wiki/Subject-verb-object
```



# Installation / Walkthrough for Java server version




## Step 1: install Java

```
https://java.com/en/download/help/index_installing.xml?os=All+Platforms&j=8&n=20
```

## Step 2: install the Stanford CoreNLP 3.7.0 server
```
http://stanfordnlp.github.io/CoreNLP/index.html#download
```



## Step 3: Port for server
The default port for the Java server is 9000. If port 9000 is not available, you can change the port in the "bootstrap.php" file. Example:

```
define('CURLURL' , 'http://localhost:9000/');

```


## Step 4: Start the CoreNLP server from the command line

Go to the download directory, then enter the following command:

```
java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
```

Important note: the Stanford manual says "-mx4g", however I found that this can lead to a Java OutOfMemory error. It is also important to use a 64-bit operating system with enough memory (8 GB or more recommended).


## Step 5: Test if the server has started by surfing to its URL
```
http://localhost:9000/
```
When you surf to this URL, you should see the CoreNLP GUI. If you have problems with the installation, you can check the manual:
```
http://stanfordnlp.github.io/CoreNLP/corenlp-server.html
```

## Step 6: Set ONLINE_API to FALSE

In "bootstrap.php" set define('ONLINE_API' , FALSE).
This tells the Adapter to use the Java version.




# Usage examples



## Instantiate the adapter:
```
$coreNLP = new CorenlpAdapter();
```


## To process a text, call the "getOutput" method:
```
$text = 'The Golden Gate Bridge was designed by Joseph Strauss.';
$coreNLP->getOutput($text);
```

Note that the first time you process a text, the server takes about 20 to 30 seconds extra to load definitions. All later calls to the server will be much faster. Small texts are usually processed within seconds.



## The results

If successful, the following properties will be available:
```
$coreNLP->serverMemory; // contains all of the server output
$coreNLP->trees;        // contains processed flat trees. Each part of the tree is assigned an ID key

$coreNLP->getWordValues($coreNLP->trees[1]); // get just the words from a tree
```




********************************
### Diagram A: Tree With Tokens
********************************
```
Array
(
    [1] => Array
        (
            [parent] => 
            [pennTreebankTag] => ROOT
            [depth] => 0
        )

    [2] => Array
        (
            [parent] => 1
            [pennTreebankTag] => S
            [depth] => 2
        )

    [3] => Array
        (
            [parent] => 2
            [pennTreebankTag] => NP
            [depth] => 4
        )

    [4] => Array
        (
            [parent] => 3
            [pennTreebankTag] => PRP
            [depth] => 6
            [word] => I
            [index] => 1
            [originalText] => I
            [lemma] => I
            [characterOffsetBegin] => 0
            [characterOffsetEnd] => 1
            [pos] => PRP
            [ner] => O
            [before] => 
            [after] => 
            [openIE] => Array
                (
                    [0] => subject
                    [1] => subject
                    [2] => subject
                )

        )

    [5] => Array
        (
[parent] => 2 226 | [pennTreebankTag] => VP 227 | [depth] => 4 228 | ) 229 | 230 | [6] => Array 231 | ( 232 | [parent] => 5 233 | [pennTreebankTag] => MD 234 | [depth] => 6 235 | [word] => will 236 | [index] => 2 237 | [originalText] => will 238 | [lemma] => will 239 | [characterOffsetBegin] => 2 240 | [characterOffsetEnd] => 6 241 | [pos] => MD 242 | [ner] => O 243 | [before] => 244 | [after] => 245 | [openIE] => Array 246 | ( 247 | [0] => subject 248 | [1] => subject 249 | [2] => relation 250 | ) 251 | 252 | ) 253 | 254 | [7] => Array 255 | ( 256 | [parent] => 5 257 | [pennTreebankTag] => VP 258 | [depth] => 6 259 | ) 260 | 261 | [8] => Array 262 | ( 263 | [parent] => 7 264 | [pennTreebankTag] => VB 265 | [depth] => 8 266 | [word] => meet 267 | [index] => 3 268 | [originalText] => meet 269 | [lemma] => meet 270 | [characterOffsetBegin] => 7 271 | [characterOffsetEnd] => 11 272 | [pos] => VB 273 | [ner] => O 274 | [before] => 275 | [after] => 276 | [openIE] => Array 277 | ( 278 | [0] => subject 279 | [1] => subject 280 | [2] => relation 281 | ) 282 | 283 | ) 284 | 285 | [9] => Array 286 | ( 287 | [parent] => 7 288 | [pennTreebankTag] => NP 289 | [depth] => 8 290 | ) 291 | 292 | [10] => Array 293 | ( 294 | [parent] => 9 295 | [pennTreebankTag] => NP 296 | [depth] => 10 297 | ) 298 | 299 | [11] => Array 300 | ( 301 | [parent] => 10 302 | [pennTreebankTag] => NNP 303 | [depth] => 12 304 | [word] => Mary 305 | [index] => 4 306 | [originalText] => Mary 307 | [lemma] => Mary 308 | [characterOffsetBegin] => 12 309 | [characterOffsetEnd] => 16 310 | [pos] => NNP 311 | [ner] => PERSON 312 | [before] => 313 | [after] => 314 | [openIE] => Array 315 | ( 316 | [1] => subject 317 | [2] => object 318 | [3] => subject 319 | [0] => subject 320 | ) 321 | 322 | ) 323 | 324 | [12] => Array 325 | ( 326 | [parent] => 9 327 | [pennTreebankTag] => PP 328 | [depth] => 10 329 | ) 330 | 331 | [13] => Array 332 | ( 333 | [parent] => 12 334 | [pennTreebankTag] => IN 335 | [depth] => 12 336 | 
[word] => in 337 | [index] => 5 338 | [originalText] => in 339 | [lemma] => in 340 | [characterOffsetBegin] => 17 341 | [characterOffsetEnd] => 19 342 | [pos] => IN 343 | [ner] => O 344 | [before] => 345 | [after] => 346 | [openIE] => Array 347 | ( 348 | [1] => relation 349 | [3] => relation 350 | [0] => relation 351 | ) 352 | 353 | ) 354 | 355 | [14] => Array 356 | ( 357 | [parent] => 12 358 | [pennTreebankTag] => NP 359 | [depth] => 12 360 | ) 361 | 362 | [15] => Array 363 | ( 364 | [parent] => 14 365 | [pennTreebankTag] => NNP 366 | [depth] => 14 367 | [word] => New 368 | [index] => 6 369 | [originalText] => New 370 | [lemma] => New 371 | [characterOffsetBegin] => 20 372 | [characterOffsetEnd] => 23 373 | [pos] => NNP 374 | [ner] => LOCATION 375 | [before] => 376 | [after] => 377 | [openIE] => Array 378 | ( 379 | [1] => relation 380 | [3] => object 381 | [0] => object 382 | ) 383 | 384 | ) 385 | 386 | [16] => Array 387 | ( 388 | [parent] => 14 389 | [pennTreebankTag] => NNP 390 | [depth] => 14 391 | [word] => York 392 | [index] => 7 393 | [originalText] => York 394 | [lemma] => York 395 | [characterOffsetBegin] => 24 396 | [characterOffsetEnd] => 28 397 | [pos] => NNP 398 | [ner] => LOCATION 399 | [before] => 400 | [after] => 401 | [openIE] => Array 402 | ( 403 | [1] => object 404 | [3] => object 405 | ) 406 | 407 | ) 408 | 409 | [17] => Array 410 | ( 411 | [parent] => 7 412 | [pennTreebankTag] => PP 413 | [depth] => 8 414 | ) 415 | 416 | [18] => Array 417 | ( 418 | [parent] => 17 419 | [pennTreebankTag] => IN 420 | [depth] => 10 421 | [word] => at 422 | [index] => 8 423 | [originalText] => at 424 | [lemma] => at 425 | [characterOffsetBegin] => 29 426 | [characterOffsetEnd] => 31 427 | [pos] => IN 428 | [ner] => O 429 | [before] => 430 | [after] => 431 | [openIE] => Array 432 | ( 433 | [1] => object 434 | ) 435 | 436 | ) 437 | 438 | [19] => Array 439 | ( 440 | [parent] => 17 441 | [pennTreebankTag] => NP 442 | [depth] => 10 443 | ) 444 | 445 | [20] => Array 446 
| ( 447 | [parent] => 19 448 | [pennTreebankTag] => CD 449 | [depth] => 12 450 | [word] => 10pm 451 | [index] => 9 452 | [originalText] => 10pm 453 | [lemma] => 10pm 454 | [characterOffsetBegin] => 32 455 | [characterOffsetEnd] => 36 456 | [pos] => CD 457 | [ner] => TIME 458 | [normalizedNER] => T22:00 459 | [before] => 460 | [after] => 461 | [timex] => Array 462 | ( 463 | [tid] => t1 464 | [type] => TIME 465 | [value] => T22:00 466 | ) 467 | 468 | [openIE] => Array 469 | ( 470 | [0] => object 471 | [1] => object 472 | ) 473 | 474 | ) 475 | 476 | ) 477 | 478 | ``` 479 | 480 | *********************************************************************** 481 | ### Diagram B: The ServerMemory contains all the server data 482 | *********************************************************************** 483 | ``` 484 | Array 485 | ( 486 | [0] => Array 487 | ( 488 | [sentences] => Array 489 | ( 490 | [0] => Array 491 | ( 492 | [index] => 0 493 | [parse] => (ROOT 494 | (S 495 | (NP (PRP I)) 496 | (VP (MD will) 497 | (VP (VB meet) 498 | (NP 499 | (NP (NNP Mary)) 500 | (PP (IN in) 501 | (NP (NNP New) (NNP York)))) 502 | (PP (IN at) 503 | (NP (CD 10pm))))))) 504 | [basic-dependencies] => Array 505 | ( 506 | [0] => Array 507 | ( 508 | [dep] => ROOT 509 | [governor] => 0 510 | [governorGloss] => ROOT 511 | [dependent] => 3 512 | [dependentGloss] => meet 513 | ) 514 | 515 | [1] => Array 516 | ( 517 | [dep] => nsubj 518 | [governor] => 3 519 | [governorGloss] => meet 520 | [dependent] => 1 521 | [dependentGloss] => I 522 | ) 523 | 524 | [2] => Array 525 | ( 526 | [dep] => aux 527 | [governor] => 3 528 | [governorGloss] => meet 529 | [dependent] => 2 530 | [dependentGloss] => will 531 | ) 532 | 533 | [3] => Array 534 | ( 535 | [dep] => dobj 536 | [governor] => 3 537 | [governorGloss] => meet 538 | [dependent] => 4 539 | [dependentGloss] => Mary 540 | ) 541 | 542 | [4] => Array 543 | ( 544 | [dep] => case 545 | [governor] => 7 546 | [governorGloss] => York 547 | [dependent] => 5 548 | 
[dependentGloss] => in 549 | ) 550 | 551 | [5] => Array 552 | ( 553 | [dep] => compound 554 | [governor] => 7 555 | [governorGloss] => York 556 | [dependent] => 6 557 | [dependentGloss] => New 558 | ) 559 | 560 | [6] => Array 561 | ( 562 | [dep] => nmod 563 | [governor] => 4 564 | [governorGloss] => Mary 565 | [dependent] => 7 566 | [dependentGloss] => York 567 | ) 568 | 569 | [7] => Array 570 | ( 571 | [dep] => case 572 | [governor] => 9 573 | [governorGloss] => 10pm 574 | [dependent] => 8 575 | [dependentGloss] => at 576 | ) 577 | 578 | [8] => Array 579 | ( 580 | [dep] => nmod 581 | [governor] => 3 582 | [governorGloss] => meet 583 | [dependent] => 9 584 | [dependentGloss] => 10pm 585 | ) 586 | 587 | ) 588 | 589 | [collapsed-dependencies] => Array 590 | ( 591 | [0] => Array 592 | ( 593 | [dep] => ROOT 594 | [governor] => 0 595 | [governorGloss] => ROOT 596 | [dependent] => 3 597 | [dependentGloss] => meet 598 | ) 599 | 600 | [1] => Array 601 | ( 602 | [dep] => nsubj 603 | [governor] => 3 604 | [governorGloss] => meet 605 | [dependent] => 1 606 | [dependentGloss] => I 607 | ) 608 | 609 | [2] => Array 610 | ( 611 | [dep] => aux 612 | [governor] => 3 613 | [governorGloss] => meet 614 | [dependent] => 2 615 | [dependentGloss] => will 616 | ) 617 | 618 | [3] => Array 619 | ( 620 | [dep] => dobj 621 | [governor] => 3 622 | [governorGloss] => meet 623 | [dependent] => 4 624 | [dependentGloss] => Mary 625 | ) 626 | 627 | [4] => Array 628 | ( 629 | [dep] => case 630 | [governor] => 7 631 | [governorGloss] => York 632 | [dependent] => 5 633 | [dependentGloss] => in 634 | ) 635 | 636 | [5] => Array 637 | ( 638 | [dep] => compound 639 | [governor] => 7 640 | [governorGloss] => York 641 | [dependent] => 6 642 | [dependentGloss] => New 643 | ) 644 | 645 | [6] => Array 646 | ( 647 | [dep] => nmod:in 648 | [governor] => 4 649 | [governorGloss] => Mary 650 | [dependent] => 7 651 | [dependentGloss] => York 652 | ) 653 | 654 | [7] => Array 655 | ( 656 | [dep] => case 657 | 
[governor] => 9 658 | [governorGloss] => 10pm 659 | [dependent] => 8 660 | [dependentGloss] => at 661 | ) 662 | 663 | [8] => Array 664 | ( 665 | [dep] => nmod:at 666 | [governor] => 3 667 | [governorGloss] => meet 668 | [dependent] => 9 669 | [dependentGloss] => 10pm 670 | ) 671 | 672 | ) 673 | 674 | [collapsed-ccprocessed-dependencies] => Array 675 | ( 676 | [0] => Array 677 | ( 678 | [dep] => ROOT 679 | [governor] => 0 680 | [governorGloss] => ROOT 681 | [dependent] => 3 682 | [dependentGloss] => meet 683 | ) 684 | 685 | [1] => Array 686 | ( 687 | [dep] => nsubj 688 | [governor] => 3 689 | [governorGloss] => meet 690 | [dependent] => 1 691 | [dependentGloss] => I 692 | ) 693 | 694 | [2] => Array 695 | ( 696 | [dep] => aux 697 | [governor] => 3 698 | [governorGloss] => meet 699 | [dependent] => 2 700 | [dependentGloss] => will 701 | ) 702 | 703 | [3] => Array 704 | ( 705 | [dep] => dobj 706 | [governor] => 3 707 | [governorGloss] => meet 708 | [dependent] => 4 709 | [dependentGloss] => Mary 710 | ) 711 | 712 | [4] => Array 713 | ( 714 | [dep] => case 715 | [governor] => 7 716 | [governorGloss] => York 717 | [dependent] => 5 718 | [dependentGloss] => in 719 | ) 720 | 721 | [5] => Array 722 | ( 723 | [dep] => compound 724 | [governor] => 7 725 | [governorGloss] => York 726 | [dependent] => 6 727 | [dependentGloss] => New 728 | ) 729 | 730 | [6] => Array 731 | ( 732 | [dep] => nmod:in 733 | [governor] => 4 734 | [governorGloss] => Mary 735 | [dependent] => 7 736 | [dependentGloss] => York 737 | ) 738 | 739 | [7] => Array 740 | ( 741 | [dep] => case 742 | [governor] => 9 743 | [governorGloss] => 10pm 744 | [dependent] => 8 745 | [dependentGloss] => at 746 | ) 747 | 748 | [8] => Array 749 | ( 750 | [dep] => nmod:at 751 | [governor] => 3 752 | [governorGloss] => meet 753 | [dependent] => 9 754 | [dependentGloss] => 10pm 755 | ) 756 | 757 | ) 758 | 759 | [openie] => Array 760 | ( 761 | [0] => Array 762 | ( 763 | [subject] => I 764 | [subjectSpan] => Array 765 | ( 766 | 
[0] => 0 767 | [1] => 1 768 | ) 769 | 770 | [relation] => will meet Mary at 771 | [relationSpan] => Array 772 | ( 773 | [0] => 1 774 | [1] => 3 775 | ) 776 | 777 | [object] => 10pm 778 | [objectSpan] => Array 779 | ( 780 | [0] => 8 781 | [1] => 9 782 | ) 783 | 784 | ) 785 | 786 | [1] => Array 787 | ( 788 | [subject] => I 789 | [subjectSpan] => Array 790 | ( 791 | [0] => 0 792 | [1] => 1 793 | ) 794 | 795 | [relation] => will meet 796 | [relationSpan] => Array 797 | ( 798 | [0] => 1 799 | [1] => 3 800 | ) 801 | 802 | [object] => Mary in New York 803 | [objectSpan] => Array 804 | ( 805 | [0] => 3 806 | [1] => 7 807 | ) 808 | 809 | ) 810 | 811 | [2] => Array 812 | ( 813 | [subject] => I 814 | [subjectSpan] => Array 815 | ( 816 | [0] => 0 817 | [1] => 1 818 | ) 819 | 820 | [relation] => will meet 821 | [relationSpan] => Array 822 | ( 823 | [0] => 1 824 | [1] => 3 825 | ) 826 | 827 | [object] => Mary 828 | [objectSpan] => Array 829 | ( 830 | [0] => 3 831 | [1] => 4 832 | ) 833 | 834 | ) 835 | 836 | [3] => Array 837 | ( 838 | [subject] => Mary 839 | [subjectSpan] => Array 840 | ( 841 | [0] => 3 842 | [1] => 4 843 | ) 844 | 845 | [relation] => is in 846 | [relationSpan] => Array 847 | ( 848 | [0] => 4 849 | [1] => 5 850 | ) 851 | 852 | [object] => New York 853 | [objectSpan] => Array 854 | ( 855 | [0] => 5 856 | [1] => 7 857 | ) 858 | 859 | ) 860 | 861 | ) 862 | 863 | [tokens] => Array 864 | ( 865 | [0] => Array 866 | ( 867 | [index] => 1 868 | [word] => I 869 | [originalText] => I 870 | [lemma] => I 871 | [characterOffsetBegin] => 0 872 | [characterOffsetEnd] => 1 873 | [pos] => PRP 874 | [ner] => O 875 | [before] => 876 | [after] => 877 | ) 878 | 879 | [1] => Array 880 | ( 881 | [index] => 2 882 | [word] => will 883 | [originalText] => will 884 | [lemma] => will 885 | [characterOffsetBegin] => 2 886 | [characterOffsetEnd] => 6 887 | [pos] => MD 888 | [ner] => O 889 | [before] => 890 | [after] => 891 | ) 892 | 893 | [2] => Array 894 | ( 895 | [index] => 3 896 | [word] => 
meet 897 | [originalText] => meet 898 | [lemma] => meet 899 | [characterOffsetBegin] => 7 900 | [characterOffsetEnd] => 11 901 | [pos] => VB 902 | [ner] => O 903 | [before] => 904 | [after] => 905 | ) 906 | 907 | [3] => Array 908 | ( 909 | [index] => 4 910 | [word] => Mary 911 | [originalText] => Mary 912 | [lemma] => Mary 913 | [characterOffsetBegin] => 12 914 | [characterOffsetEnd] => 16 915 | [pos] => NNP 916 | [ner] => PERSON 917 | [before] => 918 | [after] => 919 | ) 920 | 921 | [4] => Array 922 | ( 923 | [index] => 5 924 | [word] => in 925 | [originalText] => in 926 | [lemma] => in 927 | [characterOffsetBegin] => 17 928 | [characterOffsetEnd] => 19 929 | [pos] => IN 930 | [ner] => O 931 | [before] => 932 | [after] => 933 | ) 934 | 935 | [5] => Array 936 | ( 937 | [index] => 6 938 | [word] => New 939 | [originalText] => New 940 | [lemma] => New 941 | [characterOffsetBegin] => 20 942 | [characterOffsetEnd] => 23 943 | [pos] => NNP 944 | [ner] => LOCATION 945 | [before] => 946 | [after] => 947 | ) 948 | 949 | [6] => Array 950 | ( 951 | [index] => 7 952 | [word] => York 953 | [originalText] => York 954 | [lemma] => York 955 | [characterOffsetBegin] => 24 956 | [characterOffsetEnd] => 28 957 | [pos] => NNP 958 | [ner] => LOCATION 959 | [before] => 960 | [after] => 961 | ) 962 | 963 | [7] => Array 964 | ( 965 | [index] => 8 966 | [word] => at 967 | [originalText] => at 968 | [lemma] => at 969 | [characterOffsetBegin] => 29 970 | [characterOffsetEnd] => 31 971 | [pos] => IN 972 | [ner] => O 973 | [before] => 974 | [after] => 975 | ) 976 | 977 | [8] => Array 978 | ( 979 | [index] => 9 980 | [word] => 10pm 981 | [originalText] => 10pm 982 | [lemma] => 10pm 983 | [characterOffsetBegin] => 32 984 | [characterOffsetEnd] => 36 985 | [pos] => CD 986 | [ner] => TIME 987 | [normalizedNER] => T22:00 988 | [before] => 989 | [after] => 990 | [timex] => Array 991 | ( 992 | [tid] => t1 993 | [type] => TIME 994 | [value] => T22:00 995 | ) 996 | 997 | ) 998 | 999 | ) 1000 | 1001 | 
) 1002 | 1003 | ) 1004 | 1005 | ) 1006 | 1007 | ``` 1008 | 1009 | ## Any questions? 1010 | 1011 | Please let me know. 1012 | 1013 | 1014 | ## Credits 1015 | 1016 | Some functions are forked from this "Stanford parser" package: 1017 | ``` 1018 | https://github.com/agentile/PHP-Stanford-NLP 1019 | ``` 1020 | 1021 | -------------------------------------------------------------------------------- /bootstrap.php: -------------------------------------------------------------------------------- 1 | CoreNLP Adapter error: could not load "Composer" files.

' 48 | . '- Run "composer update" on the command line
' 49 | . '- If Composer is not installed, go to: install Composer

'; 50 | die; 51 | } 52 | -------------------------------------------------------------------------------- /composer.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "dennis-de-swart/php-stanford-corenlp-adapter", 3 | "type": "library", 4 | "description": "PHP adapter for use with Stanford CoreNLP tools", 5 | "keywords": ["stanford","nlp","ner","pos","parser"], 6 | "homepage": "https://github.com/DennisDeSwart/PHP-stanford-corenlp-adapter", 7 | "license": "MIT", 8 | "authors": [ 9 | { 10 | "name": "Dennis de Swart", 11 | "email": "dennis@dennisdeswart.nl", 12 | "homepage": "http://www.dennisdeswart.nl", 13 | "role": "Developer" 14 | } 15 | ], 16 | "require": { 17 | "php": ">=5.5", 18 | "guzzlehttp/guzzle": "^6.2.0" 19 | }, 20 | "autoload": { 21 | "classmap": ["src/"] 22 | } 23 | } 24 | -------------------------------------------------------------------------------- /index.php: -------------------------------------------------------------------------------- 1 | getOutput($text1); 23 | 24 | // Second text 25 | $text2 = 'The Golden Gate Bridge was designed by Joseph Strauss.'; 26 | $coreNLP->getOutput($text2); 27 | 28 | /** 29 | * Display result 30 | */ 31 | 32 | // this makes it easier to read 33 | echo '
';

    // show complete output
    headerText('The "Server Memory Object" (below) contains all the server output');
    print_r($coreNLP->serverMemory);

    // first text tree
    headerText('FIRST TEXT: Part-Of-Speech tree');
    print_r($coreNLP->trees[0]);

    // second text tree
    headerText('SECOND TEXT: Part-Of-Speech tree');
    print_r($coreNLP->trees[1]);

    // get IDs for a tree
    headerText('EVERY TREE HAS UNIQUE IDs: this shows the Word-tree-IDs for the second tree');
    print_r($coreNLP->getWordValues($coreNLP->trees[1]));

    // this is just a helper function for a nice header
    function headerText($header){
            echo '
***'.str_repeat('*', strlen($header)).'***
';
            echo '** '.$header.' **
';
            echo '***'.str_repeat('*', strlen($header)).'***

'; 56 | } 57 | 58 | -------------------------------------------------------------------------------- /src/CoreNLP/CorenlpAdapter.php: -------------------------------------------------------------------------------- 1 | loadHTMLfile(ONLINE_URL.urlencode($text)); 29 | $pre = $doc->getElementsByTagName('pre')->item(0); 30 | $content = $pre->nodeValue; 31 | $string = htmlentities($content, null, 'utf-8'); 32 | $content = str_replace(" ", "", $string); 33 | $content = html_entity_decode($content); 34 | $this->serverRawOutput = $content; 35 | 36 | // get object with data 37 | $this->serverOutput = json_decode($this->serverRawOutput, true); // note: decodes into an array, not an object 38 | return; 39 | } 40 | 41 | /** 42 | * function getServerOutput: 43 | * - sends a request 44 | * - returns server output 45 | * 46 | * @param string $text 47 | * @return type 48 | */ 49 | public function getServerOutput($text){ 50 | 51 | if(USE_GUZZLE){ 52 | $this->getOutputGuzzle($text); 53 | } else { 54 | $this->getOutputCurl($text); 55 | } 56 | } 57 | 58 | public function getOutputCurl($text){ 59 | // create a shell command 60 | $command = 'curl --data "'.$text.'" "'.CURLURL.'"?properties={"'.CURLPROPERTIES.'"}'; 61 | 62 | try { 63 | // do the shell command 64 | $this->serverRawOutput = shell_exec($command); 65 | 66 | } catch (Exception $e) { 67 | echo 'Caught exception: ', $e->getMessage(), "\n"; 68 | } 69 | 70 | // get object with data 71 | $this->serverOutput = json_decode($this->serverRawOutput, true); // note: decodes into an array, not an object 72 | 73 | return; 74 | } 75 | 76 | public function getOutputGuzzle($text){ 77 | 78 | $client = new \GuzzleHttp\Client(); 79 | $res = $client->request('POST', CURLURL, [ 80 | 'body' => $text 81 | ]); 82 | 83 | $json = $res->getBody(); 84 | $this->serverOutput = json_decode($json, true); 85 | 86 | return; 87 | } 88 | 89 | 90 | /** 91 | * function getOutput 92 | * 93 | * - role: all-in-one function to make life easy for the user 94 | */ 95 | 
public function getOutput($text){ 96 | 97 | if(ONLINE_API){ 98 | // run the text through the public API 99 | $this->getServerOutputOnline($text); 100 | } else{ 101 | // run the text through Java CoreNLP 102 | $this->getServerOutput($text); 103 | } 104 | 105 | // cache result 106 | $this->serverMemory[] = $this->serverOutput; 107 | 108 | if(empty($this->serverOutput)){ 109 | echo '** ERROR: No output from the CoreNLP Server **
110 | - Check if the CoreNLP server is running. Start the CoreNLP server if necessary
111 | - Check if the port you are using (probably port 9000) is not blocked by another program
'; 112 | die; 113 | } 114 | 115 | /** 116 | * create trees 117 | */ 118 | $sentences = $this->serverOutput['sentences']; 119 | foreach($this->serverOutput['sentences'] as $sentence){ 120 | $tree = $this->getTreeWithTokens($sentence); // gets one tree 121 | $this->trees[] = $tree; // collect all trees 122 | } 123 | 124 | /** 125 | * add OpenIE data 126 | */ 127 | $this->addOpenIE(); 128 | 129 | // to get the trees just call $coreNLP->trees in the main program 130 | return; 131 | } 132 | 133 | 134 | /** 135 | * 136 | * MAIN PARSING FUNCTIONS 137 | * 138 | */ 139 | /** 140 | * Gets tree from parse 141 | * 142 | * @param string $parse 143 | * @return array 144 | */ 145 | public function getTree($parse){ 146 | 147 | $this->getSentenceTree($parse); // creates tree from parse, then saves tree in "mem" 148 | $result = $this->mem; // get tree from "mem" 149 | $this->resetSentenceTree(); // clear "mem" 150 | 151 | return (array) $result; 152 | } 153 | 154 | /** 155 | * Gets tree that combines depth/ parent information with the tokens 156 | * 157 | * @param array $sentence 158 | * @return array 159 | */ 160 | public function getTreeWithTokens($sentence){ 161 | 162 | $parse = $sentence['parse']; 163 | $tokens = $sentence['tokens']; 164 | 165 | // get simple tree 166 | $tree = $this->getTree($parse); 167 | 168 | // step 1: get tree key ID's for each of the words 169 | $treeWordKeys = $this->getWordKeys($tree); 170 | 171 | // step 2: change the keys of the token array to tree IDs 172 | $combinedTokens = array_combine(array_values($treeWordKeys), $tokens); 173 | 174 | // step 3: import the token array into the tree 175 | foreach($tree as $treeKey => $value){ 176 | if(array_key_exists($treeKey, $combinedTokens)){ 177 | $tokenItems = $combinedTokens[$treeKey]; 178 | 179 | foreach($tokenItems as $tokenKey => $token){ 180 | $tree[$treeKey][$tokenKey] = $token; 181 | } 182 | } 183 | } 184 | return $tree; 185 | } 186 | 187 | /** 188 | * helpers for SentenceTree 189 | */ 190 | private 
$mem; 191 | private $memId; 192 | private $memparent; 193 | private $iteratorDepth; 194 | private $memDepth; 195 | private $parentId; 196 | private $sentenceTree = array(); 197 | 198 | /** 199 | * resets SentenceTree 200 | */ 201 | private function resetSentenceTree(){ 202 | $this->mem = array(); 203 | $this->memId = 0; 204 | $this->memparent = array(); 205 | $this->iteratorDepth= 0; 206 | $this->memDepth = -1; 207 | $this->parentId = 0; 208 | $this->sentenceTree = array(); 209 | } 210 | 211 | /** 212 | * Takes one $sentence and creates a flat tree with: 213 | * - parentId 214 | * - penn Treebank Tag 215 | * - depth 216 | * - word value 217 | * 218 | * @param string $sentence 219 | */ 220 | public function getSentenceTree($sentence){ 221 | 222 | // parse the tree 223 | $this->sentenceTree = $this->runSentenceTree($sentence); 224 | 225 | $iterator = new RecursiveIteratorIterator( 226 | new RecursiveArrayIterator($this->sentenceTree)); 227 | 228 | for($iterator->next(); $iterator->valid(); $iterator->next()) 229 | { 230 | if(!is_array($iterator->current())){ 231 | 232 | $this->iteratorDepth = $iterator->getDepth(); 233 | 234 | if($this->iteratorDepth > $this->memDepth){ 235 | 236 | $this->depthShiftUp(); 237 | 238 | } else if($this->iteratorDepth < $this->memDepth){ 239 | 240 | $this->depthShiftDown(); 241 | 242 | } else { 243 | 244 | if($iterator->key() == 'pennTag'){ 245 | $this->memId++; 246 | $this->mem[$this->memId]['parent'] = $this->parentId; 247 | } 248 | } 249 | 250 | if($iterator->key() == 'pennTag'){ 251 | $this->mem[$this->memId]['pennTreebankTag'] = $iterator->current(); 252 | $this->mem[$this->memId]['depth'] = $this->iteratorDepth; 253 | } 254 | 255 | if($iterator->key() == 'word'){ 256 | $this->mem[$this->memId]['word'] = $iterator->current(); 257 | } 258 | } 259 | } 260 | } 261 | 262 | /** 263 | * helper for SentenceTree iteration 264 | */ 265 | private function depthShiftUp(){ 266 | 267 | // remember the parent 268 | $this->parentId = $this->memId; 
269 | 270 | // set new id for iteration 271 | $this->memId++; 272 | 273 | // set parent 274 | $this->mem[$this->memId]['parent'] = $this->parentId; 275 | 276 | // remember parent 277 | $this->memparent[$this->memDepth] = $this->parentId; 278 | 279 | // set new depth 280 | $this->memDepth = $this->iteratorDepth; 281 | } 282 | 283 | /** 284 | * helper for SentenceTree iteration 285 | */ 286 | private function depthShiftDown(){ 287 | 288 | // set new id for iteration 289 | $this->memId++; 290 | 291 | // set new depth 292 | $this->memDepth = $this->iteratorDepth; 293 | 294 | // set new parent 295 | $this->parentId = ($this->memDepth)-2; 296 | 297 | // write parent to tree 298 | $this->mem[$this->memId]['parent'] = $this->memparent[$this->parentId] ; 299 | } 300 | 301 | /** 302 | * Creates tree for parsed sentence 303 | * Based on https://github.com/agentile/PHP-Stanford-NLP 304 | * 305 | * @param string $sentence 306 | * @return type 307 | */ 308 | private function runSentenceTree($sentence) 309 | { 310 | $arr = array('pennTag' => null); 311 | $length = strlen($sentence); 312 | $node = ''; 313 | $bracket= 1; 314 | 315 | for ($i = 1; $i < $length; $i++) { 316 | 317 | if ($sentence[$i] == '(') { 318 | $bracket += 1; 319 | $match_i = $this->getMatchingBracket($sentence, $i); 320 | $arr['children'][] = $this->runSentenceTree(substr($sentence, $i, ($match_i - $i) + 1)); 321 | $i = $match_i - 1; 322 | } else if ($sentence[$i] == ')') { 323 | $bracket -= 1; 324 | $tag_and_word = explode(' ', trim($node)); 325 | $arr['pennTag'] = $tag_and_word[0]; 326 | 327 | if (array_key_exists('1', $tag_and_word)){ 328 | $arr['word'] = $tag_and_word[1]; 329 | } 330 | 331 | } else { 332 | $node .= $sentence[$i]; 333 | } 334 | 335 | if ($bracket == 0) { 336 | return $arr; 337 | } 338 | } 339 | 340 | return $arr; 341 | } 342 | 343 | /** 344 | * Find the position of a matching closing bracket for a string opening bracket 345 | * 346 | * @param string $string 347 | * @param int $start_pos 348 | 
     * @return int|null position of the matching ")", or null if the brackets are unbalanced
     */
    private function getMatchingBracket($string, $start_pos)
    {
        $length = strlen($string);
        $bracket = 1;
        // stop before $length: $string[$length] is past the end of the string
        for ($i = $start_pos + 1; $i < $length; $i++) {
            if ($string[$i] == '(') {
                $bracket += 1;
            } else if ($string[$i] == ')') {
                $bracket -= 1;
            }
            if ($bracket == 0) {
                return $i;
            }
        }
        return null;
    }

    /**
     *
     * OTHER PARSING FUNCTIONS
     *
     */

    // Get an array that contains the keys to words within one tree
    public function getWordKeys($tree){

        $result = array();

        foreach ($tree as $wordId => $node){
            if(array_key_exists('word', $node)){
                $result[] = $wordId;
            }
        }
        return $result;
    }

    // Get an array with the tree leaves that contain words
    public function getWordValues($tree){

        $result = array();

        foreach ($tree as $wordId => $node){
            if(array_key_exists('word', $node)){
                $result[$wordId] = $node;
            }
        }
        return $result;
    }

    /**
     * OpenIE functions
     */
    public function addOpenIE(){

        // $this->trees accumulates over several getOutput() calls, so the trees
        // that belong to the current server output are the last ones in the list
        $offset = count($this->trees) - count($this->serverOutput['sentences']);

        foreach($this->serverOutput['sentences'] as $key => $sentence){

            if(array_key_exists('openie', $sentence)){

                $openIEs = $sentence['openie'];

                foreach($openIEs as $keyOpenIE => $openIE){

                    if(!empty($openIE)){

                        foreach($this->trees[$offset + $key] as &$node){

                            if(array_key_exists('index', $node)){

                                // token "index" is 1-based, the OpenIE spans are 0-based and half-open
                                if( $node['index']-1 >= $openIE['subjectSpan'][0] && $node['index']-1 < $openIE['subjectSpan'][1] ){
                                    $node['openIE'][$keyOpenIE] = 'subject';
                                }

                                if( ($node['index']-1 >= $openIE['relationSpan'][0]) && $node['index']-1 < $openIE['relationSpan'][1] ){
                                    $node['openIE'][$keyOpenIE] = 'relation';
                                }

                                if( ($node['index']-1 >= $openIE['objectSpan'][0]) && $node['index']-1 < $openIE['objectSpan'][1] ){
                                    $node['openIE'][$keyOpenIE] = 'object';
                                }
                            }
                        }
                        unset($node); // break the reference left behind by the foreach
                    }
                }
            }
        }
    }


}
--------------------------------------------------------------------------------
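
The flat trees built by `getTreeWithTokens` store a `parent` ID on every node, so the chain of ancestors of a word can be recovered with a simple loop. A minimal sketch, assuming a tree shaped like Diagram A in the README. The helper `getAncestorTags` and the abridged sample tree are hypothetical and not part of the adapter.

```php
<?php
// Hypothetical helper (not part of the adapter): returns the Penn Treebank
// tags on the path from the given node up to the root of a flat tree.
function getAncestorTags(array $tree, $nodeId)
{
    $tags = array();

    // The parent of the root is not a key of the tree, which ends the loop.
    while ($nodeId !== null && isset($tree[$nodeId])) {
        $tags[] = $tree[$nodeId]['pennTreebankTag'];
        $nodeId = $tree[$nodeId]['parent'];
    }

    return $tags;
}

// Abridged sample tree in the shape of Diagram A (assumed values).
$tree = array(
    1 => array('parent' => null, 'pennTreebankTag' => 'ROOT', 'depth' => 0),
    2 => array('parent' => 1, 'pennTreebankTag' => 'S', 'depth' => 2),
    3 => array('parent' => 2, 'pennTreebankTag' => 'NP', 'depth' => 4),
    4 => array('parent' => 3, 'pennTreebankTag' => 'PRP', 'depth' => 6, 'word' => 'I'),
);

echo implode(' < ', getAncestorTags($tree, 4)), "\n"; // PRP < NP < S < ROOT
```

Because the tree is a flat array keyed by node ID, no recursion is needed: each step is a single array lookup.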