├── README.md
├── bootstrap.php
├── composer.json
├── index.php
└── src
    └── CoreNLP
        └── CorenlpAdapter.php
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # PHP Stanford CoreNLP adapter
3 |
4 | [](https://packagist.org/packages/dennis-de-swart/php-stanford-corenlp-adapter)
6 | [](https://github.com/DennisDeSwart/php-stanford-corenlp-adapter)
7 | [](http://php.net/)
8 | [](https://opensource.org/licenses/MIT)
9 |
10 | PHP adapter for use with Stanford CoreNLP
11 |
12 | ## Features
13 | - Connect to Stanford University CoreNLP API online
14 | - Connect to Stanford CoreNLP 3.7.0 server
15 | - Annotators available: tokenize, ssplit, pos, parse, depparse, ner, regexner, lemma, mention, natlog, coref, openie, kbp
16 | - The package creates Part-Of-Speech Trees with depth, parent- and child ID
17 |
18 |
19 |
20 | ## Requirements
21 | - PHP 5.5 or higher (also works on PHP 7)
22 | - Windows or Linux 64-bit, 8 GB memory or more recommended
23 | - Either the Guzzle HTTP client (installed by default) or cURL alone
24 | - Composer for PHP
25 | ```
26 | https://getcomposer.org/
27 | ```
28 |
29 |
30 | ## Update 24th February 2018
31 | PHP7 Type hinting removed, because it was causing issues for some users.
32 |
33 | ## Update 28th January 2019
34 | Fixed issue with PHP 7.1 upwards
35 |
36 | ## Installation using ZIP files
37 |
38 | - Install Stanford CoreNLP Server. See the installation walkthrough below.
39 | - Download and unpack the files from this package.
40 | - Copy the files to your webserver directory. Usually "htdocs" or "var/www".
41 | - Run a Composer update
42 |
43 |
44 |
45 | ## Installation using Composer
46 |
47 | - Insert the following line into the "require" of your "composer.json" file.
48 |
49 | ```
50 | {
51 | "require": {
52 | "dennis-de-swart/php-stanford-corenlp-adapter": "*"
53 | }
54 | }
55 | ```
56 |
57 | - Run a composer update
58 |
59 |
60 |
61 | ## Using the Stanford CoreNLP online API service
62 |
63 |
64 | The adapter by default uses Stanford's online API service. This should work right after the composer update.
65 | Note that the online API is a public service. If you want to analyze large volumes of text or sensitive data,
66 | please install the Java server version.
67 |
68 |
69 |
70 | ## OpenIE
71 |
72 | OpenIE creates "subject-relation-object" tuples. This is similar to (but not the same as) the "Subject-Verb-Object" concept of the English language.
73 |
74 | Notes:
75 | - OpenIE is only available with the offline Java server version, not in "online" mode. See the installation walkthrough below.
76 | - OpenIE data is not always available. The result array is sometimes empty; this is not an error.
77 |
78 | ```
79 | http://nlp.stanford.edu/software/openie.html
80 | https://en.wikipedia.org/wiki/Subject-verb-object
81 | ```
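
To make the tuple idea concrete, here is a minimal sketch that turns the "openie" entries of one sentence into readable triples. The sample array is hand-written to mirror the shape shown in Diagram B, and `openieTriples` is a hypothetical helper, not part of the adapter.

```php
<?php
// Hypothetical sample, shaped like one sentence of $coreNLP->serverMemory (see Diagram B).
$sentence = [
    'openie' => [
        ['subject' => 'I', 'relation' => 'will meet', 'object' => 'Mary'],
        ['subject' => 'Mary', 'relation' => 'is in', 'object' => 'New York'],
    ],
];

// Hypothetical helper: returns one "subject | relation | object" string per tuple.
function openieTriples(array $sentence)
{
    $triples = [];
    // A missing or empty "openie" key is not an error: it simply yields no triples.
    $items = isset($sentence['openie']) ? $sentence['openie'] : [];
    foreach ($items as $item) {
        $triples[] = $item['subject'] . ' | ' . $item['relation'] . ' | ' . $item['object'];
    }
    return $triples;
}

print_r(openieTriples($sentence));
```

Returning an empty array for sentences without OpenIE data keeps calling code free of special cases.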
82 |
83 |
84 |
85 | # Installation / Walkthrough for Java server version
86 |
87 |
88 |
89 |
90 | ## Step 1: install Java
91 |
92 | ```
93 | https://java.com/en/download/help/index_installing.xml?os=All+Platforms&j=8&n=20
94 | ```
95 |
96 | ## Step 2: installing the Stanford CoreNLP 3.7.0 server
97 | ```
98 | http://stanfordnlp.github.io/CoreNLP/index.html#download
99 | ```
100 |
101 |
102 |
103 | ## Step 3: Port for server
104 | Default port for the Java server is port 9000. If port 9000 is not available you can change the port in the "bootstrap.php" file. Example:
105 |
106 | ```
107 | define('CURLURL' , 'http://localhost:9000/');
108 |
109 | ```
110 |
111 |
112 | ## Step 4: Start the CoreNLP server from the command line.
113 |
114 | Go to the download directory, then enter the following command:
115 |
116 | ```
117 | java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
118 | ```
119 |
120 | Important note: the Stanford manual says "-mx4g", but I found that this can lead to a Java OutOfMemory error. It is also important to use a 64-bit operating system with enough memory (8 GB or more recommended).
121 |
122 |
123 | ## Step 5: Test if the server has started by surfing to its URL
124 | ```
125 | http://localhost:9000/
126 | ```
127 | When you surf to this URL, you should see the CoreNLP GUI. If you have problems with installation you can check the manual:
128 | ```
129 | http://stanfordnlp.github.io/CoreNLP/corenlp-server.html
130 | ```
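
The same check can be done from a script. This is a minimal sketch, assuming the server listens on localhost port 9000 as configured in Step 3. `serverIsUp` is a hypothetical helper, not part of the adapter, and it only tests that the port accepts a TCP connection, not that CoreNLP itself answers.

```php
<?php
// Hypothetical helper: true when a TCP connection to $host:$port succeeds.
function serverIsUp($host, $port, $timeoutSeconds)
{
    // "@" hides the warning fsockopen() emits when the connection is refused.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo serverIsUp('localhost', 9000, 2)
    ? "CoreNLP server reachable\n"
    : "CoreNLP server not reachable\n";
```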
131 |
132 | ## Step 6: Set ONLINE_API to FALSE
133 |
134 | In "bootstrap.php", set define('ONLINE_API' , FALSE). This tells the adapter to use the local Java server instead of the online API.
135 |
136 |
137 |
138 |
139 | # Usage examples
140 |
141 |
142 |
143 | ## Instantiate the adapter:
144 | ```
145 | $coreNLP = new CorenlpAdapter();
146 | ```
147 |
148 |
149 | ## To process a text, call the "getOutput" method:
150 | ```
151 | $text = 'The Golden Gate Bridge was designed by Joseph Strauss.';
152 | $coreNLP->getOutput($text);
153 | ```
154 |
155 | Note that the first time that you process a text, the server takes about 20 to 30 seconds extra to load definitions. All other calls to the server after that will be much faster. Small texts are usually processed within seconds.
156 |
157 |
158 |
159 | ## The results
160 |
161 | If successful the following properties will be available:
162 | ```
163 | $coreNLP->serverMemory; //contains all of the server output
164 | $coreNLP->trees; //contains processed flat trees. Each part of the tree is assigned an ID key
165 |
166 | $coreNLP->getWordValues($coreNLP->trees[1]); // get just the words from a tree
167 | ```
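
As an illustration of working with one tree, the sketch below reduces it to a short "word/pos/ner" summary per word node. The `$tree` array is a hand-written excerpt shaped like Diagram A, and `wordSummary` is a hypothetical helper that applies the same "has a word key" filter as `getWordValues`.

```php
<?php
// Hypothetical excerpt of one entry of $coreNLP->trees (see Diagram A).
$tree = [
    1  => ['parent' => null, 'pennTreebankTag' => 'ROOT', 'depth' => 0],
    4  => ['parent' => 3, 'pennTreebankTag' => 'PRP', 'depth' => 6,
           'word' => 'I', 'pos' => 'PRP', 'ner' => 'O'],
    11 => ['parent' => 10, 'pennTreebankTag' => 'NNP', 'depth' => 12,
           'word' => 'Mary', 'pos' => 'NNP', 'ner' => 'PERSON'],
];

// Hypothetical helper: keeps only nodes that carry a word, keyed by tree ID.
function wordSummary(array $tree)
{
    $rows = [];
    foreach ($tree as $id => $node) {
        if (array_key_exists('word', $node)) {
            $rows[$id] = $node['word'] . '/' . $node['pos'] . '/' . $node['ner'];
        }
    }
    return $rows;
}

print_r(wordSummary($tree));
```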
168 |
169 |
170 |
171 |
172 | ********************************
173 | ### Diagram A: Tree With Tokens
174 | ********************************
175 | ```
176 | Array
177 | (
178 | [1] => Array
179 | (
180 | [parent] =>
181 | [pennTreebankTag] => ROOT
182 | [depth] => 0
183 | )
184 |
185 | [2] => Array
186 | (
187 | [parent] => 1
188 | [pennTreebankTag] => S
189 | [depth] => 2
190 | )
191 |
192 | [3] => Array
193 | (
194 | [parent] => 2
195 | [pennTreebankTag] => NP
196 | [depth] => 4
197 | )
198 |
199 | [4] => Array
200 | (
201 | [parent] => 3
202 | [pennTreebankTag] => PRP
203 | [depth] => 6
204 | [word] => I
205 | [index] => 1
206 | [originalText] => I
207 | [lemma] => I
208 | [characterOffsetBegin] => 0
209 | [characterOffsetEnd] => 1
210 | [pos] => PRP
211 | [ner] => O
212 | [before] =>
213 | [after] =>
214 | [openIE] => Array
215 | (
216 | [0] => subject
217 | [1] => subject
218 | [2] => subject
219 | )
220 |
221 | )
222 |
223 | [5] => Array
224 | (
225 | [parent] => 2
226 | [pennTreebankTag] => VP
227 | [depth] => 4
228 | )
229 |
230 | [6] => Array
231 | (
232 | [parent] => 5
233 | [pennTreebankTag] => MD
234 | [depth] => 6
235 | [word] => will
236 | [index] => 2
237 | [originalText] => will
238 | [lemma] => will
239 | [characterOffsetBegin] => 2
240 | [characterOffsetEnd] => 6
241 | [pos] => MD
242 | [ner] => O
243 | [before] =>
244 | [after] =>
245 | [openIE] => Array
246 | (
247 | [0] => subject
248 | [1] => subject
249 | [2] => relation
250 | )
251 |
252 | )
253 |
254 | [7] => Array
255 | (
256 | [parent] => 5
257 | [pennTreebankTag] => VP
258 | [depth] => 6
259 | )
260 |
261 | [8] => Array
262 | (
263 | [parent] => 7
264 | [pennTreebankTag] => VB
265 | [depth] => 8
266 | [word] => meet
267 | [index] => 3
268 | [originalText] => meet
269 | [lemma] => meet
270 | [characterOffsetBegin] => 7
271 | [characterOffsetEnd] => 11
272 | [pos] => VB
273 | [ner] => O
274 | [before] =>
275 | [after] =>
276 | [openIE] => Array
277 | (
278 | [0] => subject
279 | [1] => subject
280 | [2] => relation
281 | )
282 |
283 | )
284 |
285 | [9] => Array
286 | (
287 | [parent] => 7
288 | [pennTreebankTag] => NP
289 | [depth] => 8
290 | )
291 |
292 | [10] => Array
293 | (
294 | [parent] => 9
295 | [pennTreebankTag] => NP
296 | [depth] => 10
297 | )
298 |
299 | [11] => Array
300 | (
301 | [parent] => 10
302 | [pennTreebankTag] => NNP
303 | [depth] => 12
304 | [word] => Mary
305 | [index] => 4
306 | [originalText] => Mary
307 | [lemma] => Mary
308 | [characterOffsetBegin] => 12
309 | [characterOffsetEnd] => 16
310 | [pos] => NNP
311 | [ner] => PERSON
312 | [before] =>
313 | [after] =>
314 | [openIE] => Array
315 | (
316 | [1] => subject
317 | [2] => object
318 | [3] => subject
319 | [0] => subject
320 | )
321 |
322 | )
323 |
324 | [12] => Array
325 | (
326 | [parent] => 9
327 | [pennTreebankTag] => PP
328 | [depth] => 10
329 | )
330 |
331 | [13] => Array
332 | (
333 | [parent] => 12
334 | [pennTreebankTag] => IN
335 | [depth] => 12
336 | [word] => in
337 | [index] => 5
338 | [originalText] => in
339 | [lemma] => in
340 | [characterOffsetBegin] => 17
341 | [characterOffsetEnd] => 19
342 | [pos] => IN
343 | [ner] => O
344 | [before] =>
345 | [after] =>
346 | [openIE] => Array
347 | (
348 | [1] => relation
349 | [3] => relation
350 | [0] => relation
351 | )
352 |
353 | )
354 |
355 | [14] => Array
356 | (
357 | [parent] => 12
358 | [pennTreebankTag] => NP
359 | [depth] => 12
360 | )
361 |
362 | [15] => Array
363 | (
364 | [parent] => 14
365 | [pennTreebankTag] => NNP
366 | [depth] => 14
367 | [word] => New
368 | [index] => 6
369 | [originalText] => New
370 | [lemma] => New
371 | [characterOffsetBegin] => 20
372 | [characterOffsetEnd] => 23
373 | [pos] => NNP
374 | [ner] => LOCATION
375 | [before] =>
376 | [after] =>
377 | [openIE] => Array
378 | (
379 | [1] => relation
380 | [3] => object
381 | [0] => object
382 | )
383 |
384 | )
385 |
386 | [16] => Array
387 | (
388 | [parent] => 14
389 | [pennTreebankTag] => NNP
390 | [depth] => 14
391 | [word] => York
392 | [index] => 7
393 | [originalText] => York
394 | [lemma] => York
395 | [characterOffsetBegin] => 24
396 | [characterOffsetEnd] => 28
397 | [pos] => NNP
398 | [ner] => LOCATION
399 | [before] =>
400 | [after] =>
401 | [openIE] => Array
402 | (
403 | [1] => object
404 | [3] => object
405 | )
406 |
407 | )
408 |
409 | [17] => Array
410 | (
411 | [parent] => 7
412 | [pennTreebankTag] => PP
413 | [depth] => 8
414 | )
415 |
416 | [18] => Array
417 | (
418 | [parent] => 17
419 | [pennTreebankTag] => IN
420 | [depth] => 10
421 | [word] => at
422 | [index] => 8
423 | [originalText] => at
424 | [lemma] => at
425 | [characterOffsetBegin] => 29
426 | [characterOffsetEnd] => 31
427 | [pos] => IN
428 | [ner] => O
429 | [before] =>
430 | [after] =>
431 | [openIE] => Array
432 | (
433 | [1] => object
434 | )
435 |
436 | )
437 |
438 | [19] => Array
439 | (
440 | [parent] => 17
441 | [pennTreebankTag] => NP
442 | [depth] => 10
443 | )
444 |
445 | [20] => Array
446 | (
447 | [parent] => 19
448 | [pennTreebankTag] => CD
449 | [depth] => 12
450 | [word] => 10pm
451 | [index] => 9
452 | [originalText] => 10pm
453 | [lemma] => 10pm
454 | [characterOffsetBegin] => 32
455 | [characterOffsetEnd] => 36
456 | [pos] => CD
457 | [ner] => TIME
458 | [normalizedNER] => T22:00
459 | [before] =>
460 | [after] =>
461 | [timex] => Array
462 | (
463 | [tid] => t1
464 | [type] => TIME
465 | [value] => T22:00
466 | )
467 |
468 | [openIE] => Array
469 | (
470 | [0] => object
471 | [1] => object
472 | )
473 |
474 | )
475 |
476 | )
477 |
478 | ```
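
Because the tree is flat and every node stores its parent ID, child lists can be derived in a single pass. A minimal sketch, assuming nodes shaped like the ones in Diagram A; `childrenByParent` is a hypothetical helper, not part of the adapter.

```php
<?php
// Hypothetical excerpt of a flat tree: node ID => node, as in Diagram A.
$tree = [
    1 => ['parent' => null, 'pennTreebankTag' => 'ROOT'],
    2 => ['parent' => 1, 'pennTreebankTag' => 'S'],
    3 => ['parent' => 2, 'pennTreebankTag' => 'NP'],
    5 => ['parent' => 2, 'pennTreebankTag' => 'VP'],
];

// Hypothetical helper: maps each parent ID to the list of its child IDs.
function childrenByParent(array $tree)
{
    $children = [];
    foreach ($tree as $id => $node) {
        // The root has an empty parent and is nobody's child.
        if ($node['parent'] !== null && $node['parent'] !== '') {
            $children[$node['parent']][] = $id;
        }
    }
    return $children;
}

print_r(childrenByParent($tree));
```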
479 |
480 | ***********************************************************************
481 | ### Diagram B: The ServerMemory contains all the server data
482 | ***********************************************************************
483 | ```
484 | Array
485 | (
486 | [0] => Array
487 | (
488 | [sentences] => Array
489 | (
490 | [0] => Array
491 | (
492 | [index] => 0
493 | [parse] => (ROOT
494 | (S
495 | (NP (PRP I))
496 | (VP (MD will)
497 | (VP (VB meet)
498 | (NP
499 | (NP (NNP Mary))
500 | (PP (IN in)
501 | (NP (NNP New) (NNP York))))
502 | (PP (IN at)
503 | (NP (CD 10pm)))))))
504 | [basic-dependencies] => Array
505 | (
506 | [0] => Array
507 | (
508 | [dep] => ROOT
509 | [governor] => 0
510 | [governorGloss] => ROOT
511 | [dependent] => 3
512 | [dependentGloss] => meet
513 | )
514 |
515 | [1] => Array
516 | (
517 | [dep] => nsubj
518 | [governor] => 3
519 | [governorGloss] => meet
520 | [dependent] => 1
521 | [dependentGloss] => I
522 | )
523 |
524 | [2] => Array
525 | (
526 | [dep] => aux
527 | [governor] => 3
528 | [governorGloss] => meet
529 | [dependent] => 2
530 | [dependentGloss] => will
531 | )
532 |
533 | [3] => Array
534 | (
535 | [dep] => dobj
536 | [governor] => 3
537 | [governorGloss] => meet
538 | [dependent] => 4
539 | [dependentGloss] => Mary
540 | )
541 |
542 | [4] => Array
543 | (
544 | [dep] => case
545 | [governor] => 7
546 | [governorGloss] => York
547 | [dependent] => 5
548 | [dependentGloss] => in
549 | )
550 |
551 | [5] => Array
552 | (
553 | [dep] => compound
554 | [governor] => 7
555 | [governorGloss] => York
556 | [dependent] => 6
557 | [dependentGloss] => New
558 | )
559 |
560 | [6] => Array
561 | (
562 | [dep] => nmod
563 | [governor] => 4
564 | [governorGloss] => Mary
565 | [dependent] => 7
566 | [dependentGloss] => York
567 | )
568 |
569 | [7] => Array
570 | (
571 | [dep] => case
572 | [governor] => 9
573 | [governorGloss] => 10pm
574 | [dependent] => 8
575 | [dependentGloss] => at
576 | )
577 |
578 | [8] => Array
579 | (
580 | [dep] => nmod
581 | [governor] => 3
582 | [governorGloss] => meet
583 | [dependent] => 9
584 | [dependentGloss] => 10pm
585 | )
586 |
587 | )
588 |
589 | [collapsed-dependencies] => Array
590 | (
591 | [0] => Array
592 | (
593 | [dep] => ROOT
594 | [governor] => 0
595 | [governorGloss] => ROOT
596 | [dependent] => 3
597 | [dependentGloss] => meet
598 | )
599 |
600 | [1] => Array
601 | (
602 | [dep] => nsubj
603 | [governor] => 3
604 | [governorGloss] => meet
605 | [dependent] => 1
606 | [dependentGloss] => I
607 | )
608 |
609 | [2] => Array
610 | (
611 | [dep] => aux
612 | [governor] => 3
613 | [governorGloss] => meet
614 | [dependent] => 2
615 | [dependentGloss] => will
616 | )
617 |
618 | [3] => Array
619 | (
620 | [dep] => dobj
621 | [governor] => 3
622 | [governorGloss] => meet
623 | [dependent] => 4
624 | [dependentGloss] => Mary
625 | )
626 |
627 | [4] => Array
628 | (
629 | [dep] => case
630 | [governor] => 7
631 | [governorGloss] => York
632 | [dependent] => 5
633 | [dependentGloss] => in
634 | )
635 |
636 | [5] => Array
637 | (
638 | [dep] => compound
639 | [governor] => 7
640 | [governorGloss] => York
641 | [dependent] => 6
642 | [dependentGloss] => New
643 | )
644 |
645 | [6] => Array
646 | (
647 | [dep] => nmod:in
648 | [governor] => 4
649 | [governorGloss] => Mary
650 | [dependent] => 7
651 | [dependentGloss] => York
652 | )
653 |
654 | [7] => Array
655 | (
656 | [dep] => case
657 | [governor] => 9
658 | [governorGloss] => 10pm
659 | [dependent] => 8
660 | [dependentGloss] => at
661 | )
662 |
663 | [8] => Array
664 | (
665 | [dep] => nmod:at
666 | [governor] => 3
667 | [governorGloss] => meet
668 | [dependent] => 9
669 | [dependentGloss] => 10pm
670 | )
671 |
672 | )
673 |
674 | [collapsed-ccprocessed-dependencies] => Array
675 | (
676 | [0] => Array
677 | (
678 | [dep] => ROOT
679 | [governor] => 0
680 | [governorGloss] => ROOT
681 | [dependent] => 3
682 | [dependentGloss] => meet
683 | )
684 |
685 | [1] => Array
686 | (
687 | [dep] => nsubj
688 | [governor] => 3
689 | [governorGloss] => meet
690 | [dependent] => 1
691 | [dependentGloss] => I
692 | )
693 |
694 | [2] => Array
695 | (
696 | [dep] => aux
697 | [governor] => 3
698 | [governorGloss] => meet
699 | [dependent] => 2
700 | [dependentGloss] => will
701 | )
702 |
703 | [3] => Array
704 | (
705 | [dep] => dobj
706 | [governor] => 3
707 | [governorGloss] => meet
708 | [dependent] => 4
709 | [dependentGloss] => Mary
710 | )
711 |
712 | [4] => Array
713 | (
714 | [dep] => case
715 | [governor] => 7
716 | [governorGloss] => York
717 | [dependent] => 5
718 | [dependentGloss] => in
719 | )
720 |
721 | [5] => Array
722 | (
723 | [dep] => compound
724 | [governor] => 7
725 | [governorGloss] => York
726 | [dependent] => 6
727 | [dependentGloss] => New
728 | )
729 |
730 | [6] => Array
731 | (
732 | [dep] => nmod:in
733 | [governor] => 4
734 | [governorGloss] => Mary
735 | [dependent] => 7
736 | [dependentGloss] => York
737 | )
738 |
739 | [7] => Array
740 | (
741 | [dep] => case
742 | [governor] => 9
743 | [governorGloss] => 10pm
744 | [dependent] => 8
745 | [dependentGloss] => at
746 | )
747 |
748 | [8] => Array
749 | (
750 | [dep] => nmod:at
751 | [governor] => 3
752 | [governorGloss] => meet
753 | [dependent] => 9
754 | [dependentGloss] => 10pm
755 | )
756 |
757 | )
758 |
759 | [openie] => Array
760 | (
761 | [0] => Array
762 | (
763 | [subject] => I
764 | [subjectSpan] => Array
765 | (
766 | [0] => 0
767 | [1] => 1
768 | )
769 |
770 | [relation] => will meet Mary at
771 | [relationSpan] => Array
772 | (
773 | [0] => 1
774 | [1] => 3
775 | )
776 |
777 | [object] => 10pm
778 | [objectSpan] => Array
779 | (
780 | [0] => 8
781 | [1] => 9
782 | )
783 |
784 | )
785 |
786 | [1] => Array
787 | (
788 | [subject] => I
789 | [subjectSpan] => Array
790 | (
791 | [0] => 0
792 | [1] => 1
793 | )
794 |
795 | [relation] => will meet
796 | [relationSpan] => Array
797 | (
798 | [0] => 1
799 | [1] => 3
800 | )
801 |
802 | [object] => Mary in New York
803 | [objectSpan] => Array
804 | (
805 | [0] => 3
806 | [1] => 7
807 | )
808 |
809 | )
810 |
811 | [2] => Array
812 | (
813 | [subject] => I
814 | [subjectSpan] => Array
815 | (
816 | [0] => 0
817 | [1] => 1
818 | )
819 |
820 | [relation] => will meet
821 | [relationSpan] => Array
822 | (
823 | [0] => 1
824 | [1] => 3
825 | )
826 |
827 | [object] => Mary
828 | [objectSpan] => Array
829 | (
830 | [0] => 3
831 | [1] => 4
832 | )
833 |
834 | )
835 |
836 | [3] => Array
837 | (
838 | [subject] => Mary
839 | [subjectSpan] => Array
840 | (
841 | [0] => 3
842 | [1] => 4
843 | )
844 |
845 | [relation] => is in
846 | [relationSpan] => Array
847 | (
848 | [0] => 4
849 | [1] => 5
850 | )
851 |
852 | [object] => New York
853 | [objectSpan] => Array
854 | (
855 | [0] => 5
856 | [1] => 7
857 | )
858 |
859 | )
860 |
861 | )
862 |
863 | [tokens] => Array
864 | (
865 | [0] => Array
866 | (
867 | [index] => 1
868 | [word] => I
869 | [originalText] => I
870 | [lemma] => I
871 | [characterOffsetBegin] => 0
872 | [characterOffsetEnd] => 1
873 | [pos] => PRP
874 | [ner] => O
875 | [before] =>
876 | [after] =>
877 | )
878 |
879 | [1] => Array
880 | (
881 | [index] => 2
882 | [word] => will
883 | [originalText] => will
884 | [lemma] => will
885 | [characterOffsetBegin] => 2
886 | [characterOffsetEnd] => 6
887 | [pos] => MD
888 | [ner] => O
889 | [before] =>
890 | [after] =>
891 | )
892 |
893 | [2] => Array
894 | (
895 | [index] => 3
896 | [word] => meet
897 | [originalText] => meet
898 | [lemma] => meet
899 | [characterOffsetBegin] => 7
900 | [characterOffsetEnd] => 11
901 | [pos] => VB
902 | [ner] => O
903 | [before] =>
904 | [after] =>
905 | )
906 |
907 | [3] => Array
908 | (
909 | [index] => 4
910 | [word] => Mary
911 | [originalText] => Mary
912 | [lemma] => Mary
913 | [characterOffsetBegin] => 12
914 | [characterOffsetEnd] => 16
915 | [pos] => NNP
916 | [ner] => PERSON
917 | [before] =>
918 | [after] =>
919 | )
920 |
921 | [4] => Array
922 | (
923 | [index] => 5
924 | [word] => in
925 | [originalText] => in
926 | [lemma] => in
927 | [characterOffsetBegin] => 17
928 | [characterOffsetEnd] => 19
929 | [pos] => IN
930 | [ner] => O
931 | [before] =>
932 | [after] =>
933 | )
934 |
935 | [5] => Array
936 | (
937 | [index] => 6
938 | [word] => New
939 | [originalText] => New
940 | [lemma] => New
941 | [characterOffsetBegin] => 20
942 | [characterOffsetEnd] => 23
943 | [pos] => NNP
944 | [ner] => LOCATION
945 | [before] =>
946 | [after] =>
947 | )
948 |
949 | [6] => Array
950 | (
951 | [index] => 7
952 | [word] => York
953 | [originalText] => York
954 | [lemma] => York
955 | [characterOffsetBegin] => 24
956 | [characterOffsetEnd] => 28
957 | [pos] => NNP
958 | [ner] => LOCATION
959 | [before] =>
960 | [after] =>
961 | )
962 |
963 | [7] => Array
964 | (
965 | [index] => 8
966 | [word] => at
967 | [originalText] => at
968 | [lemma] => at
969 | [characterOffsetBegin] => 29
970 | [characterOffsetEnd] => 31
971 | [pos] => IN
972 | [ner] => O
973 | [before] =>
974 | [after] =>
975 | )
976 |
977 | [8] => Array
978 | (
979 | [index] => 9
980 | [word] => 10pm
981 | [originalText] => 10pm
982 | [lemma] => 10pm
983 | [characterOffsetBegin] => 32
984 | [characterOffsetEnd] => 36
985 | [pos] => CD
986 | [ner] => TIME
987 | [normalizedNER] => T22:00
988 | [before] =>
989 | [after] =>
990 | [timex] => Array
991 | (
992 | [tid] => t1
993 | [type] => TIME
994 | [value] => T22:00
995 | )
996 |
997 | )
998 |
999 | )
1000 |
1001 | )
1002 |
1003 | )
1004 |
1005 | )
1006 |
1007 | ```
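
The dependency arrays in the server output can be printed in a compact "relation(governor, dependent)" form. A minimal sketch using a hand-written sample shaped like the "basic-dependencies" entries above; `dependencyLines` is a hypothetical helper, not part of the adapter.

```php
<?php
// Hypothetical sample, shaped like one sentence of the server output (see Diagram B).
$sentence = [
    'basic-dependencies' => [
        ['dep' => 'ROOT', 'governor' => 0, 'governorGloss' => 'ROOT',
         'dependent' => 3, 'dependentGloss' => 'meet'],
        ['dep' => 'nsubj', 'governor' => 3, 'governorGloss' => 'meet',
         'dependent' => 1, 'dependentGloss' => 'I'],
    ],
];

// Hypothetical helper: formats each dependency as dep(governor-index, dependent-index).
function dependencyLines(array $sentence)
{
    $lines = [];
    foreach ($sentence['basic-dependencies'] as $d) {
        $lines[] = sprintf(
            '%s(%s-%d, %s-%d)',
            $d['dep'],
            $d['governorGloss'],
            $d['governor'],
            $d['dependentGloss'],
            $d['dependent']
        );
    }
    return $lines;
}

print_r(dependencyLines($sentence));
```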
1008 |
1009 | ## Any questions?
1010 |
1011 | Please let me know.
1012 |
1013 |
1014 | ## Credits
1015 |
1016 | Some functions are forked from this "Stanford parser" package:
1017 | ```
1018 | https://github.com/agentile/PHP-Stanford-NLP
1019 | ```
1020 |
1021 |
--------------------------------------------------------------------------------
/bootstrap.php:
--------------------------------------------------------------------------------
1 | CoreNLP Adapter error: could not load "Composer" files.
'
48 | . '- Run "composer update" on the command line
'
49 | . '- If Composer is not installed, go to: install Composer
';
--------------------------------------------------------------------------------
/index.php:
--------------------------------------------------------------------------------
34 |
35 | // show complete output
36 | headerText('The "Server Memory Object" (below) contains all the server output');
37 | print_r($coreNLP->serverMemory);
38 |
39 | // first text tree
40 | headerText('FIRST TEXT: Part-Of-Speech tree');
41 | print_r($coreNLP->trees[0]);
42 |
43 | // second text tree
44 | headerText('SECOND TEXT: Part-Of-Speech tree');
45 | print_r($coreNLP->trees[1]);
46 |
47 | // get IDs for a tree
48 | headerText('EVERY TREE HAS UNIQUE IDs: this shows the Word-tree-IDs for the second tree');
49 | print_r($coreNLP->getWordValues($coreNLP->trees[1]));
50 |
51 | // this is just a helper function for a nice header
52 | function headerText($header){
53 |     echo "\n***".str_repeat('*', strlen($header))."***\n";
54 |     echo '** '.$header." **\n";
55 |     echo '***'.str_repeat('*', strlen($header))."***\n";
56 | }
57 |
58 |
--------------------------------------------------------------------------------
/src/CoreNLP/CorenlpAdapter.php:
--------------------------------------------------------------------------------
1 | loadHTMLfile(ONLINE_URL.urlencode($text));
29 |         $pre = $doc->getElementsByTagName('pre')->item(0);
30 |         $content = $pre->nodeValue;
31 |         $string = htmlentities($content, null, 'utf-8');
32 |         $content = str_replace("&nbsp;", "", $string);
33 |         $content = html_entity_decode($content);
34 |         $this->serverRawOutput = $content;
35 |
36 |         // get object with data
37 |         $this->serverOutput = json_decode($this->serverRawOutput, true); // note: decodes into an array, not an object
38 |         return;
39 |     }
40 |
41 |     /**
42 |      * function getServerOutput:
43 |      * - sends a request
44 |      * - returns server output
45 |      *
46 |      * @param string $text
47 |      * @return type
48 |      */
49 |     public function getServerOutput($text){
50 |
51 |         if(USE_GUZZLE){
52 |             $this->getOutputGuzzle($text);
53 |         } else {
54 |             $this->getOutputCurl($text);
55 |         }
56 |     }
57 |
58 |     public function getOutputCurl($text){
59 |         // create a shell command; escapeshellarg() stops quotes in $text from breaking it
60 |         $command = 'curl --data '.escapeshellarg($text).' "'.CURLURL.'"?properties={"'.CURLPROPERTIES.'"}';
61 |
62 |         try {
63 |             // do the shell command
64 |             $this->serverRawOutput = shell_exec($command);
65 |
66 |         } catch (Exception $e) {
67 |             echo 'Caught exception: ', $e->getMessage(), "\n";
68 |         }
69 |
70 |         // get object with data
71 |         $this->serverOutput = json_decode($this->serverRawOutput, true); // note: decodes into an array, not an object
72 |
73 |         return;
74 |     }
75 |
76 |     public function getOutputGuzzle($text){
77 |
78 |         $client = new \GuzzleHttp\Client();
79 |         $res = $client->request('POST', CURLURL, [
80 |             'body' => $text
81 |         ]);
82 |
83 |         $json = $res->getBody();
84 |         $this->serverOutput = json_decode($json, true);
85 |
86 |         return;
87 |     }
88 |
89 |
90 |     /**
91 |      * function getOutput
92 |      *
93 |      * - role: all-in-one function to make life easy for the user
94 |      */
95 |     public function getOutput($text){
96 |
97 |         if(ONLINE_API){
98 |             // run the text through the public API
99 |             $this->getServerOutputOnline($text);
100 |         } else{
101 |             // run the text through Java CoreNLP
102 |             $this->getServerOutput($text);
103 |         }
104 |
105 |         // cache result
106 |         $this->serverMemory[] = $this->serverOutput;
107 |
108 |         if(empty($this->serverOutput)){
109 |             echo '** ERROR: No output from the CoreNLP Server **
110 | - Check if the CoreNLP server is running. Start the CoreNLP server if necessary
111 | - Check if the port you are using (probably port 9000) is not blocked by another program'."\n";
112 |             die;
113 |         }
114 |
115 |         /**
116 |          * create trees
117 |          */
118 |         $sentences = $this->serverOutput['sentences'];
119 |         foreach($this->serverOutput['sentences'] as $sentence){
120 |             $tree = $this->getTreeWithTokens($sentence); // gets one tree
121 |             $this->trees[] = $tree; // collect all trees
122 |         }
123 |
124 |         /**
125 |          * add OpenIE data
126 |          */
127 |         $this->addOpenIE();
128 |
129 |         // to get the trees just call $coreNLP->trees in the main program
130 |         return;
131 |     }
132 |
133 |
134 |     /**
135 |      *
136 |      * MAIN PARSING FUNCTIONS
137 |      *
138 |      */
139 |     /**
140 |      * Gets tree from parse
141 |      *
142 |      * @param string $parse
143 |      * @return array
144 |      */
145 |     public function getTree($parse){
146 |
147 |         $this->getSentenceTree($parse); // creates tree from parse, then saves tree in "mem"
148 |         $result = $this->mem; // get tree from "mem"
149 |         $this->resetSentenceTree(); // clear "mem"
150 |
151 |         return (array) $result;
152 |     }
153 |
154 |     /**
155 |      * Gets tree that combines depth/ parent information with the tokens
156 |      *
157 |      * @param array $sentence
158 |      * @return array
159 |      */
160 |     public function getTreeWithTokens($sentence){
161 |
162 |         $parse = $sentence['parse'];
163 |         $tokens = $sentence['tokens'];
164 |
165 |         // get simple tree
166 |         $tree = $this->getTree($parse);
167 |
168 |         // step 1: get tree key ID's for each of the words
169 |         $treeWordKeys = $this->getWordKeys($tree);
170 |
171 |         // step 2: change the keys of the token array to tree IDs
172 |         $combinedTokens = array_combine(array_values($treeWordKeys), $tokens);
173 |
174 |         // step 3: import the token array into the tree
175 |         foreach($tree as $treeKey => $value){
176 |             if(array_key_exists($treeKey, $combinedTokens)){
177 |                 $tokenItems = $combinedTokens[$treeKey];
178 |
179 |                 foreach($tokenItems as $tokenKey => $token){
180 |                     $tree[$treeKey][$tokenKey] = $token;
181 |                 }
182 |             }
183 |         }
184 |         return $tree;
185 |     }
186 |
187 |     /**
188 |      * helpers for SentenceTree
189 |      */
190 |     private $mem;
191 |     private $memId;
192 |     private $memparent;
193 |     private $iteratorDepth;
194 |     private $memDepth;
195 |     private $parentId;
196 |     private $sentenceTree = array();
197 |
198 |     /**
199 |      * resets SentenceTree
200 |      */
201 |     private function resetSentenceTree(){
202 |         $this->mem = array();
203 |         $this->memId = 0;
204 |         $this->memparent = array();
205 |         $this->iteratorDepth= 0;
206 |         $this->memDepth = -1;
207 |         $this->parentId = 0;
208 |         $this->sentenceTree = array();
209 |     }
210 |
211 |     /**
212 |      * Takes one $sentence and creates a flat tree with:
213 |      * - parentId
214 |      * - penn Treebank Tag
215 |      * - depth
216 |      * - word value
217 |      *
218 |      * @param string $sentence
219 |      */
220 |     public function getSentenceTree($sentence){
221 |
222 |         // parse the tree
223 |         $this->sentenceTree = $this->runSentenceTree($sentence);
224 |
225 |         $iterator = new RecursiveIteratorIterator(
226 |             new RecursiveArrayIterator($this->sentenceTree));
227 |
228 |         for($iterator->next(); $iterator->valid(); $iterator->next())
229 |         {
230 |             if(!is_array($iterator->current())){
231 |
232 |                 $this->iteratorDepth = $iterator->getDepth();
233 |
234 |                 if($this->iteratorDepth > $this->memDepth){
235 |
236 |                     $this->depthShiftUp();
237 |
238 |                 } else if($this->iteratorDepth < $this->memDepth){
239 |
240 |                     $this->depthShiftDown();
241 |
242 |                 } else {
243 |
244 |                     if($iterator->key() == 'pennTag'){
245 |                         $this->memId++;
246 |                         $this->mem[$this->memId]['parent'] = $this->parentId;
247 |                     }
248 |                 }
249 |
250 |                 if($iterator->key() == 'pennTag'){
251 |                     $this->mem[$this->memId]['pennTreebankTag'] = $iterator->current();
252 |                     $this->mem[$this->memId]['depth'] = $this->iteratorDepth;
253 |                 }
254 |
255 |                 if($iterator->key() == 'word'){
256 |                     $this->mem[$this->memId]['word'] = $iterator->current();
257 |                 }
258 |             }
259 |         }
260 |     }
261 |
262 |     /**
263 |      * helper for SentenceTree iteration
264 |      */
265 |     private function depthShiftUp(){
266 |
267 |         // remember the parent
268 |         $this->parentId = $this->memId;
269 |
270 |         // set new id for iteration
271 |         $this->memId++;
272 |
273 |         // set parent
274 |         $this->mem[$this->memId]['parent'] = $this->parentId;
275 |
276 |         // remember parent
277 |         $this->memparent[$this->memDepth] = $this->parentId;
278 |
279 |         // set new depth
280 |         $this->memDepth = $this->iteratorDepth;
281 |     }
282 |
283 |     /**
284 |      * helper for SentenceTree iteration
285 |      */
286 |     private function depthShiftDown(){
287 |
288 |         // set new id for iteration
289 |         $this->memId++;
290 |
291 |         // set new depth
292 |         $this->memDepth = $this->iteratorDepth;
293 |
294 |         // set new parent
295 |         $this->parentId = ($this->memDepth)-2;
296 |
297 |         // write parent to tree
298 |         $this->mem[$this->memId]['parent'] = $this->memparent[$this->parentId] ;
299 |     }
300 |
301 |     /**
302 |      * Creates tree for parsed sentence
303 |      * Based on https://github.com/agentile/PHP-Stanford-NLP
304 |      *
305 |      * @param string $sentence
306 |      * @return type
307 |      */
308 |     private function runSentenceTree($sentence)
309 |     {
310 |         $arr = array('pennTag' => null);
311 |         $length = strlen($sentence);
312 |         $node = '';
313 |         $bracket= 1;
314 |
315 |         for ($i = 1; $i < $length; $i++) {
316 |
317 |             if ($sentence[$i] == '(') {
318 |                 $bracket += 1;
319 |                 $match_i = $this->getMatchingBracket($sentence, $i);
320 |                 $arr['children'][] = $this->runSentenceTree(substr($sentence, $i, ($match_i - $i) + 1));
321 |                 $i = $match_i - 1;
322 |             } else if ($sentence[$i] == ')') {
323 |                 $bracket -= 1;
324 |                 $tag_and_word = explode(' ', trim($node));
325 |                 $arr['pennTag'] = $tag_and_word[0];
326 |
327 |                 if (array_key_exists('1', $tag_and_word)){
328 |                     $arr['word'] = $tag_and_word[1];
329 |                 }
330 |
331 |             } else {
332 |                 $node .= $sentence[$i];
333 |             }
334 |
335 |             if ($bracket == 0) {
336 |                 return $arr;
337 |             }
338 |         }
339 |
340 |         return $arr;
341 |     }
342 |
343 |     /**
344 |      * Find the position of a matching closing bracket for a string opening bracket
345 |      *
346 |      * @param string $string
347 |      * @param int $start_pos
348 |      * @return type
349 |      */
350 |     private function getMatchingBracket($string, $start_pos)
351 |     {
352 |         $length = strlen($string);
353 |         $bracket = 1;
354 |         foreach (range($start_pos + 1, $length) as $i) {
355 |             if ($string[$i] == '(') {
356 |                 $bracket += 1;
357 |             } else if ($string[$i] == ')') {
358 |                 $bracket -= 1;
359 |             }
360 |             if ($bracket == 0) {
361 |                 return $i;
362 |             }
363 |         }
364 |     }
365 |
366 |     /**
367 |      *
368 |      * OTHER PARSING FUNCTIONS
369 |      *
370 |      */
371 |
372 |     // Get an array that contains the keys to words within one tree
373 |     public function getWordKeys($tree){
374 |
375 |         $result = array();
376 |
377 |         foreach ($tree as $wordId => $node){
378 |             if(array_key_exists('word', $node)){
379 |                 $result[] = $wordId;
380 |             }
381 |         }
382 |         return $result;
383 |     }
384 |
385 |     // Get an array with the tree leaves that contain words
386 |     public function getWordValues($tree){
387 |
388 |         $result = array();
389 |
390 |         foreach ($tree as $wordId => $node){
391 |             if(array_key_exists('word', $node)){
392 |                 $result[$wordId] = $node;
393 |             }
394 |         }
395 |         return $result;
396 |     }
397 |
398 |     /**
399 |      * OpenIE functions
400 |      */
401 |     public function addOpenIE(){
402 |
403 |         foreach($this->serverOutput['sentences'] as $key => $sentence){
404 |
405 |             if(array_key_exists('openie', $sentence)){
406 |
407 |                 $openIEs = $sentence['openie'];
408 |
409 |                 foreach($openIEs as $keyOpenIE => $openIE){
410 |
411 |                     if(!empty($openIE)){
412 |
413 |                         foreach($this->trees[$key] as &$node){
414 |
415 |                             if(array_key_exists('index', $node)){
416 |
417 |                                 if( $node['index']-1 >= $openIE['subjectSpan'][0] && $node['index']-1 < $openIE['subjectSpan'][1] ){
418 |                                     $node['openIE'][$keyOpenIE] = 'subject';
419 |                                 }
420 |
421 |                                 if( ($node['index']-1 >= $openIE['relationSpan'][0]) && $node['index']-1 < $openIE['relationSpan'][1] ){
422 |                                     $node['openIE'][$keyOpenIE] = 'relation';
423 |                                 }
424 |
425 |                                 if( ($node['index']-1 >= $openIE['objectSpan'][0]) && $node['index']-1 < $openIE['objectSpan'][1] ){
426 |                                     $node['openIE'][$keyOpenIE] = 'object';
427 |                                 }
428 |                             }
429 |                         }
430 |                     }
431 |                 }
432 |             }
433 |         }
434 |     }
435 |
436 |
437 | }
438 |
--------------------------------------------------------------------------------
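
For readers who want to see the bracket-to-tree step in isolation, here is a minimal stand-alone sketch. It is not the adapter's `runSentenceTree()`: it swaps in a token-based recursive descent instead of character scanning, assumes a well-formed parse string, and uses the hypothetical function names `parseBrackets` and `parseNode`. The output keys (`pennTag`, `word`, `children`) follow the ones used in the adapter.

```php
<?php
// Hypothetical entry point: turns a Penn Treebank bracket string into a nested array.
function parseBrackets($parse)
{
    // Split the string into "(", ")" and bare symbols (tags and words).
    preg_match_all('/\(|\)|[^\s()]+/', $parse, $matches);
    $tokens = $matches[0];
    $pos = 0;
    return parseNode($tokens, $pos);
}

// Hypothetical helper: parses one "(TAG ...)" group starting at $tokens[$pos].
// Assumes balanced brackets; malformed input is not handled.
function parseNode(array $tokens, &$pos)
{
    $pos++;                                   // skip the opening "("
    $node = ['pennTag' => $tokens[$pos++]];   // first symbol is the tag
    while ($tokens[$pos] !== ')') {
        if ($tokens[$pos] === '(') {
            $node['children'][] = parseNode($tokens, $pos);
        } else {
            $node['word'] = $tokens[$pos++];  // a bare symbol after the tag is the word
        }
    }
    $pos++;                                   // skip the closing ")"
    return $node;
}

$tree = parseBrackets('(ROOT (S (NP (PRP I)) (VP (VB run))))');
print_r($tree);
```

Tokenizing first keeps the recursion free of index arithmetic on the raw string, at the cost of one extra pass over the input.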