├── travis-install-php-mecab.sh ├── example-travis.yml ├── install-php-mecab.sh └── README.md /travis-install-php-mecab.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | set -ex 3 | wget https://github.com/rsky/php-mecab/archive/master.zip 4 | unzip master.zip 5 | cd php-mecab-master/mecab && phpize && ./configure --with-php-config=/usr/bin/php-config --with-mecab-config=/usr/bin/mecab-config && make && sudo make install 6 | echo "extension=mecab.so" >> ~/.phpenv/versions/$(phpenv version-name)/etc/php.ini -------------------------------------------------------------------------------- /example-travis.yml: -------------------------------------------------------------------------------- 1 | language: php 2 | 3 | php: 4 | - 5.4 5 | - 5.5 6 | - 5.6 7 | - 7.0 8 | 9 | before_install: 10 | - sudo apt-get update -qq 11 | - sudo apt-get install -y mecab mecab-ipadic-utf8 12 | - sudo apt-get install php5-dev libmecab-dev build-essential 13 | 14 | before_script: 15 | - chmod +x travis-install-php-mecab.sh 16 | 17 | script: 18 | - ./travis-install-php-mecab.sh 19 | 20 | install: composer install -------------------------------------------------------------------------------- /install-php-mecab.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | set -e 3 | 4 | GREEN="\033[0;32m" 5 | BLUE="\033[1;34m" 6 | RED="\033[0;31m" 7 | NC="\033[0m" 8 | 9 | START=$PWD 10 | 11 | HASMECAB=$(php -r 'echo extension_loaded("mecab");') 12 | 13 | OUTPUT=$(php --version) 14 | 15 | LONGVERSION=$(echo $OUTPUT | cut -d ' ' -f2 | cut -d '-' -f1) 16 | 17 | SHORTVERSION=${LONGVERSION%.*} 18 | 19 | echo $SHORTVERSION 20 | 21 | echo "${GREEN}Installing dependencies...${NC}" 22 | sudo apt-get install mecab mecab-ipadic-utf8 mecab-utils libmecab-dev unzip build-essential php${SHORTVERSION}-dev 23 | 24 | if [ $HASMECAB ] ; then 25 | echo "${BLUE}php-mecab is already installed.${NC}" 26 | 27 | elif [ -d
/etc/php/$SHORTVERSION/mods-available ] ; then 28 | echo "${GREEN}Installing php-mecab...${NC}" 29 | wget https://github.com/nihongodera/php-mecab/archive/master.zip 30 | unzip master.zip 31 | cd php-mecab-master/mecab && phpize && ./configure --with-php-config=/usr/bin/php-config --with-mecab-config=/usr/bin/mecab-config && make && sudo make install 32 | cd /etc/php/$SHORTVERSION/mods-available 33 | sudo touch mecab.ini 34 | echo "extension=mecab.so" | sudo tee -a mecab.ini 35 | sudo phpenmod -v $SHORTVERSION mecab 36 | 37 | echo "${BLUE}Cleaning up...${NC}" 38 | 39 | cd $START 40 | rm master.zip 41 | rm -rf "php-mecab-master" 42 | 43 | else 44 | echo "${RED}Unable to install php-mecab.${NC}" 45 | exit 1 46 | 47 | fi 48 | echo "${GREEN}Install complete.${NC}" 49 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | > [!WARNING] 2 | > This repository will no longer be updated. This documentation has moved to my fork of the php-mecab project here: https://github.com/nihongodera/php-mecab. 3 | 4 | # php-mecab Documentation 5 | Documentation for the package [rsky/php-mecab](https://github.com/rsky/php-mecab). 6 | 7 | ## Contents 8 | - [Installation](#installation) 9 | - [Usage](#usage) 10 | - [Initialization](#initialization) 11 | - [Splitting Strings](#splitting-strings) 12 | - [Parsing Strings](#parsing-strings) 13 | - [Using Nodes](#using-nodes) 14 | - [Basic MeCab](#basic-mecab) 15 | - [Classes and Functions](#classes-and-functions) 16 | - [Classes](#classes) 17 | - [MeCab\Tagger](#mecab\tagger) 18 | - [MeCab\Node](#mecab\node) 19 | - [Mecab\Path](#mecab\path) 20 | - [MeCab\NodeIterator](#mecab\nodeiterator) 21 | - [Functions](#functions) 22 | - [Other Resources](#other-resources) 23 | - [Contributing](#contributing) 24 | 25 | ## Installation 26 | (Please note that I am a Linux user and have only tested the Linux installation guide. 
The Mac and Windows installation guides have been pieced together from other sources.) 27 | 28 | Ubuntu users can use the install script included in this repository to install mecab and php-mecab. 29 | Download the script: 30 | ``` 31 | curl -O https://raw.githubusercontent.com/nihongodera/php-mecab-documentation/master/install-php-mecab.sh 32 | ``` 33 | Make the file executable: 34 | ``` 35 | chmod +x install-php-mecab.sh 36 | ``` 37 | Execute the script: 38 | ``` 39 | ./install-php-mecab.sh 40 | ``` 41 | For information about what the script does, see [here](https://github.com/nihongodera/limelight/wiki/Install-Script). 42 | 43 | ### Install MeCab 44 | Before installing php-mecab, you must install MeCab. 45 | 46 | #### Linux 47 | Linux users can more than likely find MeCab in their distro repositories. Simply install 'mecab' and the package 'mecab-ipadic-utf8'. Ubuntu users can do this with the following command. 48 | ``` 49 | sudo apt-get install mecab mecab-ipadic-utf8 50 | ``` 51 | 52 | If that doesn't work, you can download the source and build it yourself. Note that this will require the package 'build-essential'. 53 | First pull in MeCab, then configure, build, and install it. 54 | ``` 55 | wget https://mecab.googlecode.com/files/mecab-0.996.tar.gz 56 | tar zxfv mecab-0.996.tar.gz 57 | cd mecab-0.996 58 | ./configure --with-charset=utf8 --enable-utf8-only && make && sudo make install 59 | ``` 60 | Then get the dictionary file and build it the same way. 61 | ``` 62 | wget https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz 63 | tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz 64 | cd mecab-ipadic-2.7.0-20070801 65 | ./configure --with-charset=utf8 && make && sudo make install 66 | ``` 67 | 68 | #### Mac OS X 69 | Both MeCab and the required dictionary (mecab-ipadic-utf8) are in MacPorts. If that doesn't work, try downloading the source and building it yourself.
You can get the source and the dictionary from the following URLs: 70 | https://mecab.googlecode.com/files/mecab-0.996.tar.gz 71 | https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz 72 | 73 | I believe you can build these files with Xcode. Somebody correct me if I'm wrong. 74 | 75 | #### Windows 76 | Download the installer from this URL: https://mecab.googlecode.com/files/mecab-0.996.exe 77 | 78 | ### Install php-mecab 79 | First, verify that you have MeCab on your computer by testing it in the command line. Type `mecab` and if you don't get an error, things are looking good. If you get an error that looks something like this `param.cpp(69) [ifs] no such file or directory: /usr/local/lib/mecab/dic/ipadic/dicrc` you need to find your dictionary directory and pass it as a parameter. The directory is called 'ipadic-utf8' and needs to contain a file called 'unk.dic'. 80 | ``` 81 | mecab --dicdir=/path/to/dictionary/dic/ipadic/ 82 | ``` 83 | Once you get mecab to start, type some Japanese and make sure you get an appropriate response. 84 | ``` 85 | ~$ mecab 86 | やった! 87 | やっ 動詞,自立,*,*,五段・ラ行,連用タ接続,やる,ヤッ,ヤッ 88 | た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ 89 | ! 記号,一般,*,*,*,*,!,!,! 90 | EOS 91 | ``` 92 | 93 | #### Linux 94 | Install the following dependencies: 95 | php5: 96 | php5-dev 97 | libmecab-dev 98 | build-essential 99 | ``` 100 | sudo apt-get install php5-dev libmecab-dev build-essential 101 | ``` 102 | php7: 103 | php7.0-dev 104 | libmecab-dev 105 | build-essential 106 | ``` 107 | sudo apt-get install php7.0-dev libmecab-dev build-essential 108 | ``` 109 | 110 | Download the php-mecab source. 111 | ``` 112 | wget https://github.com/rsky/php-mecab/archive/master.zip 113 | ``` 114 | 115 | You will need to find the 'mecab-config' script. It is usually located at /usr/bin/mecab-config, but check to make sure. Let's use 'locate' because it's easy.
116 | ``` 117 | sudo updatedb 118 | locate mecab-config 119 | ``` 120 | That should give you a path that looks something like /usr/bin/mecab-config. 121 | We should now be ready to build our package. Put your mecab-config path after the --with-mecab-config option. 122 | ``` 123 | unzip master.zip 124 | cd php-mecab-master/mecab 125 | phpize 126 | sudo ./configure --with-php-config=/usr/bin/php-config --with-mecab-config=/path/to/mecab-config 127 | sudo make 128 | sudo make install 129 | ``` 130 | 131 | Occasionally, configure will fail and throw the following error: 132 | ``` 133 | configure: error: wrong MeCab library version or lib not found. Check config.log for more information 134 | ``` 135 | 136 | This usually happens when mecab didn't install properly. To fix this, purge all mecab packages: 137 | ``` 138 | sudo apt-get --purge remove mecab mecab-ipadic-utf8 mecab-utils libmecab-dev 139 | ``` 140 | This will often not remove all the binaries so you may have to manually go into bin and remove them yourself. 141 | ``` 142 | sudo rm /usr/local/bin/mecab 143 | sudo rm /usr/local/bin/mecab-config 144 | ``` 145 | Then, reinstall everything: 146 | ``` 147 | sudo apt-get install mecab mecab-ipadic-utf8 mecab-utils libmecab-dev 148 | ``` 149 | 150 | After completing this step, you should have a mecab.so. Go to /usr/lib/php5/ and find the directory with a name that looks similar to this: 20131226. Have a look in that directory and mecab.so should be in there. 151 | 152 | We now just need to enable the mod. 153 | For php5: 154 | Move to /etc/php5/mods-available/ 155 | ``` 156 | cd /etc/php5/mods-available/ 157 | ``` 158 | Next, create a new .ini file for mecab. 159 | ``` 160 | sudo touch mecab.ini 161 | echo "extension=mecab.so" | sudo tee -a mecab.ini 162 | ``` 163 | And then we need to activate the module.
164 | ``` 165 | sudo php5enmod mecab 166 | ``` 167 | ___ 168 | 169 | For php7: 170 | Move to /etc/php/7.0/mods-available/ 171 | ``` 172 | cd /etc/php/7.0/mods-available/ 173 | ``` 174 | Next, create a new .ini file for mecab. 175 | ``` 176 | sudo touch mecab.ini 177 | echo "extension=mecab.so" | sudo tee -a mecab.ini 178 | ``` 179 | And then we need to activate the module. 180 | ``` 181 | sudo phpenmod -v 7.0 mecab 182 | ``` 183 | ___ 184 | 185 | Once this is done, you simply need to restart your web server. 186 | For Apache: 187 | ``` 188 | sudo service apache2 restart 189 | ``` 190 | And for nginx: 191 | ``` 192 | sudo service nginx restart 193 | ``` 194 | 195 | You should be ready to go. 196 | 197 | #### Mac OS X 198 | Instructions should be the same as for Linux, but you may require Xcode in order to properly compile the source code. 199 | 200 | #### Windows 201 | Installing php-mecab is the same as installing any other php extension. The following guide may be of use: 202 | http://php.net/manual/en/install.windows.extensions.php 203 | 204 | According to one of the php-mecab readme files: 205 | >The extension provides the VisualStudio V6 project file mecab.dsp. 206 | >To compile the extension you open this file using VisualStudio, 207 | >select the apropriate configuration for your installation 208 | >(either "Release_TS" or "Debug_TS") and create "php_mecab.dll" 209 | > 210 | >After successfull compilation you have to copy the newly 211 | >created "php_mecab.dll" to the PHP 212 | >extension directory (default: C:\PHP\extensions). 213 | 214 | [Top](#contents) 215 | 216 | ## Usage 217 | php-mecab can be used functionally or as an object. I prefer the OOP approach, but I will try to cover both approaches in this guide. Note that as of version 0.6.0, the procedural functions will not work in php 7.
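Because the available API depends on the installed php-mecab version, it can help to check at runtime which style is present. The following is only a minimal sketch, assuming the class and function names used elsewhere in this guide (`MeCab\Tagger`, `MeCab_Tagger`, `mecab_new`). `detectMecabApi()` is a hypothetical helper for illustration and is not part of php-mecab.

```php
<?php
// Hypothetical helper (not part of php-mecab): report which API style,
// if any, the installed extension exposes.
function detectMecabApi()
{
    // Namespaced class, as used in this guide for version 0.6.0.
    if (class_exists('MeCab\Tagger')) {
        return 'namespaced';
    }
    // Underscore-style class, as used in this guide for earlier versions.
    if (class_exists('MeCab_Tagger')) {
        return 'legacy-class';
    }
    // Procedural functions only.
    if (function_exists('mecab_new')) {
        return 'functional';
    }
    // The extension does not appear to be loaded.
    return 'none';
}

echo detectMecabApi() . "\n";
```

If this prints `none`, the extension is probably not enabled for the PHP binary you are running, so the installation steps above are worth rechecking.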
218 | - [Initialization](#initialization) 219 | - [Splitting Strings](#splitting-strings) 220 | - [Parsing Strings](#parsing-strings) 221 | - [Using Nodes](#using-nodes) 222 | - [Basic MeCab](#basic-mecab) 223 | 224 | ### Initialization 225 | MeCab sometimes requires a dictionary directory to be passed to it on initialization. The location of the directory seems to vary by system, so find 'ipadic-utf8' on your system and pass the full folder path. Often, there will be more than one 'ipadic-utf8' folder on a system. Make sure the one you use contains a file called 'unk.dic'. Without this, mecab will fail to initialize. Pass the dictionary directory to MeCab with the console flag '-d' in an array. 226 | 227 | The options passed to MeCab are the same as the options used in the command line program. Send them to the constructor in an array. Check the man page for MeCab for all available options. 228 | 229 | #### Object Orientated 230 | New up a [MeCab\Tagger](#__constructarguments-persistent) object. 231 | Version 0.6.0: 232 | ```php 233 | $mecab = new \MeCab\Tagger(); 234 | ``` 235 | Earlier versions: 236 | ```php 237 | $mecab = new \MeCab_Tagger(); 238 | ``` 239 | If it doesn't work, or you get an error, try passing the array containing the command line flag '-d' and a dictionary folder path to it as a parameter. 240 | ```php 241 | $mecab = new \MeCab\Tagger(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']); 242 | ``` 243 | The variable $mecab will be a [MeCab\Tagger](#mecab\tagger) object. 244 | **Throughout this guide, when I refer to *$mecab* in the object orientated sections, it will be a Tagger object.** 245 | 246 | #### Functional 247 | Use the function [mecab_new()](#mecab_newarguments-persistent) to get a mecab resource. As with the Object Orientated approach, you may or may not have to pass it a dictionary directory.
248 | ```php 249 | $mecab = mecab_new(['-d', '/path/to/dictionary/mecab/dic/ipadic-utf8']); 250 | ``` 251 | The $mecab variable will be a resource of type 'mecab'. 252 | **Throughout this guide, when I refer to *$mecab* in the functional sections, it will be a MeCab resource.** 253 | 254 | [Top](#contents) 255 | 256 | ### Splitting Strings 257 | Split methods only split a string into an array of morphemes. They provide no information about the morphemes. 258 | 259 | #### Object Orientated 260 | As of version 0.6.0, the split method is no longer on the Tagger object. The following only applies to previous versions. 261 | The [split()](#splitstring-dic_dir-user_dic-filter-persistent-static) method is static and so does not require an instance of Tagger. It might, however, need the dictionary directory path to be passed as an argument in order to function. 262 | ```php 263 | $split = \Mecab_Tagger::split('眠いです'); 264 | ``` 265 | Or, if that doesn't work: 266 | ```php 267 | $split = \Mecab_Tagger::split('眠いです', '/path/to/dictionary/mecab/dic/ipadic-utf8'); 268 | 269 | print_r($split); 270 | 271 | // Results 272 | Array 273 | ( 274 | [0] => 眠い 275 | [1] => です 276 | ) 277 | 278 | ``` 279 | 280 | If you have an instance of MeCab\Tagger you can also call the method on the object. You will still need to pass the dictionary directory. 281 | ```php 282 | $split = $mecab->split('たこ焼きが食べたい'); 283 | 284 | print_r($split); 285 | 286 | // Results 287 | Array 288 | ( 289 | [0] => たこ焼き 290 | [1] => が 291 | [2] => 食べ 292 | [3] => たい 293 | ) 294 | 295 | ``` 296 | 297 | #### Functional 298 | Use the function [mecab_split()](#mecab_splitstring-dic_dir-user_dic-filter-persistent). It may or may not require the dictionary directory to be passed. 299 | ```php 300 | $split = mecab_split('パンダをいくらで買いますか'); 301 | ``` 302 | Or....
303 | ```php 304 | $split = mecab_split('パンダをいくらで買いますか', '/path/to/dictionary/mecab/dic/ipadic-utf8'); 305 | 306 | print_r($split); 307 | 308 | // Results 309 | Array 310 | ( 311 | [0] => パンダ 312 | [1] => を 313 | [2] => いくら 314 | [3] => で 315 | [4] => 買い 316 | [5] => ます 317 | [6] => か 318 | ) 319 | ``` 320 | 321 | [Top](#contents) 322 | 323 | ### Parsing Strings 324 | MeCab will parse strings of Japanese text and return results in either string form or as a MeCab\Node. The MeCab\Node class seems a little awkward and difficult to deal with at first, but they give the user a lot of power and make parsing results a little easier. 325 | 326 | #### Object Orientated 327 | To parse a string and get results in string form, a couple options exist. The first is the [parse()](#parsestring-length-output_length) method. 328 | ```php 329 | $results = $mecab->parse('チョコレートがやめられない'); 330 | 331 | echo $results; 332 | 333 | // Results 334 | チョコレート 名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート 335 | が 助詞,格助詞,一般,*,*,*,が,ガ,ガ 336 | やめ 動詞,自立,*,*,一段,未然形,やめる,ヤメ,ヤメ 337 | られ 動詞,接尾,*,*,一段,未然形,られる,ラレ,ラレ 338 | ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ 339 | EOS 340 | ``` 341 | 342 | You could also use the [parseToString()](#parsetostringstring-length-output_length) method which produces the exact same results. 343 | ```php 344 | $results = $mecab->parseToString('チョコレートがやめられない'); 345 | 346 | echo $results; 347 | 348 | // Results 349 | チョコレート 名詞,一般,*,*,*,*,チョコレート,チョコレート,チョコレート 350 | が 助詞,格助詞,一般,*,*,*,が,ガ,ガ 351 | やめ 動詞,自立,*,*,一段,未然形,やめる,ヤメ,ヤメ 352 | られ 動詞,接尾,*,*,一段,未然形,られる,ラレ,ラレ 353 | ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ 354 | EOS 355 | ``` 356 | 357 | To get results in node form, use [parseToNode()](#parsetonodestring-length). 
358 | ```php 359 | $node = $mecab->parseToNode('ご飯作りたくない'); 360 | 361 | var_dump($node); 362 | 363 | // Results 364 | object(MeCab\Node) (0) { 365 | } 366 | ``` 367 | 368 | #### Functional 369 | To get results as a string, use the function [mecab_sparse_tostr()](#mecab_sparse_tostrmecab-string-length-output_length). 370 | ```php 371 | $node = mecab_sparse_tostr($mecab, 'パンダいらないよね'); 372 | 373 | echo $node; 374 | 375 | // Results 376 | パンダ 名詞,一般,*,*,*,*,パンダ,パンダ,パンダ 377 | いら 動詞,自立,*,*,五段・ラ行,未然形,いる,イラ,イラ 378 | ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ 379 | よ 助詞,終助詞,*,*,*,*,よ,ヨ,ヨ 380 | ね 助詞,終助詞,*,*,*,*,ね,ネ,ネ 381 | EOS 382 | ``` 383 | 384 | For node results, use [mecab_sparse_tonode()](#mecab_sparse_tonodemecab-string-length). 385 | ```php 386 | $node = mecab_sparse_tonode($mecab, 'これ長くなってる'); 387 | 388 | var_dump($node); 389 | 390 | // Results 391 | resource(5) of type (node) 392 | ``` 393 | 394 | [Top](#contents) 395 | 396 | ### Using Nodes 397 | Nodes make it easy to access the information MeCab provides and give users powerful ways to navigate through results. 398 | 399 | The node returned from the parseToNode() methods discussed in the previous section is the first node in the series and only represents the first morpheme. In order to get information about the entire string, it is necessary to walk through all the nodes in the series. But before we tackle that, let's take a quick look at some of the more useful methods we have at our disposal. 400 | 401 | #### Object Orientated 402 | - [getPrev()](#getprev): Get the previous node in the series. 403 | - [getNext()](#getnext): Get the next node in the series. 404 | - [getSurface()](#getsurface): Get the surface (the original morpheme) of the node. 405 | - [getFeature()](#getfeature): Get the feature (the MeCab info) of the node. 406 | - [getLength()](#getlength): Get the length of the node's surface. 407 | - [toArray()](#toarraydump_all): Get all the node's elements as an associative array.
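As a rough sketch of how the object orientated methods listed above fit together, the snippet below collects the surface, feature, and length of each morpheme into a plain array and skips the BOS/EOS marker nodes. `collectMorphemes()` is a hypothetical helper written for this guide, not part of php-mecab. It assumes the namespaced `MeCab\Tagger` class (version 0.6.0), a dictionary that loads without the '-d' flag, and that `getStat()` returns 2 and 3 for BOS and EOS nodes as described in the class reference.

```php
<?php
// Hypothetical helper (not part of php-mecab): walk a node series and
// collect basic information about every real morpheme.
function collectMorphemes($node)
{
    $morphemes = [];

    do {
        // Skip the marker nodes: 2 = beginning of sentence, 3 = end of sentence.
        if (!in_array($node->getStat(), [2, 3])) {
            $morphemes[] = [
                'surface' => $node->getSurface(),
                'feature' => explode(',', $node->getFeature()),
                'length'  => $node->getLength(),
            ];
        }
    } while ($node = $node->getNext()); // getNext() returns NULL after the last node

    return $morphemes;
}

// Only run the demo when the namespaced class is actually available.
if (class_exists('MeCab\Tagger')) {
    $mecab = new \MeCab\Tagger();
    print_r(collectMorphemes($mecab->parseToNode('ビール飲みたい')));
}
```

The helper takes an untyped `$node` on purpose, so the same walking logic can be exercised with a stand-in object when the extension is not installed.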
408 | 409 | #### Functional 410 | - [mecab_node_prev()](#mecab_node_prevnode): Get the previous node in the series. 411 | - [mecab_node_next()](#mecab_node_nextnode): Get the next node in the series. 412 | - [mecab_node_surface()](#mecab_node_surfacenode): Get the surface (the original morpheme) of the node. 413 | - [mecab_node_feature()](#mecab_node_featurenode): Get the feature (the MeCab info) of the node. 414 | - [mecab_node_length()](#mecab_node_lengthnode): Get the length of the node's surface. 415 | - [mecab_node_toarray()](#mecab_node_toarraynode-dump_all): Get all the node's elements as an associative array. 416 | 417 | There are several other methods available, but these are the most useful at this point. For a full list of methods, see the [Classes and Functions](#classes-and-functions) section of this guide. 418 | So let's see how we can walk through the nodes and extract the information we need. 419 | 420 | #### Object Orientated 421 | You can go about this a couple ways. 422 | The first way simply walks through the nodes with a foreach loop. 423 | ```php 424 | $node = $mecab->parseToNode('カレーライスにしようかな'); 425 | 426 | foreach ($node as $n) { 427 | echo $n->getFeature() . "\n"; 428 | } 429 | 430 | // Results 431 | BOS/EOS,*,*,*,*,*,*,*,* 432 | 名詞,一般,*,*,*,*,カレーライス,カレーライス,カレーライス 433 | 助詞,格助詞,一般,*,*,*,に,ニ,ニ 434 | 動詞,自立,*,*,サ変・スル,未然ウ接続,する,シヨ,シヨ 435 | 助動詞,*,*,*,不変化型,基本形,う,ウ,ウ 436 | 助詞,副助詞/並立助詞/終助詞,*,*,*,*,か,カ,カ 437 | 助詞,終助詞,*,*,*,*,な,ナ,ナ 438 | BOS/EOS,*,*,*,*,*,*,*,* 439 | ``` 440 | This isn't necessarily a bad way to do it, but it's a little too magical for my liking. If $node is the first node in the series (and it is, you can var_dump and verify this), it doesn't make sense to loop through each $node as $n where $node is a single node and $n is also a single node. Instead, I prefer to use MeCab\Node's methods to explicitly define what I am doing. 441 | ```php 442 | $node = $mecab->parseToNode('これの方がいい'); 443 | 444 | do { 445 | echo $node->getFeature() .
"\n"; 446 | } while ($node = $node->getNext()); 447 | 448 | // Results 449 | BOS/EOS,*,*,*,*,*,*,*,* 450 | 名詞,代名詞,一般,*,*,*,これ,コレ,コレ 451 | 助詞,連体化,*,*,*,*,の,ノ,ノ 452 | 名詞,非自立,一般,*,*,*,方,ホウ,ホー 453 | 助詞,格助詞,一般,*,*,*,が,ガ,ガ 454 | 形容詞,自立,*,*,形容詞・イイ,基本形,いい,イイ,イイ 455 | BOS/EOS,*,*,*,*,*,*,*,* 456 | ``` 457 | We can extract the logic to a general purpose looping function. 458 | 459 | ```php 460 | function walkThroughNodes(\Mecab\Node $node, $callback) 461 | { 462 | do { 463 | $callback($node); 464 | } while ($node = $node->getNext()); 465 | } 466 | ``` 467 | We can then pass our walkThroughNodes function a closure to tell it what to do with each node. 468 | ```php 469 | $node = $mecab->parseToNode('これの方がいい'); 470 | 471 | walkThroughNodes($node, function($node) { 472 | echo $node->getSurface() . "\n"; 473 | }); 474 | 475 | // Results 476 | 477 | これ 478 | の 479 | 方 480 | が 481 | いい 482 | 483 | ``` 484 | Now we never have to worry about a basic walkthrough again. We can simply pass our walkThroughNodes function a node and a callback. 485 | 486 | #### Functional 487 | As mentioned in the Object Orientated section above, we can simply walk through the nodes with a foreach loop, but I don't like that approach. Instead, let's use MeCab's nodes to our advantage. 488 | ```php 489 | $node = mecab_sparse_tonode($mecab, 'ビール飲みたい'); 490 | 491 | do { 492 | echo mecab_node_surface($node) . "\n"; 493 | } while ($node = mecab_node_next($node)); 494 | 495 | // Results 496 | 497 | ビール 498 | 飲み 499 | たい 500 | 501 | ``` 502 | Like we did in the Object Orientated section, let's extract this to a function that we can send a callback to. 503 | ```php 504 | function walkThroughNodes($node, $callback) 505 | { 506 | do { 507 | $callback($node); 508 | } while ($node = mecab_node_next($node)); 509 | } 510 | ``` 511 | We can use our walkThroughNodes function like this.
512 | ```php 513 | $node = mecab_sparse_tonode($mecab, 'ビール飲みたい'); 514 | 515 | walkThroughNodes($node, function ($node) { 516 | echo mecab_node_surface($node) . "\n"; 517 | }); 518 | 519 | // Results 520 | 521 | ビール 522 | 飲み 523 | たい 524 | 525 | ``` 526 | 527 | ### Basic MeCab 528 | Now that we can extract information from Japanese strings using MeCab and php-mecab, let's take a quick look at what this information means. 529 | ```php 530 | $mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']); 531 | 532 | $string = $mecab->parseToString('行く'); 533 | 534 | echo $string; 535 | 536 | // Results 537 | 行く 動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク 538 | EOS 539 | ``` 540 | Commonly in MeCab you will see BOS and EOS. These mean 'Beginning of Sentence' and 'End of Sentence', respectively. 541 | In output lines, there are generally two parts, the surface and the feature. The surface is the original morpheme and the feature is MeCab info. In our case, '行く' is the surface and '動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク' is the feature. Remember you can use nodes to easily extract this information. 542 | 543 | The feature is a comma separated string with nine sections.
544 | Section 1: Main part of speech category 545 | Section 2: Part of speech sub-category 546 | Section 3: Part of speech sub-category 547 | Section 4: Part of speech sub-category 548 | Section 5: Inflection type 549 | Section 6: Inflection form 550 | Section 7: Lemma (the root word found in the dictionary) 551 | Section 8: Reading 552 | Section 9: Pronunciation 553 | 554 | In our example: 555 | ```php 556 | print_r(explode(',', '動詞,自立,*,*,五段・カ行促音便,基本形,行く,イク,イク')); 557 | 558 | // Results 559 | [0] => 動詞 // Main part of speech category 560 | [1] => 自立 // Part of speech sub-category 561 | [2] => * // Part of speech sub-category (none) 562 | [3] => * // Part of speech sub-category (none) 563 | [4] => 五段・カ行促音便 // Inflection type 564 | [5] => 基本形 // Inflection form 565 | [6] => 行く // Lemma (the root word found in the dictionary) 566 | [7] => イク // Reading 567 | [8] => イク // Pronunciation 568 | ``` 569 | 570 | What you do with this information is up to you! 571 | 572 | [Top](#contents) 573 | 574 | ## Classes and Functions 575 | ### Classes 576 | - [MeCab\Tagger](#mecab\tagger) 577 | - [MeCab\Node](#mecab\node) 578 | - [Mecab\Path](#mecab\path) 579 | - [MeCab\NodeIterator](#mecab\nodeiterator) 580 | 581 | #### MeCab\Tagger 582 | Main class used to parse text. 
583 | ##### Methods 584 | - [version()](#version-static) 585 | - [split()](#splitstring-dic_dir-user_dic-filter-persistent-static) 586 | - [__construct()](#__constructarguments-persistent) 587 | - [getPartial()](#getpartial) 588 | - [setPartial()](#setpartialbool) 589 | - [getTheta()](#gettheta) 590 | - [setTheta()](#setthetatheta) 591 | - [getLatticeLevel()](#getlatticelevel) 592 | - [setLatticeLevel()](#setlatticelevellevel) 593 | - [getAllMorphs()](#getallmorphs) 594 | - [setAllMorphs()](#setallmorphsbool) 595 | - [parse()](#parsestring-length-output_length) 596 | - [parseToString()](#parsetostringstring-length-output_length) 597 | - [parseToNode()](#parsetonodestring-length) 598 | - [parseNBest()](#parsenbestn-string-length-output_length) 599 | - [parseNBestInit()](#parsenbestinitstring-length) 600 | - [next()](#nextoutput_length) 601 | - [nextNode()](#nextnode) 602 | - [formatNode()](#formatnodenode) 603 | - [dictionaryInfo()](#dictionaryinfo) 604 | 605 | ###### version() [static] 606 | Return Mecab version. 607 | ```php 608 | /** 609 | * @return string 610 | */ 611 | ``` 612 | 613 | ###### split($string, $dic_dir, $user_dic, $filter, $persistent) [static] 614 | Only on versions prior to 0.6.0. 615 | Split string into array of morphemes. Usually requires the dictionary directory to be passed as a parameter. 616 | ```php 617 | /** 618 | * @param string $string String to split. 619 | * @param string $dic_dir Path to dictionary directory. (Optional) 620 | * @param string $user_dic Path to user dictionary. (Optional) 621 | * @param callback $filter Filter function or method. 
(Optional) 622 | * @param boolean $persistent (Optional) 623 | * 624 | * @return array 625 | */ 626 | ``` 627 | Example 628 | ```php 629 | $mecab = new \Mecab_Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']); 630 | 631 | $array = $mecab::split('行きます', '/var/lib/mecab/dic/ipadic-utf8'); 632 | 633 | print_r($array); 634 | 635 | Array 636 | ( 637 | [0] => 行き 638 | [1] => ます 639 | ) 640 | ``` 641 | 642 | ###### __construct($arguments, $persistent) 643 | Construct class instance. 644 | ```php 645 | /** 646 | * @param array $arguments Command line arguments. 647 | * @param boolean $persistent (Optional) 648 | * 649 | * @return MeCab\Tagger 650 | */ 651 | ``` 652 | 653 | ###### getPartial() 654 | Get current partial parsing mode state. 655 | ```php 656 | /** 657 | * @return boolean 658 | */ 659 | ``` 660 | 661 | ###### setPartial($bool) 662 | Set partial parsing mode. 663 | ```php 664 | /** 665 | * @param boolean $bool Partial parsing mode. 666 | */ 667 | ``` 668 | 669 | ###### getTheta() 670 | Get current temperature parameter theta. 671 | ```php 672 | /** 673 | * @return float 674 | */ 675 | ``` 676 | 677 | ###### setTheta($theta) 678 | Set temperature parameter theta. 679 | ```php 680 | /** 681 | * @param float/int $theta Temperature parameter theta. 682 | */ 683 | ``` 684 | 685 | ###### getLatticeLevel() 686 | Get current lattice level. 687 | ```php 688 | /** 689 | * @return int 690 | */ 691 | ``` 692 | 693 | ###### setLatticeLevel($level) 694 | Set lattice level. 695 | ```php 696 | /** 697 | * @param int $level Lattice level. 698 | */ 699 | ``` 700 | 701 | ###### getAllMorphs() 702 | Get all-morphs output mode. 703 | ```php 704 | /** 705 | * @return bool 706 | */ 707 | ``` 708 | 709 | ###### setAllMorphs($bool) 710 | Set all-morphs output mode. 711 | ```php 712 | /** 713 | * @param bool $bool All-morphs output mode. 714 | */ 715 | ``` 716 | 717 | ###### parse($string, $length, $output_length) 718 | Parse string and output results as string.
719 | ```php 720 | /** 721 | * @param string $string String to be parsed. 722 | * @param int $length Length to be analyzed. (Optional) 723 | * @param int $output_length Maximum length of output. (Optional) 724 | * 725 | * @return string 726 | */ 727 | ``` 728 | Example 729 | ```php 730 | $mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']); 731 | 732 | $string = $mecab->parse('行きます'); 733 | 734 | print_r($string); 735 | 736 | 行き 動詞,自立,*,*,五段・カ行促音便,連用形,行く,イキ,イキ 737 | ます 助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス 738 | EOS 739 | ``` 740 | 741 | ###### parseToString($string, $length, $output_length) 742 | Parse string and output results as string. 743 | ```php 744 | /** 745 | * @param string $string String to be parsed. 746 | * @param int $length Length to be analyzed. (Optional) 747 | * @param int $output_length Maximum length of output. (Optional) 748 | * 749 | * @return string 750 | */ 751 | ``` 752 | Example 753 | ```php 754 | $mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']); 755 | 756 | $string = $mecab->parseToString('行きます'); 757 | 758 | print_r($string); 759 | 760 | 行き 動詞,自立,*,*,五段・カ行促音便,連用形,行く,イキ,イキ 761 | ます 助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス 762 | EOS 763 | ``` 764 | 765 | ###### parseToNode($string, $length) 766 | Parse string and output results as MeCab/Node. 767 | ```php 768 | /** 769 | * @param string $string String to be parsed. 770 | * @param int $length Length to be analyzed. 
(Optional) 771 | * 772 | * @return MeCab/Node 773 | */ 774 | ``` 775 | Example 776 | ```php 777 | $mecab = new \Mecab\Tagger(['-d', '/var/lib/mecab/dic/ipadic-utf8']); 778 | 779 | $node = $mecab->parseToNode('行きます'); 780 | 781 | print_r($node->toArray()); 782 | 783 | Array 784 | ( 785 | [surface] => 786 | [feature] => BOS/EOS,*,*,*,*,*,*,*,* 787 | [id] => 0 788 | [length] => 0 789 | [rlength] => 0 790 | [rcAttr] => 0 791 | [lcAttr] => 0 792 | [posid] => 0 793 | [char_type] => 0 794 | [stat] => 2 795 | [isbest] => 1 796 | [alpha] => 0 797 | [beta] => 0 798 | [prob] => 0 799 | [wcost] => 0 800 | [cost] => 0 801 | ) 802 | ``` 803 | 804 | ###### parseNBest($n, $string, $length, $output_length) 805 | Parse given sentence and output N-best results as string. This method causes seg faults for me. 806 | ```php 807 | /** 808 | * @param int $n Number of results to obtain. 809 | * @param string $string String to be parsed. 810 | * @param int $length Length to be analyzed. (Optional) 811 | * @param int $output_length Maximum length of output. (Optional) 812 | * 813 | * @return string 814 | */ 815 | ``` 816 | 817 | ###### parseNBestInit($string, $length) 818 | Initialize N-best enumeration with a sentence. 819 | ```php 820 | /** 821 | * @param string $string String to be parsed. 822 | * @param int $length Length to be analyzed. (Optional) 823 | 824 | * @return boolean 825 | */ 826 | ``` 827 | 828 | ###### next($output_length) 829 | Get the next result of N-Best as a string. 830 | ```php 831 | /** 832 | * @param int $output_length Maximum length of output. (Optional) 833 | * 834 | * @return string 835 | */ 836 | ``` 837 | 838 | ###### nextNode() 839 | Get the next result of N-Best as a node. 840 | ```php 841 | /** 842 | * @return MeCab\Node 843 | */ 844 | ``` 845 | 846 | ###### formatNode($node) 847 | Format a node to a string. 848 | ```php 849 | /** 850 | * @param MeCab\Node $node Node to be formatted. 
851 | * 852 | * @return string 853 | */ 854 | ``` 855 | 856 | ###### dictionaryInfo() 857 | Return array of dictionary info. 858 | ```php 859 | /** 860 | * @return array 861 | */ 862 | ``` 863 | 864 | [Top](#contents) 865 | 866 | #### MeCab\Node 867 | Returned by parseToNode method on MeCab\Tagger. 868 | ##### Methods 869 | - [getIterator()](#getiterator) 870 | - [setTraverse()](#settraversemode) 871 | - [getPrev()](#getprev) 872 | - [getNext()](#getnext) 873 | - [getENext()](#getenext) 874 | - [getBNext()](#getbnext) 875 | - [getRPath()](#getrpath) 876 | - [getLPath()](#getlpath) 877 | - [getSurface()](#getsurface) 878 | - [getFeature()](#getfeature) 879 | - [getId()](#getid) 880 | - [getLength()](#getlength) 881 | - [getRLength()](#getrlength) 882 | - [getRcAttr()](#getrcattr) 883 | - [getLcAttr()](#getlcattr) 884 | - [getPosId()](#getposid) 885 | - [getCharType()](#getchartype) 886 | - [getStat()](#getstat) 887 | - [getAlpha()](#getalpha) 888 | - [getBeta()](#getbeta) 889 | - [getWCost()](#getwcost) 890 | - [getCost()](#getcost) 891 | - [getProb()](#getprob) 892 | - [isBest()](#isbest) 893 | - [toArray()](#toarraydump_all) 894 | - [toString()](#tostring) 895 | 896 | ###### getIterator() 897 | Return MeCab\NodeIterator. 898 | ```php 899 | /** 900 | * @return MeCab\NodeIterator 901 | */ 902 | ``` 903 | 904 | ###### setTraverse($mode) 905 | Set the traverse mode. 906 | ```php 907 | /** 908 | * @param long $mode Traverse mode. 909 | */ 910 | ``` 911 | 912 | ###### getPrev() 913 | Get the previous node. Return NULL if none. 914 | ```php 915 | /** 916 | * @return MeCab\Node 917 | */ 918 | ``` 919 | 920 | ###### getNext() 921 | Get the next node. Return NULL if none. 922 | ```php 923 | /** 924 | * @return MeCab\Node 925 | */ 926 | ``` 927 | 928 | ###### getENext() 929 | Get the next node which has same end point as the given node. Return NULL if none.
930 | ```php 931 | /** 932 | * @return MeCab\Node 933 | */ 934 | ``` 935 | 936 | ###### getBNext() 937 | Get the next node which has the same beginning point as the given node. Return NULL if none. 938 | ```php 939 | /** 940 | * @return MeCab\Node 941 | */ 942 | ``` 943 | 944 | ###### getRPath() 945 | Get the right path of the node. Return NULL if none. 946 | ```php 947 | /** 948 | * @return MeCab\Path 949 | */ 950 | ``` 951 | 952 | ###### getLPath() 953 | Get the left path of the node. Return NULL if none. 954 | ```php 955 | /** 956 | * @return MeCab\Path 957 | */ 958 | ``` 959 | 960 | ###### getSurface() 961 | Get the surface of the node. 962 | ```php 963 | /** 964 | * @return string 965 | */ 966 | ``` 967 | 968 | ###### getFeature() 969 | Get the feature of the node. 970 | ```php 971 | /** 972 | * @return string 973 | */ 974 | ``` 975 | 976 | ###### getId() 977 | Get the ID of the node. 978 | ```php 979 | /** 980 | * @return int 981 | */ 982 | ``` 983 | 984 | ###### getLength() 985 | Get the length of the node's surface. 986 | ```php 987 | /** 988 | * @return int 989 | */ 990 | ``` 991 | 992 | ###### getRLength() 993 | Get the length of the node's surface including its leading whitespace. 994 | ```php 995 | /** 996 | * @return int 997 | */ 998 | ``` 999 | 1000 | ###### getRcAttr() 1001 | Get the ID of the right context. 1002 | ```php 1003 | /** 1004 | * @return int 1005 | */ 1006 | ``` 1007 | 1008 | ###### getLcAttr() 1009 | Get the ID of the left context. 1010 | ```php 1011 | /** 1012 | * @return int 1013 | */ 1014 | ``` 1015 | 1016 | ###### getPosId() 1017 | Get the ID of the part of speech. 1018 | ```php 1019 | /** 1020 | * @return int 1021 | */ 1022 | ``` 1023 | 1024 | ###### getCharType() 1025 | Get the type of character. 1026 | ```php 1027 | /** 1028 | * @return int 1029 | */ 1030 | ``` 1031 | 1032 | ###### getStat() 1033 | Get the status of the node.
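
As a rough illustration of how the returned integer might be interpreted, the sketch below maps each status code documented for this method to its constant name. The helper `describeNodeStat()` is hypothetical and not part of php-mecab; it works on plain integers so it runs without the extension.

```php
<?php
// Hypothetical helper (not part of php-mecab): maps the integer returned by
// getStat() to the constant name documented for this method.
function describeNodeStat(int $stat): string
{
    $labels = [
        0 => 'MECAB_NOR_NODE', // normal node
        1 => 'MECAB_UNK_NODE', // unknown word
        2 => 'MECAB_BOS_NODE', // beginning of sentence
        3 => 'MECAB_EOS_NODE', // end of sentence
    ];

    // Fall back to a generic label for any value outside the documented list.
    return $labels[$stat] ?? 'UNKNOWN_STAT_' . $stat;
}

// The parseToNode() example output earlier in this guide shows [stat] => 2.
echo describeNodeStat(2), PHP_EOL;
```

With a real node, the argument would be `$node->getStat()`.
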
1034 | ```php 1035 | /** 1036 | * @return int 1037 | */ 1038 | ``` 1039 | - 0: Normal, MECAB_NOR_NODE 1040 | - 1: Unknown, MECAB_UNK_NODE 1041 | - 2: Beginning of Sentence, MECAB_BOS_NODE 1042 | - 3: End of Sentence, MECAB_EOS_NODE 1043 | 1044 | ###### getAlpha() 1045 | Get the forward log probability. 1046 | ```php 1047 | /** 1048 | * @return float 1049 | */ 1050 | ``` 1051 | 1052 | ###### getBeta() 1053 | Get the backward log probability. 1054 | ```php 1055 | /** 1056 | * @return float 1057 | */ 1058 | ``` 1059 | 1060 | ###### getWCost() 1061 | Get the word arising cost. 1062 | ```php 1063 | /** 1064 | * @return int 1065 | */ 1066 | ``` 1067 | 1068 | ###### getCost() 1069 | Get the cumulative cost of the node. 1070 | ```php 1071 | /** 1072 | * @return int 1073 | */ 1074 | ``` 1075 | 1076 | ###### getProb() 1077 | Get the marginal probability of the node. 1078 | ```php 1079 | /** 1080 | * @return float 1081 | */ 1082 | ``` 1083 | 1084 | ###### isBest() 1085 | Determine whether the node is the best solution. 1086 | ```php 1087 | /** 1088 | * @return boolean 1089 | */ 1090 | ``` 1091 | 1092 | ###### toArray($dump_all) 1093 | Get all elements of the node as an associative array. 1094 | ```php 1095 | /** 1096 | * @param boolean $dump_all Dump all related nodes if true. (Optional) 1097 | * 1098 | * @return array 1099 | */ 1100 | ``` 1101 | 1102 | ###### toString() 1103 | Get the formatted string of the node. 1104 | ```php 1105 | /** 1106 | * @return string 1107 | */ 1108 | ``` 1109 | 1110 | [Top](#contents) 1111 | 1112 | #### MeCab\Path 1113 | Returned by the getRPath and getLPath methods on the MeCab\Node class. 1114 | ##### Methods 1115 | - [getRNext()](#getrnext) 1116 | - [getLNext()](#getlnext) 1117 | - [getRNode()](#getrnode) 1118 | - [getLNode()](#getlnode) 1119 | - [getProb()](#getprob-1) 1120 | - [getCost()](#getcost-1) 1121 | 1122 | ###### getRNext() 1123 | Get the rnext (next right) path. Return NULL if none.
1124 | ```php 1125 | /** 1126 | * @return MeCab\Path 1127 | */ 1128 | ``` 1129 | 1130 | ###### getLNext() 1131 | Get the lnext (next left) path. Return NULL if none. 1132 | ```php 1133 | /** 1134 | * @return MeCab\Path 1135 | */ 1136 | ``` 1137 | 1138 | ###### getRNode() 1139 | Get the rnode (right node). Return NULL if none. 1140 | ```php 1141 | /** 1142 | * @return MeCab\Node 1143 | */ 1144 | ``` 1145 | 1146 | ###### getLNode() 1147 | Get the lnode (left node). Return NULL if none. 1148 | ```php 1149 | /** 1150 | * @return MeCab\Node 1151 | */ 1152 | ``` 1153 | 1154 | ###### getProb() 1155 | Get the marginal probability of the path. 1156 | ```php 1157 | /** 1158 | * @return float 1159 | */ 1160 | ``` 1161 | 1162 | ###### getCost() 1163 | Get the cumulative cost of the path. 1164 | ```php 1165 | /** 1166 | * @return int 1167 | */ 1168 | ``` 1169 | 1170 | [Top](#contents) 1171 | 1172 | #### MeCab\NodeIterator 1173 | Node iterator class. 1174 | ##### Methods 1175 | - [current()](#current) 1176 | - [key()](#key) 1177 | - [next()](#next) 1178 | - [rewind()](#rewind) 1179 | - [valid()](#valid) 1180 | 1181 | ###### current() 1182 | Return the current element. 1183 | ```php 1184 | /** 1185 | * @return MeCab\Node 1186 | */ 1187 | ``` 1188 | 1189 | ###### key() 1190 | ```php 1191 | /** 1192 | * @return int 1193 | */ 1194 | ``` 1195 | 1196 | ###### next() 1197 | Set pointer to next element. 1198 | 1199 | ###### rewind() 1200 | Set pointer to beginning. 1201 | 1202 | ###### valid() 1203 | Check if there is a current element after calls to rewind() or next().
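
The five methods listed for this class match PHP's built-in `Iterator` interface, so a `foreach` loop presumably drives a MeCab\NodeIterator through the same rewind/valid/current/key/next cycle. The sketch below spells that cycle out by hand. It uses an `ArrayIterator` of made-up surface strings as a stand-in so that it runs without the extension; that MeCab\NodeIterator behaves identically is an assumption, not something confirmed here.

```php
<?php
// Stand-in data: an ArrayIterator of surface strings replaces the real
// MeCab\NodeIterator so the sketch runs without the mecab extension.
$iterator = new ArrayIterator(['行き', 'ます']);

// This manual loop is the cycle that
// `foreach ($iterator as $key => $value)` performs internally.
$seen = [];
for ($iterator->rewind(); $iterator->valid(); $iterator->next()) {
    $seen[$iterator->key()] = $iterator->current();
}

print_r($seen);
```

With the extension loaded, `$iterator` would instead come from `$node->getIterator()`, and `current()` would yield MeCab\Node objects rather than strings.
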
1204 | ```php 1205 | /** 1206 | * @return boolean 1207 | */ 1208 | ``` 1209 | 1210 | [Top](#contents) 1211 | 1212 | ### Functions 1213 | - [mecab_version()](#mecab_version) 1214 | - [mecab_split()](#mecab_splitstring-dic_dir-user_dic-filter-persistent) 1215 | - [mecab_new()](#mecab_newarguments-persistent) 1216 | - [mecab_destroy()](#mecab_destroymecab) 1217 | - [mecab_get_partial()](#mecab_get_partialmecab) 1218 | - [mecab_set_partial()](#mecab_set_partialmecab-partial) 1219 | - [mecab_get_theta()](#mecab_get_thetamecab) 1220 | - [mecab_set_theta()](#mecab_set_thetamecab-theta) 1221 | - [mecab_get_lattice_level()](#mecab_get_lattice_levelmecab) 1222 | - [mecab_set_lattice_level()](#mecab_set_lattice_levelmecab-level) 1223 | - [mecab_get_all_morphs()](#mecab_get_all_morphsmecab) 1224 | - [mecab_set_all_morphs()](#mecab_set_all_morphsmecab-bool) 1225 | - [mecab_sparse_tostr()](#mecab_sparse_tostrmecab-string-length-output_length) 1226 | - [mecab_sparse_tonode()](#mecab_sparse_tonodemecab-string-length) 1227 | - [mecab_nbest_sparse_tostr()](#mecab_nbest_sparse_tostrmecab-n-string-length-output_length) 1228 | - [mecab_nbest_init()](#mecab_nbest_initmecab-string-length) 1229 | - [mecab_nbest_next_tostr()](#mecab_nbest_next_tostrmecab-output_length) 1230 | - [mecab_nbest_next_tonode()](#mecab_nbest_next_tonodemecab) 1231 | - [mecab_format_node()](#mecab_format_nodemecab-node) 1232 | - [mecab_dictionary_info()](#mecab_dictionary_infomecab) 1233 | - [mecab_node_toarray()](#mecab_node_toarraynode-dump_all) 1234 | - [mecab_node_tostring()](#mecab_node_tostringnode) 1235 | - [mecab_node_prev()](#mecab_node_prevnode) 1236 | - [mecab_node_next()](#mecab_node_nextnode) 1237 | - [mecab_node_enext()](#mecab_node_enextnode) 1238 | - [mecab_node_bnext()](#mecab_node_bnextnode) 1239 | - [mecab_node_rpath()](#mecab_node_rpathnode) 1240 | - [mecab_node_lpath()](#mecab_node_lpathnode) 1241 | - [mecab_node_surface()](#mecab_node_surfacenode) 1242 | -
[mecab_node_feature()](#mecab_node_featurenode) 1243 | - [mecab_node_id()](#mecab_node_idnode) 1244 | - [mecab_node_length()](#mecab_node_lengthnode) 1245 | - [mecab_node_rlength()](#mecab_node_rlengthnode) 1246 | - [mecab_node_rcattr()](#mecab_node_rcattrnode) 1247 | - [mecab_node_lcattr()](#mecab_node_lcattrnode) 1248 | - [mecab_node_posid()](#mecab_node_posidnode) 1249 | - [mecab_node_char_type()](#mecab_node_char_typenode) 1250 | - [mecab_node_stat()](#mecab_node_statnode) 1251 | - [mecab_node_alpha()](#mecab_node_alphanode) 1252 | - [mecab_node_beta()](#mecab_node_betanode) 1253 | - [mecab_node_wcost()](#mecab_node_wcostnode) 1254 | - [mecab_node_cost()](#mecab_node_costnode) 1255 | - [mecab_node_prob()](#mecab_node_probnode) 1256 | - [mecab_node_isbest()](#mecab_node_isbestnode) 1257 | - [mecab_path_rnext()](#mecab_path_rnextpath) 1258 | - [mecab_path_lnext()](#mecab_path_lnextpath) 1259 | - [mecab_path_rnode()](#mecab_path_rnodepath) 1260 | - [mecab_path_lnode()](#mecab_path_lnodepath) 1261 | - [mecab_path_prob()](#mecab_path_probpath) 1262 | - [mecab_path_cost()](#mecab_path_costpath) 1263 | 1264 | ###### mecab_version() 1265 | Return MeCab version. 1266 | Return MeCab version. 1267 | ```php 1268 | /** 1269 | * @return string 1270 | */ 1271 | ``` 1272 | 1273 | ###### mecab_split($string, $dic_dir, $user_dic, $filter, $persistent) 1274 | Split string into array of morphemes. 1275 | ```php 1276 | /** 1277 | * @param string $string String to split. 1278 | * @param string $dic_dir Path to dictionary directory. (Optional) 1279 | * @param string $user_dic Path to user dictionary. (Optional) 1280 | * @param callback $filter Filter function or method. (Optional) 1281 | * @param boolean $persistent (Optional) 1282 | * 1283 | * @return array 1284 | */ 1285 | ``` 1286 | 1287 | ###### mecab_new($arguments, $persistent) 1288 | Create new MeCab resource. 1289 | ```php 1290 | /** 1291 | * @param array $arguments Command line arguments. 
1292 | * @param boolean $persistent (Optional) 1293 | * 1294 | * @return MeCab 1295 | */ 1296 | ``` 1297 | 1298 | ###### mecab_destroy($mecab) 1299 | Free the tagger. 1300 | ```php 1301 | /** 1302 | * @param MeCab $mecab MeCab resource. 1303 | */ 1304 | ``` 1305 | 1306 | ###### mecab_get_partial($mecab) 1307 | Get current partial parsing mode state. 1308 | ```php 1309 | /** 1310 | * @param MeCab $mecab MeCab resource. 1311 | * 1312 | * @return boolean 1313 | */ 1314 | ``` 1315 | 1316 | ###### mecab_set_partial($mecab, $partial) 1317 | Set partial parsing mode. 1318 | ```php 1319 | /** 1320 | * @param MeCab $mecab MeCab resource. 1321 | * @param boolean $partial Partial parsing mode. 1322 | */ 1323 | ``` 1324 | 1325 | ###### mecab_get_theta($mecab) 1326 | Get current temperature parameter theta. 1327 | ```php 1328 | /** 1329 | * @param MeCab $mecab MeCab resource. 1330 | * 1331 | * @return float 1332 | */ 1333 | ``` 1334 | 1335 | ###### mecab_set_theta($mecab, $theta) 1336 | Set temperature parameter theta. 1337 | ```php 1338 | /** 1339 | * @param MeCab $mecab MeCab resource. 1340 | * @param float|int $theta Temperature parameter theta. 1341 | */ 1342 | ``` 1343 | 1344 | ###### mecab_get_lattice_level($mecab) 1345 | Get current lattice level. 1346 | ```php 1347 | /** 1348 | * @param MeCab $mecab MeCab resource. 1349 | * 1350 | * @return int 1351 | */ 1352 | ``` 1353 | 1354 | ###### mecab_set_lattice_level($mecab, $level) 1355 | Set lattice level. 1356 | ```php 1357 | /** 1358 | * @param MeCab $mecab MeCab resource. 1359 | * @param int $level Lattice level. 1360 | */ 1361 | ``` 1362 | 1363 | ###### mecab_get_all_morphs($mecab) 1364 | Get all-morphs output mode. 1365 | ```php 1366 | /** 1367 | * @param MeCab $mecab MeCab resource. 1368 | * 1369 | * @return bool 1370 | */ 1371 | ``` 1372 | 1373 | ###### mecab_set_all_morphs($mecab, $bool) 1374 | Set all-morphs output mode. 1375 | ```php 1376 | /** 1377 | * @param MeCab $mecab MeCab resource.
1378 | * @param bool $bool All-morphs output mode. 1379 | */ 1380 | ``` 1381 | 1382 | ###### mecab_sparse_tostr($mecab, $string, $length, $output_length) 1383 | Parse string and output results as string. 1384 | ```php 1385 | /** 1386 | * @param MeCab $mecab MeCab resource. 1387 | * @param string $string String to be parsed. 1388 | * @param int $length Length to be analyzed. (Optional) 1389 | * @param int $output_length Maximum length of output. (Optional) 1390 | * 1391 | * @return string 1392 | */ 1393 | ``` 1394 | 1395 | ###### mecab_sparse_tonode($mecab, $string, $length) 1396 | Parse string and output results as MeCab\Node. 1397 | ```php 1398 | /** 1399 | * @param MeCab $mecab MeCab resource. 1400 | * @param string $string String to be parsed. 1401 | * @param int $length Length to be analyzed. (Optional) 1402 | * 1403 | * @return MeCab\Node 1404 | */ 1405 | ``` 1406 | 1407 | ###### mecab_nbest_sparse_tostr($mecab, $n, $string, $length, $output_length) 1408 | Parse the given sentence and output the N-best results as a string. This function causes segmentation faults for me. 1409 | ```php 1410 | /** 1411 | * @param MeCab $mecab MeCab resource. 1412 | * @param int $n Number of results to obtain. 1413 | * @param string $string String to be parsed. 1414 | * @param int $length Length to be analyzed. (Optional) 1415 | * @param int $output_length Maximum length of output. (Optional) 1416 | * 1417 | * @return string 1418 | */ 1419 | ``` 1420 | 1421 | ###### mecab_nbest_init($mecab, $string, $length) 1422 | Initialize N-best enumeration with a sentence. 1423 | ```php 1424 | /** 1425 | * @param MeCab $mecab MeCab resource. 1426 | * @param string $string String to be parsed. 1427 | * @param int $length Length to be analyzed. (Optional) 1428 | * 1429 | * @return boolean 1430 | */ 1431 | ``` 1432 | 1433 | ###### mecab_nbest_next_tostr($mecab, $output_length) 1434 | Get the next result of N-Best as a string. 1435 | ```php 1436 | /** 1437 | * @param MeCab $mecab MeCab resource.
1438 | * @param int $output_length Maximum length of output. (Optional) 1439 | * 1440 | * @return string 1441 | */ 1442 | ``` 1443 | 1444 | ###### mecab_nbest_next_tonode($mecab) 1445 | Get the next result of N-Best as a node. 1446 | ```php 1447 | /** 1448 | * @param MeCab $mecab MeCab resource. 1449 | * 1450 | * @return MeCab\Node 1451 | */ 1452 | ``` 1453 | 1454 | ###### mecab_format_node($mecab, $node) 1455 | Format a node to a string. 1456 | ```php 1457 | /** 1458 | * @param MeCab $mecab MeCab resource. 1459 | * @param MeCab\Node $node Node of source string. 1460 | * 1461 | * @return string 1462 | */ 1463 | ``` 1464 | 1465 | ###### mecab_dictionary_info($mecab) 1466 | Return array of dictionary info. 1467 | ```php 1468 | /** 1469 | * @return array 1470 | */ 1471 | ``` 1472 | 1473 | ###### mecab_node_toarray($node, $dump_all) 1474 | Get all elements of the node as an associative array. 1475 | ```php 1476 | /** 1477 | * @param MeCab\Node $node Node of source string. 1478 | * @param boolean $dump_all Dump all related nodes if true. (Optional) 1479 | * 1480 | * @return array 1481 | */ 1482 | ``` 1483 | 1484 | ###### mecab_node_tostring($node) 1485 | Get the formatted string of the node. 1486 | ```php 1487 | /** 1488 | * @param MeCab\Node $node Node of source string. 1489 | * 1490 | * @return string 1491 | */ 1492 | ``` 1493 | 1494 | ###### mecab_node_prev($node) 1495 | Get the previous node. Return NULL if none. 1496 | ```php 1497 | /** 1498 | * @param MeCab\Node $node Node of source string. 1499 | * 1500 | * @return MeCab\Node 1501 | */ 1502 | ``` 1503 | 1504 | ###### mecab_node_next($node) 1505 | Get the next node. Return NULL if none. 1506 | ```php 1507 | /** 1508 | * @param MeCab\Node $node Node of source string. 1509 | * 1510 | * @return MeCab\Node 1511 | */ 1512 | ``` 1513 | 1514 | ###### mecab_node_enext($node) 1515 | Get the next node which has same end point as the given node. Return NULL if none. 
1516 | ```php 1517 | /** 1518 | * @param MeCab\Node $node Node of source string. 1519 | * 1520 | * @return MeCab\Node 1521 | */ 1522 | ``` 1523 | 1524 | ###### mecab_node_bnext($node) 1525 | Get the next node which has the same beginning point as the given node. Return NULL if none. 1526 | ```php 1527 | /** 1528 | * @param MeCab\Node $node Node of source string. 1529 | * 1530 | * @return MeCab\Node 1531 | */ 1532 | ``` 1533 | 1534 | ###### mecab_node_rpath($node) 1535 | Get the right path of the node. Return NULL if none. 1536 | ```php 1537 | /** 1538 | * @param MeCab\Node $node Node of source string. 1539 | * 1540 | * @return MeCab\Path 1541 | */ 1542 | ``` 1543 | 1544 | ###### mecab_node_lpath($node) 1545 | Get the left path of the node. Return NULL if none. 1546 | ```php 1547 | /** 1548 | * @param MeCab\Node $node Node of source string. 1549 | * 1550 | * @return MeCab\Path 1551 | */ 1552 | ``` 1553 | 1554 | ###### mecab_node_surface($node) 1555 | Get the surface of the node. 1556 | ```php 1557 | /** 1558 | * @param MeCab\Node $node Node of source string. 1559 | * 1560 | * @return string 1561 | */ 1562 | ``` 1563 | 1564 | ###### mecab_node_feature($node) 1565 | Get the feature of the node. 1566 | ```php 1567 | /** 1568 | * @param MeCab\Node $node Node of source string. 1569 | * 1570 | * @return string 1571 | */ 1572 | ``` 1573 | 1574 | ###### mecab_node_id($node) 1575 | Get the ID of the node. 1576 | ```php 1577 | /** 1578 | * @param MeCab\Node $node Node of source string. 1579 | * 1580 | * @return int 1581 | */ 1582 | ``` 1583 | 1584 | ###### mecab_node_length($node) 1585 | Get the length of the node's surface. 1586 | ```php 1587 | /** 1588 | * @param MeCab\Node $node Node of source string. 1589 | * 1590 | * @return int 1591 | */ 1592 | ``` 1593 | 1594 | ###### mecab_node_rlength($node) 1595 | Get the length of the node's surface including its leading whitespace.
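
MeCab itself counts these lengths in bytes rather than characters, so a two-character UTF-8 surface such as 行き is 6 bytes long. The sketch below shows the arithmetic, assuming php-mecab passes MeCab's byte counts through unchanged (not verified here). The `$length` and `$rlength` values are simulated with `strlen()` instead of being read from a real node, so the snippet runs without the extension.

```php
<?php
// Simulated values: a real script would read these from
// mecab_node_length($node) and mecab_node_rlength($node).
$surface = "\u{884C}\u{304D}"; // 行き, written as escapes so the bytes are always UTF-8
$leading = ' ';                // one ASCII space preceding the surface in the input

$length  = strlen($surface);            // bytes in the surface only
$rlength = strlen($leading . $surface); // bytes including the leading space

// The difference is the number of whitespace bytes skipped before the node.
$skipped = $rlength - $length;

printf("length=%d rlength=%d skipped=%d\n", $length, $rlength, $skipped);
```

Under that assumption, subtracting the two values is a quick way to tell whether a node was preceded by whitespace in the original input.
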
1596 | ```php 1597 | /** 1598 | * @param MeCab\Node $node Node of source string. 1599 | * 1600 | * @return int 1601 | */ 1602 | ``` 1603 | 1604 | ###### mecab_node_rcattr($node) 1605 | Get the ID of the right context. 1606 | ```php 1607 | /** 1608 | * @param MeCab\Node $node Node of source string. 1609 | * 1610 | * @return int 1611 | */ 1612 | ``` 1613 | 1614 | ###### mecab_node_lcattr($node) 1615 | Get the ID of the left context. 1616 | ```php 1617 | /** 1618 | * @param MeCab\Node $node Node of source string. 1619 | * 1620 | * @return int 1621 | */ 1622 | ``` 1623 | 1624 | ###### mecab_node_posid($node) 1625 | Get the ID of the part of speech. 1626 | ```php 1627 | /** 1628 | * @param MeCab\Node $node Node of source string. 1629 | * 1630 | * @return int 1631 | */ 1632 | ``` 1633 | 1634 | ###### mecab_node_char_type($node) 1635 | Get the type of character. 1636 | ```php 1637 | /** 1638 | * @param MeCab\Node $node Node of source string. 1639 | * 1640 | * @return int 1641 | */ 1642 | ``` 1643 | 1644 | ###### mecab_node_stat($node) 1645 | Get the status of the node. 1646 | ```php 1647 | /** 1648 | * @param MeCab\Node $node Node of source string. 1649 | * 1650 | * @return int 1651 | */ 1652 | ``` 1653 | - 0: Normal, MECAB_NOR_NODE 1654 | - 1: Unknown, MECAB_UNK_NODE 1655 | - 2: Beginning of Sentence, MECAB_BOS_NODE 1656 | - 3: End of Sentence, MECAB_EOS_NODE 1657 | 1658 | ###### mecab_node_alpha($node) 1659 | Get the forward log probability. 1660 | ```php 1661 | /** 1662 | * @param MeCab\Node $node Node of source string. 1663 | * 1664 | * @return float 1665 | */ 1666 | ``` 1667 | 1668 | ###### mecab_node_beta($node) 1669 | Get the backward log probability. 1670 | ```php 1671 | /** 1672 | * @param MeCab\Node $node Node of source string. 1673 | * 1674 | * @return float 1675 | */ 1676 | ``` 1677 | 1678 | ###### mecab_node_wcost($node) 1679 | Get the word arising cost. 1680 | ```php 1681 | /** 1682 | * @param MeCab\Node $node Node of source string.
1683 | * 1684 | * @return int 1685 | */ 1686 | ``` 1687 | 1688 | ###### mecab_node_cost($node) 1689 | Get the cumulative cost of the node. 1690 | ```php 1691 | /** 1692 | * @param MeCab\Node $node Node of source string. 1693 | * 1694 | * @return int 1695 | */ 1696 | ``` 1697 | 1698 | ###### mecab_node_prob($node) 1699 | Get the marginal probability of the node. 1700 | ```php 1701 | /** 1702 | * @param MeCab\Node $node Node of source string. 1703 | * 1704 | * @return float 1705 | */ 1706 | ``` 1707 | 1708 | ###### mecab_node_isbest($node) 1709 | Determine whether the node is the best solution. 1710 | ```php 1711 | /** 1712 | * @param MeCab\Node $node Node of source string. 1713 | * 1714 | * @return boolean 1715 | */ 1716 | ``` 1717 | 1718 | ###### mecab_path_rnext($path) 1719 | Get the rnext (next right) path. Return NULL if none. 1720 | ```php 1721 | /** 1722 | * @param MeCab\Path $path Path of source string. 1723 | * 1724 | * @return MeCab\Path 1725 | */ 1726 | ``` 1727 | 1728 | ###### mecab_path_lnext($path) 1729 | Get the lnext (next left) path. Return NULL if none. 1730 | ```php 1731 | /** 1732 | * @param MeCab\Path $path Path of source string. 1733 | * 1734 | * @return MeCab\Path 1735 | */ 1736 | ``` 1737 | 1738 | ###### mecab_path_rnode($path) 1739 | Get the rnode (right node). Return NULL if none. 1740 | ```php 1741 | /** 1742 | * @param MeCab\Path $path Path of source string. 1743 | * 1744 | * @return MeCab\Node 1745 | */ 1746 | ``` 1747 | 1748 | ###### mecab_path_lnode($path) 1749 | Get the lnode (left node). Return NULL if none. 1750 | ```php 1751 | /** 1752 | * @param MeCab\Path $path Path of source string. 1753 | * 1754 | * @return MeCab\Node 1755 | */ 1756 | ``` 1757 | 1758 | ###### mecab_path_prob($path) 1759 | Get the marginal probability of the path. 1760 | ```php 1761 | /** 1762 | * @param MeCab\Path $path Path of source string. 1763 | * 1764 | * @return float 1765 | */ 1766 | ``` 1767 | 1768 | ###### mecab_path_cost($path) 1769 | Get the cumulative cost of the path.
1770 | ```php 1771 | /** 1772 | * @param MeCab\Path $path Path of source string. 1773 | * 1774 | * @return int 1775 | */ 1776 | ``` 1777 | 1778 | [Top](#contents) 1779 | 1780 | ## Other Resources 1781 | The University of the Ryukyus Department of Mechanical Systems Engineering maintains a php-mecab API documentation page that can be useful. 1782 | [http://mechsys.tec.u-ryukyu.ac.jp/~oshiro/php_mecab_apis.html](http://mechsys.tec.u-ryukyu.ac.jp/~oshiro/php_mecab_apis.html) 1783 | 1784 | The MeCab documentation is hosted on GitHub, but it is in Japanese only and is a little outdated. 1785 | [http://taku910.github.io/mecab/](http://taku910.github.io/mecab/) 1786 | 1787 | jordwest has translated parts of the MeCab documentation into English here. 1788 | [https://github.com/jordwest/mecab-docs-en](https://github.com/jordwest/mecab-docs-en) 1789 | 1790 | The MeCab API documentation is up on Google Code. 1791 | [https://mecab.googlecode.com/svn/trunk/mecab/doc/doxygen/index.html](https://mecab.googlecode.com/svn/trunk/mecab/doc/doxygen/index.html) 1792 | 1793 | If you're using an IDE, fumikito has a gist that can help with php-mecab class recognition. 1794 | [https://gist.github.com/fumikito/bb172b4cf5648c7f8451](https://gist.github.com/fumikito/bb172b4cf5648c7f8451) 1795 | 1796 | If an app you're using requires php-mecab and you'd like to use Travis CI, check out the [example-travis.yml](https://github.com/nihongodera/php-mecab-documentation/blob/master/example-travis.yml) file and the accompanying [travis-install-php-mecab.sh](https://github.com/nihongodera/php-mecab-documentation/blob/master/travis-install-php-mecab.sh) file in this repository. 1797 | 1798 | [Top](#contents) 1799 | 1800 | ## Contributing 1801 | Please help me to improve this guide. If you find errors or places where you feel this guide is lacking, please create an issue or make a pull request. Also, I would love to see this guide translated into other languages, especially Japanese.
Any help with translations would be much appreciated. 1802 | --------------------------------------------------------------------------------