├── Parser.py
└── README.md


/Parser.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Fri Nov 3 03:47:48 2017

@author: Ahmed
"""

class Parser(object):
    """
    A natural language parser is a program that works out the grammatical
    structure of sentences, for instance, which groups of words go together
    (as “phrases”) and which words are the subject or object of a verb.
    Probabilistic parsers use knowledge of language gained from hand-parsed
    sentences to try to produce the most likely analysis of new sentences.
    These statistical parsers still make some mistakes, but commonly work
    rather well. Their development was one of the biggest breakthroughs in
    natural language processing in the 1990s.
    """

    def __init__(self, model_path, path_to_jar, path_to_models_jar):
        # Imported here so that nltk is only required when a Parser is created.
        from nltk.parse.stanford import StanfordParser
        self.__model_path = model_path
        self.__path_to_jar = path_to_jar
        self.__path_to_model_jar = path_to_models_jar
        # Last sentence passed to parse_sentence(); None until it is called.
        self.__text = None
        self.__stf_parser = StanfordParser(
            path_to_jar=path_to_jar,
            path_to_models_jar=path_to_models_jar,
            model_path=model_path,
            encoding='utf-8'
        )


    def parse_sentence(self, text):
        """
        Arguments:
        text -- input text string to be parsed

        Returns:
        list of the parsed result in the form (parent_tag(tag, word))
        """
        # Remember the sentence so tree_print() and tree_draw() can reuse it.
        self.__text = text
        return list(self.__stf_parser.raw_parse(text))


    def tree_print(self):
        """
        Print the parse of the sentence last given to parse_sentence().

        Arguments:
        -- None
        Returns:
        -- None
        """
        if self.__text is None:
            raise ValueError("call parse_sentence() before tree_print()")
        for line in self.__stf_parser.raw_parse(self.__text):
            for sentence in line:
                print(sentence)


    def tree_draw(self):
        """
        Draw the parse of the sentence last given to parse_sentence().

        Arguments:
        -- None
        Returns:
        -- None
        """
        if self.__text is None:
            raise ValueError("call parse_sentence() before tree_draw()")
        for line in self.__stf_parser.raw_parse(self.__text):
            for sentence in line:
                sentence.draw()



--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Arabic_Parser_NLTK
Arabic parser using the Stanford parser through the Python NLTK interface

### What is a parser?
A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as “phrases”) and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s.

### What is NLTK?
[NLTK](http://www.nltk.org/) (the Natural Language Toolkit) is a widely used platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

### Requirements
* Python (version 3.5 or 3.6 is recommended if you work with Arabic text; otherwise the version doesn't matter)
* Java
* [nltk package](http://www.nltk.org/install.html)
* [Stanford software](https://nlp.stanford.edu/software/lex-parser.shtml)

### How to use?
Once you have downloaded the Stanford software, the setup is a little tricky, but you can run the Arabic parser successfully.

Code snippet example, using the following three paths:

* *model_path:* the [pretrained model](https://nlp.stanford.edu/software/stanford-arabic-corenlp-2017-06-09-models.jar), which you can find at stanford-arabic-corenlp-yyyy-mm-dd-models/edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz

* *path_to_jar:* path_to/stanford-parser-full-yyyy-mm-dd/stanford-parser.jar

* *path_to_models_jar:* path_to/stanford-parser-full-yyyy-mm-dd/stanford-parser-xx.xx.xx-models.jar

```python
from Parser import Parser

# ar_model_path, my_path_to_jar and my_path_to_models_jar must be set
# beforehand to the three paths described above.
parser = Parser(model_path=ar_model_path, path_to_jar=my_path_to_jar, path_to_models_jar=my_path_to_models_jar)
result = parser.parse_sentence(u'ذهبت الى منزلى الذى كان بعيداً بعد الفجر')
print(result)

# Output:
# [Tree('ROOT', [Tree('S', [Tree('VP', [Tree('VBD', ['ذهبت']), Tree('PP', [Tree('IN', ['الى']), Tree('NP',
# [Tree('NN', ['منزلى']), Tree('SBAR', [Tree('WHNP', [Tree('WP', ['الذى'])]), Tree('S', [Tree('VP', [Tree('VBD', ['كان']),
# Tree('NP', [Tree('JJ', ['بعيدا'])]), Tree('NP', [Tree('NN', ['بعد']), Tree('NP', [Tree('DTNN',
# ['الفجر'])])])])])])])])])])])]
```
Congrats! You can now use the parser easily.

## NOTES
Java is not required by NLTK itself; however, some third-party software (such as the Stanford parser) depends on it. NLTK finds the java binary via the system PATH environment variable, or through JAVAHOME or JAVA_HOME.
To search for Java binaries (jar files), NLTK checks the Java CLASSPATH variable; however, there are usually independent environment variables which are also searched for each dependency individually.
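
The environment variables described above can also be set from Python before the parser is created. The following is only a minimal sketch: `configure_java_env` and `java_on_path` are hypothetical helper names that are not part of this repository or of NLTK, the paths are placeholders, and it assumes the variables are read when the parser is constructed in the same process.

```python
import os
import shutil


def configure_java_env(java_home, jar_paths, env=None):
    """Set JAVA_HOME and extend CLASSPATH with the given jar locations.

    java_home and jar_paths are placeholders supplied by the caller.
    env defaults to os.environ; passing a dict makes the helper easy to try out.
    """
    env = os.environ if env is None else env
    env["JAVA_HOME"] = java_home
    # Keep whatever is already on CLASSPATH and append each new entry once.
    entries = [p for p in env.get("CLASSPATH", "").split(os.pathsep) if p]
    for path in jar_paths:
        if path not in entries:
            entries.append(path)
    env["CLASSPATH"] = os.pathsep.join(entries)
    return env


def java_on_path():
    """Return the full path of the java binary found via PATH, or None."""
    return shutil.which("java")


# Demo on a plain dict so the real environment is left untouched.
demo_env = configure_java_env(
    "/path/to/jdk",                    # hypothetical Java install directory
    ["/path/to/stanford-parser.jar"],  # hypothetical jar location
    env={},
)
print(demo_env["CLASSPATH"])
print("java found on PATH:", java_on_path() is not None)
```

Calling `configure_java_env(...)` without the `env` argument changes `os.environ` for the current process only; a permanent setting still has to be made in the operating system, as described in the sections below.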

### For Windows Users
* Your `JAVA_HOME` variable must be set in the environment variables; otherwise you will get many errors.
* [Here](https://confluence.atlassian.com/doc/setting-the-java_home-variable-in-windows-8895.html) is how you can set it easily.

### For Linux (Ubuntu) Users
* Your `CLASSPATH` variable must be set in the environment variables; otherwise you will get many errors.
* It is best to use the package manager to install Java.
* [Here](https://introcs.cs.princeton.edu/java/15inout/classpath.html) is how you can set it easily on macOS or Ubuntu.

The setup is open to everyone, but installing third-party software is often tedious and tricky. If you want to know how NLTK discovers third-party software, you can [check the NLTK wiki](https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software).

* [Here](https://nlp.stanford.edu/) is the official site of the Stanford NLP group, if you want to learn more about their work.
* [Here](http://www.nltk.org/) is the official NLTK documentation.

--------------------------------------------------------------------------------