├── stemmers
    └── README.txt
├── escape-unicode.sh
├── unescape-unicode.sh
├── arabic-test.txt
├── nltk-isri-test.py
├── README.md
├── jslib
    ├── jquery.ba-hashchange.min.js
    ├── shortcut.js
    ├── xregexp-all-min.js
    └── jquery.js
├── jsastem.js
└── demo.html

/stemmers/README.txt:
--------------------------------------------------------------------------------
1 | This directory contains:
2 | 
3 | - The NLTK project (git clone https://github.com/nltk/nltk.git)
4 | 
5 | - ATMINE (svn checkout http://atmine.googlecode.com/svn/trunk/ atmine-read-only)
6 | 
7 | - Buckwalter's Aramorph 1.0: http://www.ldc.upenn.edu/Catalog/LDC2002L49.html
8 | 
--------------------------------------------------------------------------------
/escape-unicode.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 | 
3 | # To test:
4 | # echo '\u062a\u0645\u0644 \u0647\u0645\u0644 \u062a\u0627\u0646 \u062a\u064a\u0646 \u0643\u0645\u0644' |
5 | # unescape-unicode.sh | escape-unicode.sh
6 | 
7 | # Do the reverse of:
8 | #perl -C -Mutf8 -pi -e 's/\\u([0-9a-fA-F]{4})/chr(hex(join("",$1)))/ge'
9 | perl -C -Mutf8 -MEncode -pi -e 's/(.)/{ ord($1) > 255 ? 
sprintf("\\u%04x", ord $1) : $1 }/ge'
10 | 
--------------------------------------------------------------------------------
/unescape-unicode.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 | 
3 | # To test:
4 | # echo '\u062a\u0645\u0644 \u0647\u0645\u0644 \u062a\u0627\u0646 \u062a\u064a\u0646 \u0643\u0645\u0644' |
5 | # unescape-unicode.sh
6 | 
7 | # Hmm, this one-liner takes up a whole page as sub unescape on:
8 | # http://cpansearch.perl.org/src/ITWARRIOR/Unicode-Escape-0.0.2/lib/Unicode/Escape.pm
9 | 
10 | perl -C -Mutf8 -pi -e 's/\\u([0-9a-fA-F]{4})/chr(hex(join("",$1)))/ge'
11 | 
--------------------------------------------------------------------------------
/arabic-test.txt:
--------------------------------------------------------------------------------
1 | Arabic Pangrams:
2 | 
3 | صِف خَلقَ خَودِ كَمِثلِ الشَمسِ إِذ بَزَغَت — يَحظى الضَجيعُ بِها نَجلاءَ مِعطارِ (A poem by Al Farāhīdi)
4 | نص حكيم له سر قاطع وذو شأن عظيم مكتوب على ثوب أخضر ومغلف بجلد أزرق
5 | nṣ ḥkym lh sr qāṭʿ uḏu šān ʿẓym mktub ʿala ṯub aẖḍr umġlf bǧld azrq
6 | A wise text which has an absolute secret and great importance, written on a green cloth and covered with blue leather (it has a riddle built into it)
7 | ابجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ
8 | 
9 | Some other difficult words for the stemmer to chew on:
10 | يعد منعت إضربن كتابهما حفظة
11 | بالإستنصارهما مفاتيح مدرسة
12 | نصكما قنا تنبت
13 | etc. etc. etc.
14 | 
--------------------------------------------------------------------------------
/nltk-isri-test.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | 
3 | import sys
4 | ### Nope all this didn't seem to make it work when redirecting output to a pipe
5 | #import os
6 | #import codecs
7 | #sys.stdout = codecs.getwriter('utf8')(sys.stdout)
8 | #sys.stdin = codecs.getwriter('utf8')(sys.stdin)
9 | #os.environ['PYTHONIOENCODING'] = 'UTF_8'
10 | 
11 | # However this 'hack' did, oh well
12 | reload(sys)
13 | sys.setdefaultencoding('utf-8')
14 | 
15 | 
16 | from nltk.stem.isri import ISRIStemmer
17 | 
18 | import imp
19 | 
20 | foo = imp.load_source('nltk.stem.isri', './stemmers/nltk/nltk/stem/isri.py')
21 | st = foo.ISRIStemmer()
22 | 
23 | for line in sys.stdin:
24 |     #print line
25 |     for word in line.split():
26 |         print word, ' -> ', st.stem(word), ' '
27 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | JSASTEM - JavaScript Arabic Stemmer
3 | ===================================
4 | 
5 | This little project aims to create a simple Arabic stemmer
6 | implemented in JavaScript.
7 | 
8 | Why?
9 | ====
10 | 
11 | Good question. Because I need one for my other project,
12 | Mabhathu Tulab (A student's place of research), which
13 | is an Arabic-Arabic dictionary, similar to http://baheth.info
14 | but better, of course ;) Its main feature is that the
15 | user may click on any word of a dictionary entry, which is then
16 | further explained by an overlay.
17 | 
18 | How?
19 | ====
20 | 
21 | It seems no one has implemented one in JavaScript yet, for
22 | good reason I imagine. The ISRI stemmer as implemented by the
23 | NLTK project seems like a very straightforward stemmer, nothing
24 | too complicated or time-consuming for my needs. 
This project
25 | therefore aims to port the Python code to JavaScript.
26 | 
27 | Plans
28 | =====
29 | 
30 | Hopefully I can find some time to increase accuracy. Some words
31 | can be derived from multiple possible theoretical roots, so these
32 | could be enumerated in the return value. Prior to returning them
33 | they could be compared against some known lists of existing roots
34 | to filter out unknown roots.
35 | 
36 | License, Copyright & Contact
37 | ============================
38 | 
39 | License: GPL
40 | Copyright: Erik Taal (http://ejtaal.net)
41 | 
42 | 
--------------------------------------------------------------------------------
/jslib/jquery.ba-hashchange.min.js:
--------------------------------------------------------------------------------
1 | /*
2 |  * jQuery hashchange event - v1.3 - 7/21/2010
3 |  * http://benalman.com/projects/jquery-hashchange-plugin/
4 |  *
5 |  * Copyright (c) 2010 "Cowboy" Ben Alman
6 |  * Dual licensed under the MIT and GPL licenses.
7 |  * http://benalman.com/about/license/
8 |  */
9 | (function($,e,b){var c="hashchange",h=document,f,g=$.event.special,i=h.documentMode,d="on"+c in e&&(i===b||i>7);function a(j){j=j||location.href;return"#"+j.replace(/^[^#]*#?(.*)$/,"$1")}$.fn[c]=function(j){return j?this.bind(c,j):this.trigger(c)};$.fn[c].delay=50;g[c]=$.extend(g[c],{setup:function(){if(d){return false}$(f.start)},teardown:function(){if(d){return false}$(f.stop)}});f=(function(){var j={},p,m=a(),k=function(q){return q},l=k,o=k;j.start=function(){p||n()};j.stop=function(){p&&clearTimeout(p);p=b};function n(){var r=a(),q=o(m);if(r!==m){l(m=r,q);$(e).trigger(c)}else{if(q!==m){location.href=location.href.replace(/#.*/,"")+q}}p=setTimeout(n,$.fn[c].delay)}$.browser.msie&&!d&&(function(){var q,r;j.start=function(){if(!q){r=$.fn[c].src;r=r&&r+a();q=$('