/TO_DO.md:

**TO DO:**

- convert this to a LaTeX document
- summarize information about corpora and lexicons in a table
- automatically generate a .bib file for bibliographic references
- automatically generate a table of contents

/Readme.md:

## Introduction

The Tunisian dialect is an under-resourced language. This project is a personal effort to gather in one place, for all to use,
the scattered sources of information, resources, and tools related to the automated processing of text written in the Tunisian dialect.

Hopefully, it will save NLP practitioners the valuable time they would otherwise spend chasing information scattered all over the Web.

I have organized this information into the following categories:

1. Publicly Available Corpora and Lexicons
2. NLP Software
3. Scientific Papers and Articles
4. Books
5. Web Articles & Links
6. Academic Research Groups & Labs
7. Conferences & Workshops

If you would like to contribute or add a new listing, please either send a git pull request or email me at chiraz.benabdelkader@gmail.com

## Publicly Available Corpora and Lexicons

- The MADAR Arabic Dialect Corpus and Lexicon, H. Bouamor et al., http://nlp.qatar.cmu.edu/madar/

  Description:
  "The latest version of the lexicon is available for browsing online."

- Tunisian Arabic Corpus, by Karen McNeil and Miled Faiza, http://www.tunisiya.org

  Description:
  "There are currently 2,006 texts in the corpus, comprising 881,964 words.
  The main categories currently included are
  1) traditional written sources (folklore, songs, folk poetry, proverb collections, screenplays)
  2) new written sources (blogs, email, Facebook, forum postings) -- currently the dominant source
  3) transcribed audio (e.g. from radio programming)."
  "The corpus is available freely online at tunisiya.org, where users can perform complex concordance
  searches and view search results in context, with access to the full text."

- DID-LREC-2018: Training and test data for the Arabic dialect identification (DID) shared task at LREC 2018, https://github.com/drelhaj/ArabicDialects/tree/master/ArabicSharedTask

- Various small lexicons contributed by N. Karmani Ben Moussa as part of her PhD thesis work, last updated August 2016, https://github.com/NadiaBMKarmani

- AOC: Arabic online commentary dataset by Omar Zaidan, last updated August 2012, https://github.com/sjeblee/AOC

  Description:
  "The AOC dataset was created by crawling the websites of three Arabic newspapers,
  and extracting online articles and readers' comments. The readers' comments are
  arguably more "interesting", which is why we call this the *commentary* dataset,
  but the articles themselves are also included."

- TSAC: Tunisian sentiment analysis corpus, https://github.com/fbougares/TSAC

  Description:
  "About 17k user comments manually annotated to positive and negative polarities.
  This corpus is collected from Facebook users comments written on official pages
  of Tunisian radios and TV channels namely Mosaique FM, JawhraFM, Shemes FM,
  HiwarElttounsi TV and Nessma TV. The corpus is collected from a period spanning
  January 2015 until June 2016."

- CODA Seed Lexicon, https://sites.google.com/a/nyu.edu/coda/dialect-specific


## NLP Software

### Open-source

- FARASA toolkit, http://qatsdemo.cloudapp.net/farasa/

  Description:
  "Farasa (which means “insight” in Arabic), is a fast and accurate text processing toolkit for Arabic text.
  Farasa consists of the segmentation/tokenization module, POS tagger, Arabic text Diacritizer, and Dependency Parser.
  We measure the performance of the segmenter in terms of accuracy and efficiency, in two NLP tasks, namely Machine Translation (MT)
  and Information Retrieval (IR). Farasa outperforms or equalizes state-of-the-art Arabic segmenters (Stanford and MADAMIRA),
  while being more than one order of magnitude faster."

- BAMA (Buckwalter Arabic Morphological Analyzer)

- CODA (Conventional Orthography for Dialectal Arabic), https://sites.google.com/a/nyu.edu/coda/

### Free online demo-only tools

- ADIDA: Automatic Dialect Identification for Arabic, https://adida.abudhabi.nyu.edu/

  Description:
  "This interface is a demo of the MADAR project dialect identification system developed by Salameh, Bouamor and Habash (2018).
  The system is able to distinguish amongst 25 cities (from Rabat to Muscat) in addition to Modern Standard Arabic.
  To use the demo enter text in Standard or Dialectal Arabic. The cities and regions that our system identifies will be lit up.
  The ADIDA web interface is described in the paper by Obeid, Salameh, Bouamor and Habash (2019)."

- Farasa demo version, http://qatsdemo.cloudapp.net/farasa/demo.html


### Commercial tools & resources

- Ramitechs (http://www.ramitechs.com/), a company that creates and annotates several types of corpora and lexicons using expert linguists.


## Scientific Papers and Articles

### Survey papers

- I. Guellil, H. Saadane, F. Azouaou, B. Gueni, and D. Nouvel, "Arabic natural language processing: An overview", arXiv, 2019.
  URL: https://arxiv.org/ftp/arxiv/papers/1903/1903.02784.pdf

- A. Mekki, I. Zribi, M. Ellouze, and L. Hadrich Belguith, "Critical description of TA linguistic resources",
  In The 4th International Conference on Arabic Computational Linguistics (ACLing 2018), 2018.

  Abstract: This paper presents a critical description of natural language processing for Tunisian Arabic. Indeed, several linguistic resources
  were proposed for the three types of Tunisian Arabic (intellectualized dialect, spontaneous dialect and electronic dialect). We
  present different linguistic resources (corpora, lexicons and linguistic analysis tools). This study can be used as a quick reference
  for the scientific community working on natural language processing in general and more precisely those studying Tunisian Arabic.


- J. Younes, H. Achour, E. Souissi, and A. Ferchichi, "Survey on Corpora Availability for the Tunisian Dialect Automatic Processing",
  In JCCO Joint International Conference on ICT in Education and Training, International Conference on Computing in Arabic,
  and International Conference on Geocomputing (JCCO: TICET-ICCA-GECO), 2018.
  URL: https://ieeexplore.ieee.org/document/8726213

  Abstract: The language study and automatic processing require the availability of large raw and annotated corpora.
  Collecting data and constructing such language resources are non-trivial tasks in the NLP field, especially when it
  comes to deal with low-resource languages. In this paper, we are concerned with the Tunisian dialect (TD) and propose
  to survey the availability of corpora for its automatic processing.
  From the study of the main works that have been
  carried out in TD language processing, we were able to identify and categorize the different types of corpora that were
  constructed as part of these works. We present, in this paper, a summary of the identified TD corpora characteristics
  as well as an inventory of those which are accessible online.

- S. Harrat, K. Meftouh, and K. Smaïli, "Maghrebi Arabic dialect processing: an overview",
  In International Conference on Natural Language, Signal and Speech Processing (ICNLSSP), 2017.

  Keywords: Arabic dialect, Maghrebi Arabic dialects, Tunisian Arabic, Algerian Arabic, Moroccan Arabic, survey paper

- Wajdi Zaghouani, "Critical Survey of the Freely Available Arabic Corpora", arXiv, 2017.
  URL: https://arxiv.org/ftp/arxiv/papers/1702/1702.07835.pdf

- Abdulhadi Shoufan and Sumaya Al-Ameri, "Natural Language Processing for Dialectical Arabic: A Survey",
  In Proceedings of the Second Workshop on Arabic Natural Language Processing, pages 36–48, 2015.

- Kareem Darwish and Walid Magdy, "Arabic Information Retrieval", In Foundations and Trends in Information Retrieval, 2014.


### Construction of corpora and linguistic resources

- K. Meftouh, S. Harrat, and K. Smaïli, "PADIC: extension and new experiments",
  In 7th International Conference on Advanced Technologies (ICAT), Apr 2018.

  Abstract (excerpt): PADIC is a multidialectal parallel Arabic corpus.

  Notes and excerpts:
  "It was composed initially by five Arabic dialects, three from
  the Maghreb and two from the Middle East, in addition to
  standard Arabic. In this paper, we present an augmented
  version of PADIC with a Moroccan dialect."


- Houda Bouamor, Nizar Habash, Mohammad Salameh, Wajdi Zaghouani, Owen Rambow, Dana Abdulrahim, Ossama Obeid,
  Salam Khalifa, Fadhl Eryani, Alexander Erdmann, Kemal Oflazer, "The MADAR Arabic Dialect Corpus and Lexicon",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2018.

- W. Zaghouani and A. Charfi, "ArapTweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2018.

- N. Karmani Ben Moussa, H. Soussou, and A. M. Alimi, "Tunisian Arabic aeb Wordnet: Current State and Future Extensions",
  In First International Conference on Arabic Computational Linguistics (ACLing), 2015.
  URL: https://ieeexplore.ieee.org/abstract/document/7422271

- J. Younes and E. Souissi, "A quantitative view of Tunisian dialect electronic writing", 2015.

  Index terms: dialect, Tunisian, corpus, language, electronic writing, translation, normalization

  Notes and excerpts:
  "This paper focuses specifically on electronic writing with Latin letters in Tunisian dialect.
  We describe the methodology used for the construction of a dialectal corpus,
  present the characteristics of this new form of writing and detail its peculiarities with numbers.
  The built corpus, consisting of 43222 messages."
  "We showed that over 60% of the extracted messages are written in Latin letters, 64% of which
  contain Tunisian dialect words ... and the majority of words containing numbers are in dialect."
  "it would be interesting to make an automatic identification tool for the words written in dialect using the
  Latin alphabet and then proceed to either its translation or normalization."

- I. Zribi, M. Ellouze, L. H. Belguith, and P. Blache, "Spoken Tunisian Arabic corpus 'STAC': Transcription and annotation",
  In Research in Computing Science, vol. 90, pp. 123–135, 2015.

  Notes and excerpts:
  "transcribed 5 hours of spontaneous Tunisian Arabic speech enriched with morpho-syntactic and disfluencies annotations."

- J. Younes, H. Achour, and E. Souissi, "Constructing Linguistic Resources for the Tunisian Dialect Using Textual User-Generated Contents on the Social Web",
  In Proceedings of the 1st International Workshop on Natural Language Processing for Informal Text (NLPIT), pp. 3–14, 2015.

  Notes and excerpts:
  "The authors extracted textual user-generated content from social networks that they filtered and classified
  automatically. From the built corpora they drew a picture of the main features related to the Tunisian dialect."

- A. Masmoudi, Y. Esteve, M. E. Khmekhem, F. Bougares, and L. H. Belguith, "Phonetic tool for the Tunisian Arabic",
  In Spoken Language Technologies for Under-Resourced Languages, 2014.
  URL: https://link.springer.com/chapter/10.1007%2F978-3-319-24800-4_1

  Keywords: Tunisian dialect, Language identification, Corpus construction, Dictionary construction, Social web textual contents

  Notes and excerpts:
  "The authors generated automatically phonetic dictionaries for the Tunisian dialect by using a rules-based approach.
  The work is part of an automatic speech recognition framework for Tunisian Arabic in the field of railway transport."

- A. Hamdi, N. Gala, and A. Nasr, "Automatically building a Tunisian lexicon for deverbal nouns",
  In Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, pp. 95–102, 2014.

  Notes and excerpts:
  "The authors presented a bilingual lexicon of deverbal nouns between MSA and Tunisian dialect that has been created automatically.
  They extended an existing Tunisian verbal lexicon by using a table of deverbal patterns in order to generate pairs of Tunisian and MSA deverbal nouns."

- Rihab Bouchlaghem and Aymen Elkhlifi, "Tunisian dialect Wordnet creation and enrichment using web resources and other Wordnets", 2014.

  Notes and excerpts:
  "we propose TunDiaWN (Tunisian dialect Wordnet) a lexical resource for the dialect language spoken in Tunisia.
  Our TunDiaWN construction approach is founded, in one hand, on a corpus based method to analyze and extract Tunisian dialect words.
  A CLUSTERING technique is adapted and applied to mine the possible relations existing between the Tunisian dialect extracted words and
  to group them into meaningful groups."

- Ryan Cotterell and Chris Callison-Burch, "A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2014.

- J. Karoui, M. Graja, M. Boudabous, and L. H. Belguith, "Domain ontology construction from a Tunisian spoken dialogue corpus",
  In International Conference on Web and Information Technologies (ICWIT'2013), 2013.

  Notes and excerpts:
  "This work is related to the construction of a railway domain ontology from a Tunisian speech corpus. The authors used a
  statistical method for term and concept extraction whereas for semantic relation extraction they chose a linguistic approach."

- I. Zribi, M. E. Khemakhem, and L. H. Belguith, "Morphological analysis of Tunisian dialect",
  In International Joint Conference on Natural Language Processing, pp. 992–996, 2013.

  Notes and excerpts:
  "proposes a morphological analyzer for the Tunisian dialect based on an MSA morphological analyzer,
  as well as a lexicon for the Tunisian dialect as an expansion of an existing MSA lexicon."

- O. F. Zaidan and C. Callison-Burch, "The Arabic online commentary (AOC) dataset: an annotated dataset of informal arabic with high dialectal content",
  In Proceedings of the Association for Computational Linguistics, Portland, Oregon, USA, 2011.

  Notes and excerpts:
  "a small portion of this corpus contains Tunisian dialect text ..."


### Morphological segmentation & analysis

- I. Zribi, M. Ellouze, L. Hadrich-Belguith, and P. Blache, "Morphological disambiguation of Tunisian dialect",
  In Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 147–155, 2017.

  Notes and excerpts:
  "proposes a method to disambiguate the output of the morphological analyzer of Zribi et al. (2013) by using machine-learning techniques."

- N. B. M. Karmani, H. Soussou, and A. M. Alimi, "Intelligent Tunisian Arabic morphological analyzer",
  In IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), Agadir, pp. 1–8, 2016.
  URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7945666&isnumber=7945610

  Keywords: expert systems;information retrieval;Internet;natural language processing;intelligent Tunisian Arabic morphological analyzer;Internet content;financial environments;social environments;political environments;economic environments;Web 2.0 monitoring;natural language processing tools;Tunisian Arabic processing tools;Tunisian Internet users;words morphemes;grammatical labels;expert system;aebWordNet;Tunisian Arabic lexical dictionary;morphemes decomposition;morphemes labeling;Morphology;Dictionaries;Web 2.0;Expert systems;Labeling;Buildings;Arabic dialect;Tunisian Arabic;Natural language processing;Morphological analyzer;Tokenizer;Artificial Intelligence;Expert system;aebWordNet;Tunisian Arabic lexical dictionary

- A. Abdelali, K. Darwish, N. Durrani, and H. Mubarak, "A Fast and Furious Segmenter for Arabic",
  In Proceedings of NAACL-HLT 2016 (Demonstrations), pages 11–16, 2016.

- S. Khalifa, N. Zalmout, and N. Habash, "YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer",
  In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pp. 223–227, 2016.

- A. Hamdi, A. Nasr, N. Habash, and N. Gala, "POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools",
  In Workshop on Arabic Natural Language Processing, Beijing, China, pp. 59–68, 2015.

  Notes and excerpts:
  "exploits the closeness between standard Arabic and Tunisian dialect to develop a POS tagger by converting a Tunisian
  sentence to an MSA lattice, followed by a disambiguation step; an MSA target sentence is then produced and tagged simply with an MSA tagger."

- R. Boujelbane, M. Mallek, M. Ellouze, and L. Hadrich-Belguith, "Fine grained POS tagging of spoken Tunisian dialect corpora",
  In International Conference on Applications of Natural Language to Data Bases/Information Systems, Springer, pp. 59–62, 2014.

  Notes and excerpts:
  "uses the lexicon of Zribi et al. (2013) to convert a standard Arabic corpus for creating a large Tunisian dialect corpus in order to train a POS tagger."

- I. Zribi, M. E. Khemakhem, and L. Hadrich-Belguith, "Morphological analysis of Tunisian dialect",
  In International Joint Conference on Natural Language Processing, pp. 992–996, 2013.

  Notes and excerpts:
  "proposes a morphological analyzer for the Tunisian dialect based on an MSA morphological analyzer,
  as well as a lexicon for the Tunisian dialect as an expansion of an existing MSA lexicon."

- K. McNeil, "Tunisian Arabic Morphological Parser", Working Paper (?), 2012.


### Language identification for Arabic language and its dialects

#### Dictionary-based

- H. Saadane, H. Seffih, C. Fluhr, K. Choukri, and N. Semmar, "Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2018.


#### Word-level

- Heba Elfardy and Mona Diab, "Sentence Level Dialect Identification in Arabic",
  In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 456–461, 2013.

- Ben King and Steven Abney, "Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods",
  In Proceedings of NAACL-HLT 2013, pages 1110–1119, 2013.

- Heba Elfardy and Mona Diab, "Token Level Identification of Linguistic Code Switching",
  In Proceedings of COLING 2012: Posters, pages 287–296, 2012.


#### Sentence-level

- Ossama Obeid, Mohammad Salameh, Houda Bouamor, and Nizar Habash, "ADIDA: Automatic Dialect Identification for Arabic",
  In Proceedings of NAACL-HLT 2019: Demonstrations, pages 6–11, 2019.

- Mohammad Salameh, Houda Bouamor, and Nizar Habash, "Fine-Grained Arabic Dialect Identification",
  In Proceedings of the 27th International Conference on Computational Linguistics, pages 1332–1344, 2018.

- Samantha Wray, "Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2018.

- Fei Huang, "Improved Arabic Dialect Classification with Social Media Data", In Proceedings of EMNLP, 2015.

  Notes and excerpts:
  "focuses on the classification of MSA (msa) and 3 Arabic dialects: Egyptian (egy), Gulf (gul) and Levantine (lev).
  See p.4 for more details ..."
  "uses semi-supervised learning as well ..."

- Kareem Darwish, Hassan Sajjad, and Hamdy Mubarak, "Verifiably Effective Arabic Dialect Identification", 2014.

  Notes and excerpts:
  "classifies Egyptian dialect (ARZ) vs. MSA, using random forests, and using lexical, morphological, and phonological features as classification features."
  "We show that effective dialect identification requires that we account for the distinguishing lexical, morphological, and phonological phenomena of dialects."
  "There seems to be a necessity to identify lexical and linguistic features that discriminate between MSA and different dialects. In this paper, we highlight some such features that help in separating between MSA and ARZ."
  "We identify common ARZ words that do not overlap with MSA and identify specific linguistic phenomena that exist in ARZ, and not MSA, such as morphological patterns, word concatenations, and verb negation constructs."

- Christoph Tillmann, Yaser Al-Onaizan, and Saab Mansour, "Improved Sentence-Level Arabic Dialect Classification",
  In Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, pages 110–119, 2014.

  Notes and excerpts:
  "Extends the work of Zaidan and Callison-Burch (2014) and Elfardy and Diab (2013)."

- Omar F. Zaidan and Chris Callison-Burch, "Arabic Dialect Identification", Computational Linguistics, 2014.


### Orthography (standard rules of writing)

- I. Zribi, R. Boujelbane, A. Masmoudi, M. Ellouze, L. H. Belguith, and N. Habash, "A conventional orthography for Tunisian Arabic",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pp. 2355–2361, 2014.

  Notes and excerpts:
  "Adapts the CODA map (Conventional Orthography for Dialectal Arabic) to the Tunisian dialect."

- I. Zribi, M. Graja, M. E. Khmekhem, M. Jaoua, and L. H. Belguith, "Orthographic transcription for spoken Tunisian Arabic",
  In International Conference on Intelligent Text Processing and Computational Linguistics, Springer, pp. 153–163, 2013.

  Notes and excerpts:
  "Presented orthography guidelines for transcribing Tunisian speech corpora based on the standard Arabic transcription conventions."

- N. Habash, M. T. Diab, and O. Rambow, "Conventional orthography for dialectal Arabic",
  In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pp. 711–718, 2012.


### Sentiment analysis

- S. Medhaffar, F. Bougares, Y. Esteve, and L. Hadrich-Belguith, "Sentiment analysis of Tunisian dialects: Linguistic ressources and experiments", pp. 55–61, 2017.


### Translation approach for handling dialectal Arabic

- F. Sadat, F. Mallek, M. Boudabous, R. Sellami, and A. Farzindar, "Collaboratively constructed linguistic resources for language variants and their exploitation in NLP application, the case of Tunisian Arabic and the social media",
  In Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, pp. 102–110, 2014.

  Notes and excerpts:
  "translate Tunisian dialect text of social media into MSA by using a bilingual lexicon and a set of grammatical mapping rules and a disambiguation step."


### Transliterated forms of Arabic dialects; romanized Arabic

- J. Younes, E. Souissi, H. Achour, and A. Ferchichi, "A Sequence-to-Sequence based Approach For the double Transliteration of Tunisian Dialect",
  In The 4th International Conference on Arabic Computational Linguistics (ACLing 2018), 2018.

- J. Younes and E. Souissi, "A quantitative view of Tunisian dialect electronic writing", 2015.

  Index terms: dialect, Tunisian, corpus, language, electronic writing, translation, normalization

  Notes and excerpts:
  "This paper focuses specifically on electronic writing with Latin letters in Tunisian dialect.
  We describe the methodology used for the construction of a dialectal corpus,
  present the characteristics of this new form of writing and detail its peculiarities with numbers.
  The built corpus, consisting of 43222 messages."


### Transcription of dialectal speech

- I. Zribi, M. Ellouze, L. H. Belguith, and P. Blache, "Spoken Tunisian Arabic corpus 'STAC': Transcription and annotation",
  In Research in Computing Science, vol. 90, pp. 123–135, 2015.

  Notes and excerpts:
  "transcribed 5 hours of spontaneous Tunisian Arabic speech enriched with morpho-syntactic and disfluencies annotations."


## Books

- N. Habash, "Introduction to Arabic Natural Language Processing", https://www.morganclaypool.com/doi/abs/10.2200/s00277ed1v01y201008hlt010, 2010.

  Notes and excerpts:
  "This book provides system developers and researchers in natural language processing and computational linguistics with the necessary
  background information for working with the Arabic language."


## Web Articles & Links

- Derja: Tunisian Association of the Tunisian Dialect, http://www.bettounsi.com/index.html

- Tunisian Arabic, Wikipedia entry, https://en.wikipedia.org/wiki/Tunisian_Arabic

- Tunisian Arabic, by Turki, H., Zribi, R., Gibson, M., & Adel, E., Wikimedia Foundation, https://www.academia.edu/28846187/Tunisian_Arabic, 2015.

- Tunisian Arabic: A Wonderful Mosaic of Dialects, by Lilia Khachrou, https://lingualism.com/arabic/tunisian-arabic/tunisian-arabic-wonderful-mosaic-dialects/


## Academic Research Groups & Labs

- ANLP Research group, MIRACL, University of Sfax, Tunisia.

- REGIM Lab, National Engineering School of Sfax (ENIS), University of Sfax, Tunisia, http://www.regim.org/

- MADAR Project, NYU-AD, CMU-Q, https://camel.abudhabi.nyu.edu/madar/

  "MADAR (Multi-Arabic Dialect Applications and Resources) is a three-year joint project among the NLP Group
  at Carnegie Mellon University in Qatar (CMU-Q), the Computational Approaches to Modeling Language (CAMEL) Lab
  at New York University Abu Dhabi (NYUAD), and Columbia University.
  The project also involves collaborators from the University of Bahrain (UoB)."


## Conferences & Workshops

**TO DO ...**