└── README.md

/README.md:
--------------------------------------------------------------------------------
# awesome-tibetan-nlp ![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)

> Tibetan NLP projects and resources

## Contents
- [Datasets](#datasets)
- [OCR](#ocr)
- [Speech recognition](#speech-recognition)
- [Machine Translation](#machine-translation)
- [Cleanup](#cleanup)
- [Word segmentation](#word-segmentation)
- [Sentence boundary disambiguation](#sentence-boundary-disambiguation)
- [Lemmatization](#lemmatization)
- [Word sense disambiguation](#word-sense-disambiguation)
- [POS tagging](#pos-tagging)
- [Dependency parsing](#dependency-parsing)
- [Coreference resolution](#coreference-resolution)
- [Spellchecking](#spellchecking)
- [NER - Named Entity Recognition](#named-entity-recognition)
- [IR - Information Retrieval](#information-retrieval)
- [Text summarization](#text-summarization)
- [Summarization](#summarization)
- [Text similarity](#text-similarity)
- [Community](#community)
- [Resources](#resources)

## Datasets

### Corpora
- [Tibetan Corpora on Zenodo](https://zenodo.org/search?page=1&size=20&q=keywords:%22Tibetan%20language%22&type=dataset&keywords=tibetan)
- [Tibetan Corpus on Sketch Engine](https://www.sketchengine.eu/tibetan-corpus/)

### Models
- [fastText word vectors](https://fasttext.cc/docs/en/crawl-vectors.html)

## OCR
- [Namsel: An Optical Character Recognition System for Tibetan Text](https://escholarship.org/uc/item/6d5781k5)

## Speech recognition
- [An Improved Tibetan Lhasa Speech Recognition Method Based on Deep Neural Network](https://www.semanticscholar.org/paper/An-Improved-Tibetan-Lhasa-Speech-Recognition-Method-Ruan-Gan/3b8fba018ffd32ec42176e1dbb4784ecb09a6186), 2017
- [Tibetan-Mandarin bilingual speech recognition based on end-to-end framework](https://ieeexplore.ieee.org/abstract/document/8282215), 2017
- [Deep Feature Learning for Tibetan Speech Recognition using Sparse Auto-encoder](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.856.1715&rep=rep1&type=pdf), 2015

## Machine Translation
- [Tibetan-Chinese Neural Machine Translation based on Syllable Segmentation](http://www.aclweb.org/anthology/W18-2203) (compares syllable segmentation with word segmentation), 2018

## Cleanup

## Word segmentation
- [An Algorithm Rapidly Segmenting Chinese Sentences into Individual Words](https://www.researchgate.net/publication/331019958_An_Algorithm_Rapidly_Segmenting_Chinese_Sentences_into_Individual_Words), 2019
- [Research and Implementation of Tibetan Word Segmentation Based on Syllable Methods](https://www.researchgate.net/publication/324086486_Research_and_Implementation_of_Tibetan_Word_Segmentation_Based_on_Syllable_Methods), 2018
- [Segmenting and POS tagging Classical Tibetan using a Memory-Based Tagger](https://escholarship.org/uc/item/8b83z79n), 2017
- [Towards describing Tibetan syntax: From word segmentation to rewrite rules through a semi-automated workflow](https://escholarship.org/uc/item/3q29t25v), 2016

## Sentence boundary disambiguation

## Lemmatization
- [The methods of lemmatization of bound case markers in modern Tibetan](https://ieeexplore.ieee.org/document/1275980)

## Word sense disambiguation

## POS tagging
- [Segmenting and POS tagging Classical Tibetan using a Memory-Based Tagger](https://escholarship.org/uc/item/8b83z79n), 2017

## Dependency parsing

## Coreference resolution

## Spellchecking

## Named Entity Recognition

## Information Retrieval

## Text summarization

## Summarization

## Text similarity

## Community

- [TibetTech Slack](https://tibettech.slack.com)

## Resources
- [Introduction (to special issue on Tibetan Natural Language Processing)](https://www.researchgate.net/publication/305776014_Introduction_to_special_issue_on_Tibetan_Natural_Language_Processing)
- [Practical Applications for Corpora: The Role of Research-based Linguistics in Literacy & Education for the Tibetan Language](https://escholarship.org/uc/item/4qn512vh)
--------------------------------------------------------------------------------