├── LICENSE
└── README.md

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Web64 AS

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Norwegian NLP Resources

A work-in-progress list of useful NLP resources for Norwegian.

Please let us know if there are useful NLP resources we might have missed!

Contact me at olav@web64.com

## Facebook Group
Join our Facebook Group: https://www.facebook.com/groups/nlpnorway/


## Open Source Libraries
Libraries with support for the Norwegian language:
* [Polyglot](https://github.com/aboSamoor/polyglot)
* [Textrank](https://github.com/summanlp/textrank)
* [spaCy](https://github.com/explosion/spaCy)

## spaCy
* https://spacy.io/models/nb - Official spaCy support for Norwegian (spaCy 2.2.0)
* https://github.com/web64/spacy-norwegian - Train Norwegian models for spaCy
* https://github.com/jarib/spacy-nb - Scripts to build a Norwegian model for spaCy
* https://github.com/ohenrik/nb_news_ud_sm - Experimental Norwegian (Bokmål) language model for spaCy (including NER)
* https://github.com/ohenrik/nb_dep_ud_sm - Experimental Norwegian (Bokmål) language model for spaCy
* https://github.com/navikt/ai-lab-spacy-bokmaal - Norwegian model for spaCy

## BERT
* https://github.com/NBAiLab/notram - NoTraM: Norwegian Transformer Model
* http://wiki.nlpl.eu/Vectors/norlm/norbert - NorBERT: Bidirectional Encoder Representations from Transformers
* https://github.com/botxo/nordic_bert - Nordic BERT: Norwegian model (trained on 4.5 GB of text)

## NLTK
* [Teaching NLTK Norwegian](https://www.duo.uio.no/bitstream/handle/10852/59276/11/Teaching_NLTK_Norwegian.pdf) - Master's thesis by Bo Bjerke (PDF)

## Models
* https://github.com/explosion/spacy-models/releases/tag/nb_core_news_sm-2.2.0 - Pretrained statistical models for Norwegian Bokmål
* https://github.com/ljos/navnkjenner - Named-entity recognition for Norwegian Bokmål and Nynorsk
* https://github.com/HIT-SCIR/ELMoForManyLangs - Pre-trained ELMo representations
* https://github.com/ltgoslo/norec-baselines - NoReC baseline models, trained on the NoReC dataset
* https://github.com/tensorflow/models/blob/master/syntaxnet/g3doc/universal.md - SyntaxNet models
* https://github.com/andrely/Norwegian-NLP-models - Norwegian NLP models (2013)
* https://github.com/emanlapponi/norlem-norwegian-lemmatizer - Lemmatizer for Norwegian that uses lexical and contextual information from the Norwegian Dependency Treebank (NDT)
* https://stanfordnlp.github.io/stanfordnlp/installation_download.html#human-languages-supported-by-stanfordnlp - StanfordNLP pretrained models: Bokmål, Nynorsk, NynorskLIA
* https://github.com/mollerhoj/Scandinavian-ULMFiT - Embedding-layer weights for Scandinavian ULMFiT language models

## Word Vectors
* http://vectors.nlpl.eu/repository/ - NLPL word embeddings repository
* https://github.com/bheinzerling/bpemb - GloVe word vectors based on Byte-Pair Encoding (BPE)
* https://github.com/Kyubyong/wordvectors - Word2Vec and fastText word vectors for Bokmål and Nynorsk
* https://fasttext.cc/docs/en/crawl-vectors.html - fastText word vectors trained on Common Crawl and Wikipedia
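Several of the word-vector sets above, including the fastText crawl vectors, are distributed in the plain-text `.vec` format: a header line with the vocabulary size and vector dimension, followed by one line per word with its components separated by spaces. Below is a minimal standard-library sketch for loading such a file and comparing words by cosine similarity. It assumes that header-plus-rows layout; the file name `cc.no.300.vec` in the comment and the toy Norwegian vectors are illustrative assumptions, not real data.

```python
import math
from typing import Dict, Iterable, List, Optional


def load_vec(lines: Iterable[str], limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse word vectors in the text .vec format (assumed layout:
    header "<vocab_size> <dim>", then "<word> <v1> ... <v_dim>" per line)."""
    rows = iter(lines)
    dim = int(next(rows).split()[1])
    vectors: Dict[str, List[float]] = {}
    for line in rows:
        if limit is not None and len(vectors) >= limit:
            break  # full files are large; stop after `limit` words
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip empty or malformed lines
        # The last `dim` fields are the vector; anything before them is the token.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy 3-dimensional vectors, made up for illustration (not real embeddings).
    sample = """3 3
konge 0.9 0.1 0.0
dronning 0.8 0.2 0.1
bil 0.0 0.1 0.9"""
    vecs = load_vec(sample.splitlines())
    # prints True: "konge" is closer to "dronning" than to "bil" in this toy data
    print(cosine(vecs["konge"], vecs["dronning"]) > cosine(vecs["konge"], vecs["bil"]))

    # For a real download (file name assumed), read only the first 50,000 words:
    # with open("cc.no.300.vec", encoding="utf-8") as f:
    #     vecs = load_vec(f, limit=50_000)
```

The `limit` argument matters in practice, since the full crawl files contain millions of rows and plain Python lists are memory-hungry. For real workloads, gensim's `KeyedVectors.load_word2vec_format` reads the same text format into NumPy arrays.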

## Norwegian-specific libraries
* https://github.com/textlab/mtag - The Oslo-Bergen Multitagger for Norwegian Bokmål and Nynorsk (Python)
* https://github.com/ljos/anna_lyse - Language parser for Norwegian Bokmål and Nynorsk
* https://github.com/petterhh/ndt-tools - Norwegian Dependency Treebank (NDT) tools
* https://github.com/ljos/egennavn - Named-entity chunker for Norwegian
* https://github.com/noklesta/The-Oslo-Bergen-Tagger - The Oslo-Bergen Tagger
* https://github.com/draperunner/obt - Python library for the Oslo-Bergen Tagger

## Universal Dependencies
* http://universaldependencies.org/ - Bokmål, Nynorsk, NynorskLIA
* [UD_Norwegian-Bokmaal](https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal)
* [UD_Norwegian-Nynorsk](https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk)
* [UD_Norwegian-NynorskLIA](https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA)
* [Joint UD Parsing of Norwegian Bokmål and Nynorsk](https://github.com/erikve/bm-nn-parsing)


## Data & Corpus
* https://www.nb.no/sprakbanken/repositorium#ticketsfrom?lang=en&query=alle&tokens=&from=1&size=12&collection=sbr - Språkbanken's resource catalogue (Språkbankens ressurskatalog): Norwegian N-grams, lexicons, and news corpora
* https://github.com/ltgoslo/norec - NoReC: The Norwegian Review Corpus
* https://github.com/ltgoslo/talk-of-norway - Talk of Norway (ToN) dataset, a collection of Norwegian parliament speeches from 1998 to 2016
* https://github.com/stopwords-iso/stopwords-no - Norwegian stopwords in JSON or TXT format
* https://github.com/ltgoslo/norne - NorNE: Norwegian Named Entities
* https://www.sketchengine.eu/notenten-norwegian-corpus/ - noTenTen: Corpus of the Norwegian Web
* https://github.com/unhammer/fugeord - Fugeord

## Sentiment Analysis for Norwegian Text
* https://www.usit.uio.no/om/organisasjon/itf/ds/faglig/seminarer/spraak-teknologi-betydning/sant.pdf - SANT: Sentiment Analysis for Norwegian Text (PDF)
* http://www.mn.uio.no/ifi/english/research/projects/sant/index.html - SANT project page
* https://github.com/ltgoslo/norsentlex - NorSentLex: Norwegian sentiment lexicon of positive and negative words
* https://github.com/olavski/afinn/blob/master/afinn/data/AFINN-no-165.txt - Work-in-progress Norwegian AFINN sentiment lexicon
* https://github.com/web64/norec-fasttext - Train fastText sentiment analysis models on NoReC

## Machine Translation
* https://github.com/UKPLab/EasyNMT
* https://github.com/Animenosekai/translate

**Apertium**

Main library: https://github.com/apertium/apertium-python

Language data:

* https://github.com/apertium/apertium-nno-nob
* https://github.com/apertium/apertium-nno
* https://github.com/apertium/apertium-nob

## English-Norwegian parallel corpus
* http://data.europa.eu/euodp/en/data/dataset/elrc_1061

## Commercial APIs
* [repustate.com](https://www.repustate.com/norwegian-sentiment-analysis/) - Norwegian sentiment analysis
* [orbit.ai](http://orbit.ai) - Text generation, entity extraction
* [tagbox.ai](http://tagbox.ai) - Automated geotagging
* [lexalytics.com](https://www.lexalytics.com/) - Sentiment analysis
* [monkeylearn.com](http://monkeylearn.com/) - Text classification
* [tisane.ai](http://tisane.ai/) - Sentiment analysis and topic detection
* [fairhair.ai](https://fairhair.ai/) - Web data and information extraction
* [textoptimizer.com](https://textoptimizer.com/m) - User intent and topic extraction

## Dictionaries
* [LibreOffice - no](https://github.com/LibreOffice/dictionaries/tree/master/no)
* [dictionary-nb](https://github.com/wooorm/dictionaries/tree/master/dictionaries/nb) - Norwegian Bokmål spelling dictionary in UTF-8
* [dictionary-nn](https://github.com/wooorm/dictionaries/tree/master/dictionaries/nn) - Norwegian Nynorsk spelling dictionary in UTF-8

## Papers
* [An automatic analysis of Norwegian compounds](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.7469&rep=rep1&type=pdf)
* [Evaluating Semantic Vectors for Norwegian](https://www.duo.uio.no/handle/10852/61756)
* [Joint UD Parsing of Norwegian Bokmål and Nynorsk](http://www.ep.liu.se/ecp/131/001/ecp17131001.pdf)

## Related Resources
* [Saami language technology](https://giellatekno.uit.no/)
* [Translation Memory](https://giellalt.uit.no/tm/TranslationMemory.html)
* [DaNLP](https://github.com/alexandrainst/danlp) - Repository of NLP resources for the Danish language
* [GitHub Topic: norwegian](https://github.com/topics/norwegian)
* [GitHub Topic: norsk](https://github.com/topics/norsk)
* [GitHub Topic: nynorsk](https://github.com/topics/nynorsk)
* [GitHub Topic: bokmal](https://github.com/topics/bokmal)
Join our Facebook Group here: https://www.facebook.com/groups/nlpnorway/


--------------------------------------------------------------------------------