├── LICENSE
└── README.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Web64 AS
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Norwegian NLP Resources
2 |
3 |
4 |
5 |
6 | A work-in-progress list of useful NLP resources for Norwegian.
7 |
8 | Please let us know if there are useful NLP resources we might have missed!
9 |
10 | Contact me at olav@web64.com
11 |
12 | ## Facebook Group
13 | Join our Facebook Group: https://www.facebook.com/groups/nlpnorway/
14 |
15 |
16 | ## Open Source Libraries
17 | Libraries with support for the Norwegian language
18 | * [Polyglot](https://github.com/aboSamoor/polyglot)
19 | * [Textrank](https://github.com/summanlp/textrank)
20 | * [Spacy](https://github.com/explosion/spaCy)
21 |
22 | ## Spacy
23 | * https://spacy.io/models/nb - Official support for Norwegian from Spacy (2.2.0)
24 | * https://github.com/web64/spacy-norwegian - Train Norwegian models for Spacy
25 | * https://github.com/jarib/spacy-nb - Scripts to build a Norwegian model for spacy
26 | * https://github.com/ohenrik/nb_news_ud_sm - Experimental Norwegian (Bokmål) language model for Spacy (Including NER)
27 | * https://github.com/ohenrik/nb_dep_ud_sm - Experimental Norwegian (Bokmål) language model for Spacy
28 | * https://github.com/navikt/ai-lab-spacy-bokmaal - Norwegian model for spaCy
29 |
30 | ## BERT
31 | * https://github.com/NBAiLab/notram - NoTraM - Norwegian Transformer Model
32 | * http://wiki.nlpl.eu/Vectors/norlm/norbert - NorBERT: Bidirectional Encoder Representations from Transformers
33 | * https://github.com/botxo/nordic_bert - Nordic BERT: Norwegian model (trained on 4.5 GB of text)
34 |
35 | ## NLTK
36 | * [Teaching NLTK Norwegian](https://www.duo.uio.no/bitstream/handle/10852/59276/11/Teaching_NLTK_Norwegian.pdf) - Master's thesis by Bo Bjerke (PDF)
37 |
38 | ## Models
39 | * https://github.com/explosion/spacy-models/releases/tag/nb_core_news_sm-2.2.0 - Pretrained statistical models for Norwegian Bokmål
40 | * https://github.com/ljos/navnkjenner - Named-Entity Recognition for Norwegian Bokmål and Nynorsk
41 | * https://github.com/HIT-SCIR/ELMoForManyLangs - Pre-trained ELMo Representations
42 | * https://github.com/ltgoslo/norec-baselines - NoReC baseline models, trained on the NoReC dataset.
43 | * https://github.com/tensorflow/models/blob/master/syntaxnet/g3doc/universal.md - Syntaxnet models
44 | * https://github.com/andrely/Norwegian-NLP-models - Norwegian NLP models (2013)
45 | * https://github.com/emanlapponi/norlem-norwegian-lemmatizer - Lemmatizer for Norwegian that uses lexical and contextual information from the Norwegian Dependency Treebank (NDT)
46 | * https://stanfordnlp.github.io/stanfordnlp/installation_download.html#human-languages-supported-by-stanfordnlp - StanfordNLP Pretrained models: Bokmål, Nynorsk, NynorskLIA
47 | * https://github.com/mollerhoj/Scandinavian-ULMFiT - The weights for the embedding layer of a Scandinavian ULMFiT language model
48 |
49 | ## Word Vectors
50 | * http://vectors.nlpl.eu/repository/ - NLPL word embeddings repository
51 | * https://github.com/bheinzerling/bpemb - GloVe word vectors based on Byte-Pair Encoding (BPE)
52 | * https://github.com/Kyubyong/wordvectors - Word2Vec & fastText word vectors for bokmål and nynorsk.
53 | * https://fasttext.cc/docs/en/crawl-vectors.html - fastText word vectors trained on common crawl and wikipedia.
54 |
55 |
56 | ## Norwegian specific libraries
57 | * https://github.com/textlab/mtag - The Oslo-Bergen Multitagger for Norwegian Bokmål and Nynorsk (python)
58 | * https://github.com/ljos/anna_lyse - Language parser for Norwegian Bokmål and Nynorsk
59 | * https://github.com/petterhh/ndt-tools - Norwegian Dependency Treebank (NDT) Tools
60 | * https://github.com/ljos/egennavn - Named-entity chunker for Norwegian
61 | * https://github.com/noklesta/The-Oslo-Bergen-Tagger - The Oslo Bergen Tagger
62 | * https://github.com/draperunner/obt - Python library for The Oslo-Bergen Tagger
63 |
64 | ## Universal Dependencies
65 | * http://universaldependencies.org/ - Bokmål, Nynorsk, NynorskLIA
66 | * [UD_Norwegian-Bokmaal](https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal)
67 | * [UD_Norwegian-Nynorsk](https://github.com/UniversalDependencies/UD_Norwegian-Nynorsk)
68 | * [UD_Norwegian-NynorskLIA](https://github.com/UniversalDependencies/UD_Norwegian-NynorskLIA)
69 | * [Joint UD Parsing of Norwegian Bokmål and Nynorsk](https://github.com/erikve/bm-nn-parsing)
70 |
71 |
72 | ## Data & Corpus
73 | * https://www.nb.no/sprakbanken/repositorium#ticketsfrom?lang=en&query=alle&tokens=&from=1&size=12&collection=sbr (Språkbankens ressurskatalog)
74 | Norwegian N-grams, lexicons, news corpus.
75 | * https://github.com/ltgoslo/norec - NoReC: The Norwegian Review Corpus
76 | * https://github.com/ltgoslo/talk-of-norway - Talk of Norway (ToN) dataset, a collection of Norwegian parliament speeches from 1998 to 2016
77 | * https://github.com/stopwords-iso/stopwords-no - Norwegian stopwords in JSON or txt format
78 | * https://github.com/ltgoslo/norne - NORwegian Named Entities
79 | * https://www.sketchengine.eu/notenten-norwegian-corpus/ - noTenTen: Corpus of the Norwegian Web
80 | * https://github.com/unhammer/fugeord - Fugeord
81 |
82 | ## Sentiment Analysis for Norwegian Text
83 | * https://www.usit.uio.no/om/organisasjon/itf/ds/faglig/seminarer/spraak-teknologi-betydning/sant.pdf (PDF) SANT: Sentiment Analysis for Norwegian Text
84 | * http://www.mn.uio.no/ifi/english/research/projects/sant/index.html
85 | * https://github.com/ltgoslo/norsentlex - NorSentLex: Norwegian sentiment lexicon of positive and negative words
86 | * https://github.com/olavski/afinn/blob/master/afinn/data/AFINN-no-165.txt - Work-in-progress AFINN Norwegian sentiment lexicon
87 | * https://github.com/web64/norec-fasttext - Train NoReC FastText Sentiment Analysis models
88 |
89 | ## Machine Translation
90 | * https://github.com/UKPLab/EasyNMT
91 | * https://github.com/Animenosekai/translate
92 |
93 | **Apertium**
94 |
95 | Main library: https://github.com/apertium/apertium-python
96 |
97 | Language data:
98 |
99 | * https://github.com/apertium/apertium-nno-nob
100 | * https://github.com/apertium/apertium-nno
101 | * https://github.com/apertium/apertium-nob
102 |
103 | ## English-Norwegian parallel corpus
104 | * http://data.europa.eu/euodp/en/data/dataset/elrc_1061
105 |
106 | ## Commercial APIs
107 | * [repustate.com](https://www.repustate.com/norwegian-sentiment-analysis/)
108 | Norwegian Sentiment Analysis
109 | * [orbit.ai](http://orbit.ai)
110 | Text generation, Entity Extraction
111 | * [tagbox.ai](http://tagbox.ai)
112 | Automated geotagging
113 | * [lexalytics.com](https://www.lexalytics.com/)
114 | Sentiment analysis
115 | * [monkeylearn.com](http://monkeylearn.com/)
116 | Text Classification
117 | * [tisane.ai](http://tisane.ai/)
118 | Sentiment analysis & topics detection
119 | * [fairhair.ai](https://fairhair.ai/)
120 | Web data & information extraction
121 | * [textoptimizer.com](https://textoptimizer.com/m)
122 | User intent and topic extraction
123 |
124 | ## Dictionaries
125 | * [LibreOffice - no](https://github.com/LibreOffice/dictionaries/tree/master/no)
126 | * [dictionary-nb](https://github.com/wooorm/dictionaries/tree/master/dictionaries/nb) Norwegian Bokmål spelling dictionary in UTF-8.
127 | * [dictionary-nn](https://github.com/wooorm/dictionaries/tree/master/dictionaries/nn) Norwegian Nynorsk spelling dictionary in UTF-8.
128 |
129 | ## Papers
130 | * [An automatic analysis of Norwegian compounds](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.7469&rep=rep1&type=pdf)
131 | * [Evaluating Semantic Vectors for Norwegian](https://www.duo.uio.no/handle/10852/61756)
132 | * [Joint UD Parsing of Norwegian Bokmål and Nynorsk](http://www.ep.liu.se/ecp/131/001/ecp17131001.pdf)
133 |
134 | ## Related Resources
135 | * [Saami language technology](https://giellatekno.uit.no/)
136 | * [Translation Memory](https://giellalt.uit.no/tm/TranslationMemory.html)
137 | * [DaNLP](https://github.com/alexandrainst/danlp) Repository for NLP resources for the Danish Language
138 | * [GitHub Topic: norwegian](https://github.com/topics/norwegian)
139 | * [GitHub Topic: norsk](https://github.com/topics/norsk)
140 | * [GitHub Topic: nynorsk](https://github.com/topics/nynorsk)
141 | * [GitHub Topic: bokmal](https://github.com/topics/bokmal)
142 |
143 |
144 |
147 |
148 |