├── danish_itos.pkl
├── finnish_itos.pkl
├── norwegian_itos.pkl
├── .gitattributes
├── danish_enc.h5
├── danish_enc.pth
├── finnish_enc.h5
├── finnish_enc.pth
├── norwegian_enc.h5
├── norwegian_enc.pth
└── README.md


/danish_itos.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mollerhoj/Scandinavian-ULMFiT/HEAD/danish_itos.pkl


--------------------------------------------------------------------------------
/finnish_itos.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mollerhoj/Scandinavian-ULMFiT/HEAD/finnish_itos.pkl


--------------------------------------------------------------------------------
/norwegian_itos.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mollerhoj/Scandinavian-ULMFiT/HEAD/norwegian_itos.pkl


--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | *.h5 filter=lfs diff=lfs merge=lfs -text
2 | *.pth filter=lfs diff=lfs merge=lfs -text
3 | 


--------------------------------------------------------------------------------
/danish_enc.h5:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:f2a7820b7ed76d2e0346eb6ed0e5fe04897e610a8f3b3bed7d3a25d0087249ec
3 | size 128851518
4 | 


--------------------------------------------------------------------------------
/danish_enc.pth:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:181ba8e616d7e0326a879b6a9cd00a25e4b4a68cd78f7592928a0c0d652aaf7f
3 | size 128851663
4 | 


--------------------------------------------------------------------------------
/finnish_enc.h5:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:abdf858f780a3f34ee244a21432fd085146786fdfd4738141407166be3a06e76
3 | size 128852062
4 | 


--------------------------------------------------------------------------------
/finnish_enc.pth:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:d68a6383c1646b0e6a29aa9df966366f3acce7c2a1f0ed390f921ef917db5a26
3 | size 128851761
4 | 


--------------------------------------------------------------------------------
/norwegian_enc.h5:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:5fb3ed4f2fc56d0c987af9c6a6d21073afaa7dd79ba6ec3ebb73bd8c4aaddd6f
3 | size 128852067
4 | 


--------------------------------------------------------------------------------
/norwegian_enc.pth:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:21d838d8d8a6492a7429203010b99cd7262171baee279fb3be01935e307ed942
3 | size 128851761
4 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | We're in the process of releasing BERT models as well. Get the first one here: https://github.com/mollerhoj/danish_bert
 2 | 
 3 | # Scandinavian ULMFiT
 4 | 
 5 | Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.
 6 | 
 7 | This repository contains the weights for the embedding layer of a ULMFiT language model that can be used as the starting point when fine-tuning a model for any Natural Language Processing task.
 8 | 
 9 | The weights were trained on 90% of all text in the corresponding language's Wikipedia as of 3 July 2018. The remaining 10% was used for validation.
10 | 
11 | # Supported Languages:
12 | 
13 | - Danish
14 | 
15 | Trained on 78,373,122 tokens, and validated on 7,837,310 tokens. We achieve a perplexity of 30.9.
16 | Download files: [Link](https://www.dropbox.com/s/mipfzhj71ecptbd/danish.zip?dl=0)
17 | 
18 | - Norwegian
19 | 
20 | Trained on 80,284,231 tokens, and validated on 8,920,387 tokens. We achieve a perplexity of 26.31.
21 | Download files: [Link](https://www.dropbox.com/s/lwr5kvbxri1gvv9/norwegian.zip?dl=0)
22 | 
23 | - Finnish
24 | 
25 | Trained on 68,775,370 tokens, and validated on 7,641,571 tokens. We achieve a perplexity of 27.66.
26 | Download files: [Link](https://www.dropbox.com/s/3wl620c603ewvgo/finnish.zip?dl=0)
27 | 
28 | Training even higher-performance models is possible, but requires more (costly) training time. If you need a model with higher performance, feel free to contact us.
29 | 
30 | Our servers crashed when training the Swedish model, but if you're in need of it, contact us and we can train it for you.
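
The perplexity figures quoted for each language can be read as the exponential of the average per-token negative log-likelihood on the validation set. The following is a minimal sketch of that calculation, not the evaluation code used for these models; the function name `perplexity` and the toy probabilities are illustrative assumptions.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy example (made-up numbers): a model that assigns probability 1/4 to
# every validation token is as uncertain as a uniform choice among 4 tokens.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity(uniform_log_probs))
```

Under this reading, a perplexity of 30.9 means the model is, on average, about as uncertain as a uniform choice among roughly 31 tokens.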
31 | 
32 | ### Paper
33 | 
34 | See "Universal Language Model Fine-tuning for Text Classification" by Jeremy Howard and Sebastian Ruder: https://arxiv.org/abs/1801.06146
35 | 
36 | ### File descriptions
37 | 
38 | - enc.h5 contains the weights in 'Hierarchical Data Format'
39 | 
40 | - enc.pth contains the weights in 'PyTorch model format'
41 | 
42 | - itos.pkl (Integers to Strings) contains the vocabulary mapping from ids (0 - 30000) to strings
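
As a rough illustration of how the vocabulary file might be used, the sketch below loads an `itos` pickle and builds the reverse string-to-id lookup needed to turn tokenized text into ids. It assumes the pickle holds a plain list of token strings indexed by id, and that id 0 is the unknown-token entry; neither is stated in this README, so check both against the downloaded file. The helper names `load_itos`, `build_stoi`, and `numericalize` are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Dict, List


def load_itos(path: Path) -> List[str]:
    """Load the id-to-string vocabulary. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return list(pickle.load(f))


def build_stoi(itos: List[str]) -> Dict[str, int]:
    """Invert the vocabulary into a string-to-id lookup."""
    return {token: idx for idx, token in enumerate(itos)}


def numericalize(tokens: List[str], stoi: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Map tokens to ids; unknown tokens fall back to unk_id (assumed to be 0)."""
    return [stoi.get(token, unk_id) for token in tokens]


# Usage (the path and the example tokens are illustrative):
# itos = load_itos(Path("danish_itos.pkl"))
# stoi = build_stoi(itos)
# ids = numericalize(["jeg", "elsker", "kage"], stoi)
```

The ids produced this way are only meaningful with the encoder weights for the same language, since each vocabulary file defines its own id assignment.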
43 | 
44 | ### Sponsor
45 | 
46 | This work was sponsored by the Danish chatbot company BotXO:
47 | http://www.botxo.co/
48 | 
49 | ### Thanks 
50 | 
51 | Thanks to Tobias Lindberg from Damvad Analytics for converting the vectors to .pth format.
52 | 


--------------------------------------------------------------------------------