Ready-to-use gensim Word2Vec embedding models for the Spanish language. The models were created using a window of +/- 5 words, discarding words with fewer than 5 occurrences, and producing a 400-dimensional vector for each word. The text used to create the embeddings was collected from news, Wikipedia, the Spanish BOE, web crawling and open literary sources. The corpus has a total of 3,257,329,900 words and 18,852,481,207 characters.

The models are shared at Zenodo: https://zenodo.org/record/1410403

We provide two types of models: Gensim full models (complete_model.zip) and KeyedVectors (keyed_vectors.zip). You can check the differences between them at the following URL: https://radimrehurek.com/gensim/models/keyedvectors.html

To load the full model use:
```
from gensim.models import Word2Vec
model = Word2Vec.load("complete.model")
```

To load the KeyedVectors use:
```
from gensim.models import KeyedVectors
word_vectors = KeyedVectors.load('complete.kv', mmap='r')
```

If you use our models in your programs or research, please use the following citation:
```
Aitor Almeida, & Aritz Bilbao. (2018). Spanish 3B words Word2Vec Embeddings (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1410403

Bilbao-Jayo, A., & Almeida, A. (2018). Automatic political discourse analysis with multi-scale convolutional neural networks and contextual data. International Journal of Distributed Sensor Networks, 14(11), 1550147718811827.
```

## Other datasets

Take a look at our other datasets:
* City4Age Behaviour dataset: https://zenodo.org/record/2602652#.XJtz26SkGUl
* Spanish 3B words Word2Vec Embeddings: https://github.com/aitoralmeida/spanish_word2vec
* Political party and candidate tweets for the campaign period of the 2016 Spanish general election: https://github.com/aitoralmeida/spanish_general_election_2016
* Political party and candidate tweets for the campaign period of the 2015 Spanish general election: https://github.com/aitoralmeida/spanish_general_election_2015
* Tweets for the campaign period of the 2014 European Parliament election: https://github.com/aitoralmeida/european_parliament_election_2014
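## Training settings sketch

The training settings mentioned at the top of this README (a window of +/- 5 words and a minimum of 5 occurrences per word) can be illustrated with a small, dependency-free sketch. This is not the code used to build the models and it is not gensim's implementation: it only shows which (center, context) word pairs such settings would select. The function names and the toy corpus are hypothetical, and real Word2Vec training adds further steps that are ignored here.

```python
from collections import Counter

WINDOW = 5     # context words considered on each side (value from this README)
MIN_COUNT = 5  # words seen fewer times than this are discarded (value from this README)


def build_vocab(sentences, min_count=MIN_COUNT):
    """Return the set of words that occur at least `min_count` times."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


def context_pairs(sentence, vocab, window=WINDOW):
    """Yield (center, context) pairs within +/- `window` positions.

    Simplifying assumption: rare words are dropped before the windows
    are computed, so a window spans the surviving words only.
    """
    kept = [word for word in sentence if word in vocab]
    for i, center in enumerate(kept):
        start = max(0, i - window)
        stop = min(len(kept), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield center, kept[j]


# Hypothetical toy corpus: one sentence repeated 5 times, plus a rare word.
corpus = [["el", "gato", "duerme", "en", "la", "casa"]] * 5 + [["hola"]]
vocab = build_vocab(corpus)
pairs = list(context_pairs(corpus[0], vocab))

print(sorted(vocab))  # "hola" is absent: it appears only once
print(pairs[:3])
```

With the released models none of this is needed, since the vectors are already trained; the sketch is only meant to make the two parameters concrete.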