├── .editorconfig
├── .github
│   ├── FUNDING.yml
│   └── workflows
│       └── crystal.yml
├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── docs
│   ├── CNAME
│   ├── Cadmium.html
│   ├── Cadmium
│   │   ├── AggressiveTokenizer.html
│   │   ├── BayesClassifier.html
│   │   ├── CaseTokenizer.html
│   │   ├── Classifiers.html
│   │   ├── CountInflector.html
│   │   ├── EdgeWeightedDigraph.html
│   │   ├── EdgeWeightedDigraph
│   │   │   ├── Bag.html
│   │   │   └── DirectedEdge.html
│   │   ├── Graph.html
│   │   ├── Graph
│   │   │   └── ShortestPath.html
│   │   ├── IntExtension.html
│   │   ├── JaroWinklerDistance.html
│   │   ├── LevenshteinDistance.html
│   │   ├── LogisticRegression.html
│   │   ├── Metaphone.html
│   │   ├── NGrams.html
│   │   ├── Normalizers.html
│   │   ├── Normalizers
│   │   │   └── RemoveDiacritics.html
│   │   ├── NounInflector.html
│   │   ├── PairDistance.html
│   │   ├── Phonetics.html
│   │   ├── PorterStemmer.html
│   │   ├── PragmaticTokenizer.html
│   │   ├── PragmaticTokenizer
│   │   │   ├── Languages.html
│   │   │   ├── Languages
│   │   │   │   ├── Common.html
│   │   │   │   ├── Deutsch.html
│   │   │   │   └── English.html
│   │   │   ├── MentionsOptions.html
│   │   │   ├── NumbersOptions.html
│   │   │   └── PunctuationOptions.html
│   │   ├── PresentVerbInflector.html
│   │   ├── Readability.html
│   │   ├── RegexTokenizer.html
│   │   ├── SentenceTokenizer.html
│   │   ├── Sentiment.html
│   │   ├── SoundEx.html
│   │   ├── Stemmer.html
│   │   ├── Stemmer
│   │   │   └── Token.html
│   │   ├── StringExtension.html
│   │   ├── TenseInflector.html
│   │   ├── TenseInflector
│   │   │   └── FormSet.html
│   │   ├── TfIdf.html
│   │   ├── TfIdf
│   │   │   └── Document.html
│   │   ├── Tokenizer.html
│   │   ├── Transliterator.html
│   │   ├── Transliterator
│   │   │   └── StringExtension.html
│   │   ├── TreebankWordTokenizer.html
│   │   ├── Trie.html
│   │   ├── Util.html
│   │   ├── Util
│   │   │   ├── Paragraph.html
│   │   │   ├── Sentence.html
│   │   │   ├── StopWords.html
│   │   │   ├── Syllable.html
│   │   │   └── Syllable
│   │   │       └── Guess.html
│   │   ├── VisibleCharTokenizer.html
│   │   ├── WhitespaceTokenizer.html
│   │   ├── WordNet.html
│   │   ├── WordNet
│   │   │   ├── DB.html
│   │   │   ├── Lemma.html
│   │   │   ├── Pointer.html
│   │   │   └── Synset.html
│   │   ├── WordPunctuationTokenizer.html
│   │   └── WordTokenizer.html
│   ├── css
│   │   └── style.css
│   ├── index.html
│   ├── index.json
│   ├── js
│   │   └── doc.js
│   └── search-index.js
├── img
│   ├── cadmium.gvdesign
│   └── cadmium.png
├── shard.yml
└── spec
    └── spec_helper.cr
/.editorconfig:
--------------------------------------------------------------------------------
[*.cr]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 2
trim_trailing_whitespace = true
--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
# These are supported funding model platforms

github: watzon
patreon: watzon
open_collective: cadmium
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: # Replace with a single custom sponsorship URL
--------------------------------------------------------------------------------
/.github/workflows/crystal.yml:
--------------------------------------------------------------------------------
name: Crystal CI

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest

    container:
      image: crystallang/crystal

    steps:
    - uses: actions/checkout@v1
    - name: Install dependencies
      run: shards install
    - name: Check Formatting
      run: crystal tool format --check
    - name: Run tests
      run: crystal spec
    - name: Ameba
      run: bin/ameba
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
/lib/
/bin/
/.shards/
.gvdesign
.ameba.yml

# Libraries don't need a dependency lock
# Dependencies will be locked in the applications that use them
/shard.lock
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
language: crystal

install:
  - shards install

script:
  - crystal spec
  - crystal tool format --check
  - bin/ameba
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2018 Chris Watson

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
**Cadmium** is a *Natural Language Processing* (NLP) library for [Crystal](https://crystal-lang.org/).

For full API documentation, check out [the docs](https://cadmiumcr.github.io/cadmium/).

For more complete and up-to-date information about a specific part of Cadmium, see the corresponding shard's repository.

| Shard name | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [cadmium_tokenizer](https://github.com/cadmiumcr/tokenizer) | Contains several types of string tokenizers |
| [cadmium_stemmer](https://github.com/cadmiumcr/stemmer) | Contains a Porter stemmer for reducing English words to their stems |
| [cadmium_ngrams](https://github.com/cadmiumcr/ngrams) | Extracts unigrams, bigrams, trigrams, or arbitrary n-grams from strings |
| [cadmium_classifier](https://github.com/cadmiumcr/classifier) | Contains two probabilistic classifiers for NLP tasks such as language detection or POS tagging |
| [cadmium_readability](https://github.com/cadmiumcr/readability) | Analyzes blocks of text and determines their readability using various algorithms. |
| [cadmium_tfidf](https://github.com/cadmiumcr/tfidf) | Calculates Term Frequency–Inverse Document Frequency (TF-IDF) weights for the terms of a corpus |
| [cadmium_glove](https://github.com/cadmiumcr/glove) | Pure Crystal implementation of GloVe (Global Vectors for Word Representation) |
| [cadmium_pos_tagger](https://github.com/cadmiumcr/pos_tagger) | Tags each token of a text with its part-of-speech category |
| [cadmium_lemmatizer](https://github.com/cadmiumcr/lemmatizer) | Returns the lemma of each given string token |
| [cadmium_summarizer](https://github.com/cadmiumcr/summarizer) | Extracts the most meaningful sentences of a text to create a summary |
| [cadmium_sentiment](https://github.com/cadmiumcr/sentiment) | Evaluates the sentiment of a text |
| [cadmium_distance](https://github.com/cadmiumcr/distance) | Provides two string distance algorithms (Jaro-Winkler and Levenshtein) |
| [cadmium_transliterator](https://github.com/cadmiumcr/transliterator) | Transliterates UTF-8 strings into pure ASCII so they can be safely used in URL slugs or file names. |
| [cadmium_phonetics](https://github.com/cadmiumcr/phonetics) | Matches strings by their phonetic (sound) representation |
| [cadmium_inflector](https://github.com/cadmiumcr/inflector) | Inflects English words (nouns, verbs, and numbers) |
| [cadmium_graph](https://github.com/cadmiumcr/graph) | Provides `EdgeWeightedDigraph`, a weighted directed graph that lets you add edges, count its vertices and edges, list all edges, and print it as a string. |
| [cadmium_trie](https://github.com/cadmiumcr/trie) | A [trie](https://en.wikipedia.org/wiki/Trie) is a data structure for efficiently storing and retrieving strings that share a prefix, like "**mee**t" and "**mee**k". |
| [cadmium_wordnet](https://github.com/cadmiumcr/wordnet) | Pure Crystal interface to Princeton's WordNet lexical database |
| [cadmium_util](https://github.com/cadmiumcr/utilities) | A collection of useful utilities used internally in Cadmium. |
| [cadmium_language_detector](https://github.com/cadmiumcr/language_detector) | Returns the most probable language code of the analysed text. |
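
The cadmium_trie entry above describes the idea in one sentence. The sketch below shows it in plain, dependency-free Crystal. It is *not* the cadmium_trie API: `TrieNode`, `SimpleTrie`, `add`, `includes?`, and `words_with_prefix` are hypothetical names chosen only for this illustration. Each node maps the next character to a child node and records whether the path from the root spells a stored word.

```crystal
# Hypothetical trie sketch for illustration; NOT the cadmium_trie API.
class TrieNode
  # Child nodes keyed by the next character of a word.
  getter children = {} of Char => TrieNode
  # True when the path from the root to this node spells a stored word.
  property? terminal = false
end

class SimpleTrie
  def initialize
    @root = TrieNode.new
  end

  # Walks the word one character at a time, creating missing nodes.
  def add(word : String) : Nil
    node = @root
    word.each_char do |char|
      node = node.children[char] ||= TrieNode.new
    end
    node.terminal = true
  end

  # True only if this exact word was added (a bare prefix does not count).
  def includes?(word : String) : Bool
    node = find_node(word)
    node ? node.terminal? : false
  end

  # Returns every stored word that starts with `prefix`, in insertion order.
  def words_with_prefix(prefix : String) : Array(String)
    results = [] of String
    if node = find_node(prefix)
      collect(node, prefix, results)
    end
    results
  end

  # Follows `prefix` from the root; returns nil as soon as a character is missing.
  private def find_node(prefix : String) : TrieNode?
    node = @root
    prefix.each_char do |char|
      child = node.children[char]?
      return nil unless child
      node = child
    end
    node
  end

  # Depth-first walk that appends every complete word below `node`.
  private def collect(node : TrieNode, prefix : String, results : Array(String)) : Nil
    results << prefix if node.terminal?
    node.children.each do |char, child|
      collect(child, prefix + char, results)
    end
  end
end

trie = SimpleTrie.new
%w(meet meek mean tree).each { |word| trie.add(word) }

p trie.words_with_prefix("mee") # => ["meet", "meek"]
p trie.includes?("mee")         # => false
p trie.includes?("meek")        # => true
```

The shared prefix "mee" is stored only once. A prefix lookup therefore walks one node per character of the prefix before collecting the words beneath it.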
## Installation

Your project *should* only include the Cadmium shard(s) you need.

However, if you want to try out **all of Cadmium** at once, you can install every module with a single dependency entry.

Add this to your application's `shard.yml`:

```yaml
dependencies:
  cadmium:
    github: cadmiumcr/cadmium
    branch: master
```
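
If you only need one module, a minimal sketch is to declare just that shard. The example below uses the tokenizer shard from the table above. It assumes that the dependency key must match the shard name listed there (`cadmium_tokenizer`) and that tracking the repository's default branch is acceptable. Check the shard's own README for any recommended version or branch.

```yaml
dependencies:
  cadmium_tokenizer:
    github: cadmiumcr/tokenizer
```

Then run `shards install` to fetch it. Every other shard in the table follows the same pattern: use its shard name as the key and its repository as the `github` value.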
## Contributing

1. Fork it (https://github.com/cadmiumcr/cadmium/fork)
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

## Contributors

This project exists thanks to all the people who contribute.
--------------------------------------------------------------------------------
/docs/CNAME:
--------------------------------------------------------------------------------
api.cadmiumcr.com
--------------------------------------------------------------------------------
/docs/Cadmium/Classifiers.html:
--------------------------------------------------------------------------------