├── .gitignore ├── requirements.txt ├── Dockerfile ├── data ├── online-job-posts.csv ├── online-job-posts.zip └── sample_screenshot.jpg ├── models ├── job_description_bow_corpus.pkl ├── job_description_dictionary.gensim ├── job_description_tfidf_corpus.pkl ├── job_requirement_bow_corpus.pkl ├── job_requirement_dictionary.gensim ├── job_requirement_tfidf_corpus.pkl ├── job_description_bow_lda_model.gensim ├── job_description_bow_lda_model.gensim.id2word ├── job_description_bow_lda_model.gensim.state ├── job_description_tfidf_lda_model.gensim ├── job_description_tfidf_lda_model.gensim.state ├── job_requirement_bow_lda_model.gensim ├── job_requirement_bow_lda_model.gensim.id2word ├── job_requirement_bow_lda_model.gensim.state ├── job_requirement_tfidf_lda_model.gensim ├── job_requirement_tfidf_lda_model.gensim.state ├── required_qualification_bow_corpus.pkl ├── job_description_tfidf_lda_model.gensim.id2word ├── job_requirement_tfidf_lda_model.gensim.id2word ├── required_qualification_bow_lda_model.gensim ├── required_qualification_dictionary.gensim ├── required_qualification_tfidf_corpus.pkl ├── required_qualification_bow_lda_model.gensim.state ├── required_qualification_tfidf_lda_model.gensim ├── job_description_bow_lda_model.gensim.expElogbeta.npy ├── job_requirement_bow_lda_model.gensim.expElogbeta.npy ├── required_qualification_bow_lda_model.gensim.id2word ├── required_qualification_tfidf_lda_model.gensim.state ├── job_description_tfidf_lda_model.gensim.expElogbeta.npy ├── job_requirement_tfidf_lda_model.gensim.expElogbeta.npy ├── required_qualification_tfidf_lda_model.gensim.id2word ├── required_qualification_bow_lda_model.gensim.expElogbeta.npy └── required_qualification_tfidf_lda_model.gensim.expElogbeta.npy ├── output ├── sample-static-output-visualization.JPG └── notebook-cells-output.txt ├── README.md ├── LICENSE ├── .gitattributes └── src └── jobs_indicator_web_services.py /.gitignore: 
-------------------------------------------------------------------------------- 1 | # Directories to be excluded 2 | src/__pycache__/ 3 | 4 | # Files to be excluded -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | flask 2 | spacy 3 | nltk 4 | gensim 5 | numpy 6 | pandas -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.6.7 2 | WORKDIR /app 3 | COPY . /app 4 | RUN pip install -r requirements.txt 5 | ENTRYPOINT ["python"] 6 | CMD ["src/jobs_indicator_web_services.py"] -------------------------------------------------------------------------------- /data/online-job-posts.csv: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:70706c395ecd8b67d158c86e242abde79828bd2481e56a4a071b2a302d716cd5 3 | size 95435519 4 | -------------------------------------------------------------------------------- /data/online-job-posts.zip: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:cb779d6cf4e312bd79e047626d17ca165559730495f0263280c16e8158d8f34e 3 | size 13254376 4 | -------------------------------------------------------------------------------- /data/sample_screenshot.jpg: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:4bab8b510d70b5054a5f362628362decc0a234964793b7e6be8b2b08e2340ee3 3 | size 819666 4 | -------------------------------------------------------------------------------- /models/job_description_bow_corpus.pkl: -------------------------------------------------------------------------------- 1 | version 
https://git-lfs.github.com/spec/v1 2 | oid sha256:c48afa3630e670a249618723045c04091e6113786cab822a94c039aecc753f75 3 | size 2316149 4 | -------------------------------------------------------------------------------- /models/job_description_dictionary.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:3713ca1ae87e44a0973ffece98c56c42b532d31da8840ae91a2a5d54ece84a15 3 | size 248274 4 | -------------------------------------------------------------------------------- /models/job_description_tfidf_corpus.pkl: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:59b897fe4bf89862ceea667c72b6475cc8e5e6b8766d133da0be6a6c33b05426 3 | size 2607134 4 | -------------------------------------------------------------------------------- /models/job_requirement_bow_corpus.pkl: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:a8e334a93ceb476cec536bab5a122a4485b2f669013db8ab10a0d3211506bd7e 3 | size 5715308 4 | -------------------------------------------------------------------------------- /models/job_requirement_dictionary.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:ef7d9bc1d9f8fe1a7121e26c0e5e712aa5584b53111f531429bd02870762828e 3 | size 329148 4 | -------------------------------------------------------------------------------- /models/job_requirement_tfidf_corpus.pkl: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:53889cdf929f0874bc583a8006e474c486128718d349d1542d163ede96bb20a0 3 | size 6098902 4 | -------------------------------------------------------------------------------- 
/models/job_description_bow_lda_model.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:c8adf0b2bfde815c22d99b30bf13af67a1c7ff05b52995ac186df9a57cef5663 3 | size 49736 4 | -------------------------------------------------------------------------------- /models/job_description_bow_lda_model.gensim.id2word: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_description_bow_lda_model.gensim.id2word -------------------------------------------------------------------------------- /models/job_description_bow_lda_model.gensim.state: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_description_bow_lda_model.gensim.state -------------------------------------------------------------------------------- /models/job_description_tfidf_lda_model.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:3713ca1ae87e44a0973ffece98c56c42b532d31da8840ae91a2a5d54ece84a15 3 | size 248274 4 | -------------------------------------------------------------------------------- /models/job_description_tfidf_lda_model.gensim.state: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_description_tfidf_lda_model.gensim.state -------------------------------------------------------------------------------- /models/job_requirement_bow_lda_model.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid 
sha256:ef7d9bc1d9f8fe1a7121e26c0e5e712aa5584b53111f531429bd02870762828e 3 | size 329148 4 | -------------------------------------------------------------------------------- /models/job_requirement_bow_lda_model.gensim.id2word: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_requirement_bow_lda_model.gensim.id2word -------------------------------------------------------------------------------- /models/job_requirement_bow_lda_model.gensim.state: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_requirement_bow_lda_model.gensim.state -------------------------------------------------------------------------------- /models/job_requirement_tfidf_lda_model.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:29b99de089d5907cfce8eb8b55e05ecb859b2c5e2a52d93d6d33b224529b88f3 3 | size 63951 4 | -------------------------------------------------------------------------------- /models/job_requirement_tfidf_lda_model.gensim.state: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_requirement_tfidf_lda_model.gensim.state -------------------------------------------------------------------------------- /models/required_qualification_bow_corpus.pkl: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:639478db241dbdf80039a2269932cb66fbbf2ee5cf501142bc8e153b41eefeed 3 | size 5714777 4 | -------------------------------------------------------------------------------- 
/output/sample-static-output-visualization.JPG: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:29d4298dbba9ca7b5002375b9cf708062fc567895d21f4d666f864a6702892bc 3 | size 76812 4 | -------------------------------------------------------------------------------- /models/job_description_tfidf_lda_model.gensim.id2word: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_description_tfidf_lda_model.gensim.id2word -------------------------------------------------------------------------------- /models/job_requirement_tfidf_lda_model.gensim.id2word: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/job_requirement_tfidf_lda_model.gensim.id2word -------------------------------------------------------------------------------- /models/required_qualification_bow_lda_model.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:9c90399245ab4eb9abb48e03ab9fecb2d7a878e737d3e12afcf5b97841d518d5 3 | size 260200 4 | -------------------------------------------------------------------------------- /models/required_qualification_dictionary.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:9c90399245ab4eb9abb48e03ab9fecb2d7a878e737d3e12afcf5b97841d518d5 3 | size 260200 4 | -------------------------------------------------------------------------------- /models/required_qualification_tfidf_corpus.pkl: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid 
sha256:b08c7d96af60dfc4862beea34d7f45f024cfacb209c7e12ac13fc444d51eee1e 3 | size 6017765 4 | -------------------------------------------------------------------------------- /models/required_qualification_bow_lda_model.gensim.state: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/required_qualification_bow_lda_model.gensim.state -------------------------------------------------------------------------------- /models/required_qualification_tfidf_lda_model.gensim: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:9c90399245ab4eb9abb48e03ab9fecb2d7a878e737d3e12afcf5b97841d518d5 3 | size 260200 4 | -------------------------------------------------------------------------------- /models/job_description_bow_lda_model.gensim.expElogbeta.npy: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:0e29f2e890f1ae9a0e04f3181366033ee8a742fd960d1dfb70d3c2387792bd7f 3 | size 596448 4 | -------------------------------------------------------------------------------- /models/job_requirement_bow_lda_model.gensim.expElogbeta.npy: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:76d2eef6ee920cd9af3cce30ebcf1d13c98fe638df591041d874d524ce9d867c 3 | size 785808 4 | -------------------------------------------------------------------------------- /models/required_qualification_bow_lda_model.gensim.id2word: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/required_qualification_bow_lda_model.gensim.id2word 
-------------------------------------------------------------------------------- /models/required_qualification_tfidf_lda_model.gensim.state: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/required_qualification_tfidf_lda_model.gensim.state -------------------------------------------------------------------------------- /models/job_description_tfidf_lda_model.gensim.expElogbeta.npy: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:6e9c86bf9bb06fa3815dce050a0f63b0117389db02d2921356ee9e517dea43f2 3 | size 596448 4 | -------------------------------------------------------------------------------- /models/job_requirement_tfidf_lda_model.gensim.expElogbeta.npy: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:8029b0cff41cec8220a2104d3bd1e8d8ab7142280cf2ca5f8ca996da90002e93 3 | size 785808 4 | -------------------------------------------------------------------------------- /models/required_qualification_tfidf_lda_model.gensim.id2word: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/indranildchandra/JobDescription-Keywords-Extractor/HEAD/models/required_qualification_tfidf_lda_model.gensim.id2word -------------------------------------------------------------------------------- /models/required_qualification_bow_lda_model.gensim.expElogbeta.npy: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:71ff0b7853a48dedc3a654c544348a3a0f89228198d29d533c2d5406515d4703 3 | size 620768 4 | -------------------------------------------------------------------------------- 
/models/required_qualification_tfidf_lda_model.gensim.expElogbeta.npy: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:ae12b30f53d9f774cc0307d03a390026af7cff383344b41e80510037a83b9760 3 | size 620768 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # JobDescription-Keywords-Extractor 2 | JobDescription-Keywords-Extractor aims to extract important keywords (topics) from any given job description posted online for better search engine optimization (SEO) using Topic Modeling techniques. It can also be used to build a Resume Screening Tool where any given resume would be matched against the topics extracted from JD_Keywords_Extractor. 3 | 4 | ### Dataset: 5 | Dataset used for building this model was obtained from Kaggle @ https://www.kaggle.com/madhab/jobposts 6 | 7 | ### Instruction to containerize the application: 8 | 9 | - Build the image using the following command - 10 | $ docker build -t jobs-indicator-app:latest . 
11 | 12 | - Run the Docker container using the following command - 13 | $ docker run -d -p 5000:5000 jobs-indicator-app 14 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Indranil Chandra 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | # Handle line endings automatically for files detected as text and leave all files detected as binary untouched. 
2 | * text=auto 3 | # These files are text and should be normalized (Convert crlf => lf) 4 | *.md text 5 | *.MD text 6 | *.dat text 7 | *.DAT text 8 | *.txt text 9 | *.TXT text 10 | *.xml text 11 | *.XML text 12 | # Documents 13 | *.doc diff=astextplain 14 | *.DOC diff=astextplain 15 | *.docx diff=astextplain 16 | *.DOCX diff=astextplain 17 | *.dot diff=astextplain 18 | *.DOT diff=astextplain 19 | *.pdf diff=astextplain 20 | *.PDF diff=astextplain 21 | *.rtf diff=astextplain 22 | *.RTF diff=astextplain 23 | # Graphics 24 | *.png binary 25 | *.jpg filter=lfs diff=lfs merge=lfs -text 26 | *.jpeg binary 27 | *.gif binary 28 | *.tif binary 29 | *.tiff binary 30 | *.ico binary 31 | # Media files 32 | *.avi filter=lfs diff=lfs merge=lfs -text 33 | *.mp4 filter=lfs diff=lfs merge=lfs -text 34 | # Source files 35 | *.pxd text 36 | *.py text 37 | *.py3 text 38 | *.pyw text 39 | *.pyx text 40 | *.csv filter=lfs diff=lfs merge=lfs -text 41 | *.zip filter=lfs diff=lfs merge=lfs -text 42 | # Model Files 43 | *.pkl filter=lfs diff=lfs merge=lfs -text 44 | *.gensim filter=lfs diff=lfs merge=lfs -text 45 | *.npy filter=lfs diff=lfs merge=lfs -text 46 | -------------------------------------------------------------------------------- /src/jobs_indicator_web_services.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, jsonify, request 2 | 3 | import spacy 4 | import nltk 5 | import random 6 | import pickle 7 | import gensim 8 | import numpy 9 | import pandas as pd 10 | 11 | from spacy.lang.en import English 12 | from nltk.corpus import wordnet as wn 13 | from nltk.stem.wordnet import WordNetLemmatizer 14 | from gensim import corpora, models 15 | from gensim.models import Phrases 16 | from gensim.models import Word2Vec 17 | 18 | app = Flask(__name__) 19 | parser = spacy.load('en') 20 | 21 | @app.route("/getKeywords", methods=['POST']) 22 | def getKeywords(): 23 | data = request.get_json()['text'] 24 | 
print("\n------------------------------------------------------------") 25 | print("Input Data: " + data) 26 | print("------------------------------------------------------------\n") 27 | keywords = processKeywords(data) 28 | print("\n------------------------------------------------------------") 29 | print("Output Keywords: ") 30 | print("Noun Phrases: " + str(keywords['noun_phrases'])) 31 | print("BOW Topics: " + str(keywords['bow_topics'])) 32 | print("TF-IDF Topics: " + str(keywords['tfidf_topics'])) 33 | print("------------------------------------------------------------\n") 34 | 35 | return jsonify(keywords) 36 | 37 | def tokenize(text): 38 | lda_tokens = [] 39 | tokens = parser(text) 40 | for token in tokens: 41 | if token.orth_.isspace(): 42 | continue 43 | elif token.is_punct: 44 | continue 45 | elif token.like_url: 46 | lda_tokens.append('URL') 47 | elif token.orth_.startswith('@'): 48 | lda_tokens.append('at') 49 | elif token.pos_ == "ADJ" or token.pos_ == "VERB" or token.pos_ == "RBR" or token.pos_ == "RBS" or token.pos_ == "RB" or token.pos_ == "RP": 50 | continue 51 | else: 52 | lda_tokens.append(token.lower_) 53 | return lda_tokens 54 | 55 | def get_lemma(word): 56 | lemma = wn.morphy(word) 57 | if lemma is None: 58 | return word 59 | else: 60 | return lemma 61 | 62 | def get_lemma2(word): 63 | return WordNetLemmatizer().lemmatize(word) 64 | 65 | def prepare_text(text): 66 | en_stop = set(nltk.corpus.stopwords.words('english')) 67 | tokens = tokenize(text) 68 | tokens = [token for token in tokens if len(token) > 3] 69 | tokens = [token for token in tokens if token not in en_stop] 70 | tokens = [get_lemma(token) for token in tokens] 71 | return tokens 72 | 73 | def processKeywords(new_doc): 74 | keywords = { 75 | "noun_phrases": [], 76 | "bow_topics": [], 77 | "tfidf_topics": [] 78 | } 79 | noun_phrases = [] 80 | bow_topics = [] 81 | tfidf_topics = [] 82 | 83 | job_description_dictionary = 
gensim.corpora.Dictionary.load('./../models/job_description_dictionary.gensim') 84 | job_description_bow_corpus = pickle.load(open('./../models/job_description_bow_corpus.pkl', 'rb')) 85 | job_description_bow_lda_model = gensim.models.ldamodel.LdaModel.load('./../models/job_description_bow_lda_model.gensim') 86 | job_description_tfidf_corpus = pickle.load(open('./../models/job_description_tfidf_corpus.pkl', 'rb')) 87 | job_description_tfidf_lda_model = gensim.models.ldamodel.LdaModel.load('./../models/job_description_tfidf_lda_model.gensim') 88 | job_description_tfidf_model = models.TfidfModel(job_description_tfidf_corpus) 89 | 90 | document = parser(new_doc) 91 | new_doc = prepare_text(new_doc) 92 | new_doc_bow = job_description_dictionary.doc2bow(new_doc) 93 | print(new_doc_bow) 94 | new_doc_tfidf = job_description_tfidf_model[new_doc_bow] 95 | print(new_doc_tfidf) 96 | 97 | print("Noun Phrases...") 98 | print("--- [Format: Noun Phrase -> Root Text] ---") 99 | for noun_phrase in document.noun_chunks: 100 | phrase_entry = { 101 | 'phrase':"", 102 | 'root_text':"" 103 | } 104 | phrase_entry['phrase'] = noun_phrase.text 105 | phrase_entry['root_text'] = noun_phrase.root.text 106 | noun_phrases.append(phrase_entry) 107 | print(noun_phrase.text + " -> " + noun_phrase.root.text) 108 | 109 | print("\n\nTopics relevant to new document are (BoW): ") 110 | 111 | counter = 0 112 | for index, score in sorted(job_description_bow_lda_model[new_doc_bow], key=lambda tup: tup[1], reverse=True): 113 | topic = { 114 | 'topic_index':"", 115 | 'topic_conf':"", 116 | 'topic': "" 117 | } 118 | if counter == 0: 119 | topic['topic_index'] = str(index) 120 | topic['topic_conf'] = str(score) 121 | topic['topic'] = job_description_bow_lda_model.print_topic(index, 10) 122 | bow_topics.append(topic) 123 | print("Score: {}\t Topic: {}".format(score, job_description_bow_lda_model.print_topic(index, 10))) 124 | highest_score = score 125 | counter = counter + 1 126 | elif highest_score - score <= 
0.2: 127 | topic['topic_index'] = str(index) 128 | topic['topic_conf'] = str(score) 129 | topic['topic'] = job_description_bow_lda_model.print_topic(index, 10) 130 | bow_topics.append(topic) 131 | print("Score: {}\t Topic: {}".format(score, job_description_bow_lda_model.print_topic(index, 10))) 132 | counter = counter + 1 133 | else: 134 | break 135 | 136 | 137 | 138 | # counter = 0 139 | # print("\n\nTopics relevant to new document are (TF-IDF): ") 140 | # for index, score in sorted(job_description_tfidf_lda_model[new_doc_tfidf], key=lambda tup: tup[1], reverse=True): 141 | # topic = { 142 | # 'topic_index':"", 143 | # 'topic_conf':"", 144 | # 'topic': "" 145 | # } 146 | # if counter == 0: 147 | # topic['topic_index'] = str(index) 148 | # topic['topic_conf'] = str(score) 149 | # topic['topic'] = job_description_tfidf_lda_model.print_topic(index, 10) 150 | # tfidf_topics.append(topic) 151 | # print("Score: {}\t Topic: {}".format(score, job_description_tfidf_lda_model.print_topic(index, 10))) 152 | # highest_score = score 153 | # counter = counter + 1 154 | # elif highest_score - score <= 0.2: 155 | # topic['topic_index'] = str(index) 156 | # topic['topic_conf'] = str(score) 157 | # topic['topic'] = job_description_tfidf_lda_model.print_topic(index, 10) 158 | # tfidf_topics.append(topic) 159 | # print("Score: {}\t Topic: {}".format(score, job_description_tfidf_lda_model.print_topic(index, 10))) 160 | # counter = counter + 1 161 | # else: 162 | # break 163 | 164 | 165 | keywords['noun_phrases'] = noun_phrases 166 | keywords['bow_topics'] = bow_topics 167 | keywords['tfidf_topics'] = tfidf_topics 168 | 169 | return keywords 170 | 171 | 172 | if __name__ == '__main__': 173 | app.run(debug=True, port=5000) 174 | 175 | 176 | # REST API Endpoint Examples 177 | # http://127.0.0.1:5000/getKeywords 178 | 179 | # REST API Request Body Examples 180 | # { 181 | # "text": "Investment Consulting Company is seeking a Chief Financial Officer. 
This position manages the company's fiscal and administrative functions, provides highly responsible and technically complex staff assistance to the Executive Director. The work performed requires a high level of technical proficiency in financial management and investment management, as well as management, supervisory and administrative skills." 182 | # } 183 | # { 184 | # "text": "Design right ML & AI algorithms and manage the solution implementation end-to-end Manage client engagements and ensure project delivery as per client expectations. Present solutions in intuitive and effective way to the audience. Actively participate in all activities leading to career progression participating in sales pitches, training in house resources, coaching staff on best practices, client relationship management etc." 185 | # } 186 | # { 187 | # "text": "1.Must have a working knowledge of Asp.net (C#) (both n-tier and MVC must). 2.Must have a working knowledge of Sql Server . 3.Must have a working knowledge of web services, Bootstrap, CSS, Ajax, Java Script and J-query. 4.Report creation . 5.Good Communication skill and convincing power" 188 | # } -------------------------------------------------------------------------------- /output/notebook-cells-output.txt: -------------------------------------------------------------------------------- 1 | Print 10 example words from each of the three dictionaries formed: 2 | 3 | 7454 words present in job_description_dictionary. 4 | Examples from job_description_dictionary... 5 | 0 ameria 6 | 1 assistance 7 | 2 chief 8 | 3 company 9 | 4 consult 10 | 5 director 11 | 6 executive 12 | 7 financial 13 | 8 function 14 | 9 highly 15 | 10 investment 16 | 17 | 9821 words present in job_requirement_dictionary. 18 | Examples from job_requirement_dictionary... 
19 | 0 accounting 20 | 1 action 21 | 2 activity 22 | 3 adequacy 23 | 4 administration 24 | 5 article 25 | 6 assigning 26 | 7 assist 27 | 8 board 28 | 9 budget 29 | 10 cash 30 | 31 | 7758 words present in required_qualification_dictionary. 32 | Examples from required_qualification_dictionary... 33 | 0 ability 34 | 1 acca 35 | 2 accounting 36 | 3 accounting/ 37 | 4 activity 38 | 5 administration 39 | 6 analysis 40 | 7 analytical 41 | 8 and/or 42 | 9 application 43 | 10 audience 44 | 45 | ---------------------------------------------------------------- 46 | Preview Bag-of-Words on a sample pre-processed document: 47 | 48 | Examples from job_description_bow_corpus... 49 | Word 14 ("position") appears 1 time. 50 | Word 149 ("communication") appears 1 time. 51 | Word 150 ("concept") appears 1 time. 52 | Word 151 ("design") appears 1 time. 53 | Word 152 ("designer") appears 1 time. 54 | Word 153 ("experience") appears 1 time. 55 | Word 154 ("field") appears 1 time. 56 | Word 155 ("graphic") appears 1 time. 57 | Word 156 ("medium") appears 1 time. 58 | Word 157 ("since") appears 1 time. 59 | Word 158 ("study") appears 1 time. 60 | 61 | Examples from job_requirement_bow_corpus... 62 | Word 11 ("client") appears 1 time. 63 | Word 25 ("development") appears 1 time. 64 | Word 28 ("documentation") appears 1 time. 65 | Word 64 ("product") appears 1 time. 66 | Word 82 ("team") appears 1 time. 67 | Word 111 ("communication") appears 1 time. 68 | Word 123 ("group") appears 1 time. 69 | Word 137 ("project") appears 1 time. 70 | Word 164 ("software") appears 1 time. 71 | Word 266 ("also") appears 1 time. 72 | Word 267 ("designer") appears 1 time. 73 | Word 268 ("every") appears 1 time. 74 | Word 269 ("graphic") appears 1 time. 75 | Word 270 ("guide") appears 1 time. 76 | Word 271 ("house") appears 1 time. 77 | Word 272 ("interface") appears 1 time. 78 | Word 273 ("marketing") appears 1 time. 79 | Word 274 ("material") appears 1 time. 80 | Word 275 ("packaging") appears 1 time. 
81 | Word 276 ("presentation") appears 1 time. 82 | Word 277 ("subsidiary") appears 1 time. 83 | Word 278 ("user") appears 2 time. 84 | Word 279 ("website") appears 1 time. 85 | 86 | Examples from required_qualification_bow_corpus... 87 | Word 0 ("ability") appears 2 time. 88 | Word 6 ("analysis") appears 1 time. 89 | Word 8 ("and/or") appears 1 time. 90 | Word 18 ("communication") appears 1 time. 91 | Word 20 ("computer") appears 1 time. 92 | Word 29 ("development") appears 1 time. 93 | Word 33 ("english") appears 1 time. 94 | Word 36 ("experience") appears 2 time. 95 | Word 52 ("knowledge") appears 1 time. 96 | Word 80 ("skill") appears 2 time. 97 | Word 89 ("team") appears 1 time. 98 | Word 95 ("well") appears 2 time. 99 | Word 97 ("work") appears 1 time. 100 | Word 102 ("language") appears 1 time. 101 | Word 111 ("field") appears 1 time. 102 | Word 134 ("2000") appears 1 time. 103 | Word 155 ("organization") appears 1 time. 104 | Word 165 ("office") appears 1 time. 105 | Word 188 ("member") appears 1 time. 106 | Word 240 ("capability") appears 1 time. 107 | Word 241 ("community") appears 1 time. 108 | Word 242 ("compulsory") appears 1 time. 109 | Word 243 ("consultancy") appears 1 time. 110 | Word 244 ("document") appears 1 time. 111 | Word 245 ("economics") appears 1 time. 112 | Word 246 ("education") appears 1 time. 113 | Word 247 ("higher") appears 1 time. 114 | Word 248 ("initiative") appears 1 time. 115 | Word 249 ("overtime") appears 1 time. 116 | Word 250 ("pressure") appears 1 time. 117 | Word 251 ("sector") appears 1 time. 118 | Word 252 ("sufficient") appears 1 time. 119 | Word 253 ("time") appears 1 time. 120 | Word 254 ("travel") appears 1 time. 
121 | 122 | ---------------------------------------------------------------- 123 | Display top 10 words in each topic of each model: 124 | 125 | Job Description Topics (BoW): 126 | Topic: 0 127 | Words: 0.058*"center" + 0.056*"language" + 0.036*"specialist" + 0.036*"education" + 0.035*"armenia" + 0.031*"international" + 0.030*"student" + 0.025*"card" + 0.022*"english" + 0.019*"position" 128 | Topic: 1 129 | Words: 0.067*"program" + 0.051*"training" + 0.031*"armenia" + 0.025*"yerevan" + 0.022*"position" + 0.021*"foundation" + 0.019*"coordinator" + 0.017*"country" + 0.015*"office" + 0.014*"staff" 130 | Topic: 2 131 | Words: 0.079*"sales" + 0.045*"company" + 0.036*"product" + 0.033*"incumbent" + 0.031*"representative" + 0.028*"customer" + 0.028*"manager" + 0.025*"medical" + 0.023*"store" + 0.015*"services" 132 | Topic: 3 133 | Words: 0.066*"system" + 0.041*"network" + 0.035*"maintenance" + 0.031*"engineer" + 0.030*"support" + 0.029*"database" + 0.028*"incumbent" + 0.028*"design" + 0.024*"administrator" + 0.021*"construction" 134 | Topic: 4 135 | Words: 0.067*"bank" + 0.047*"credit" + 0.035*"specialist" + 0.030*"department" + 0.029*"cjsc" + 0.027*"loan" + 0.022*"incumbent" + 0.022*"head" + 0.021*"armenia" + 0.020*"position" 136 | Topic: 5 137 | Words: 0.142*"project" + 0.029*"armenia" + 0.022*"consultant" + 0.019*"team" + 0.019*"implementation" + 0.018*"health" + 0.014*"services" + 0.014*"supervision" + 0.012*"manager" + 0.011*"support" 138 | Topic: 6 139 | Words: 0.044*"armenia" + 0.043*"development" + 0.041*"project" + 0.039*"program" + 0.025*"usaid" + 0.022*"specialist" + 0.019*"sector" + 0.015*"market" + 0.014*"component" + 0.013*"value" 140 | Topic: 7 141 | Words: 0.096*"marketing" + 0.055*"sales" + 0.053*"business" + 0.044*"company" + 0.043*"manager" + 0.036*"development" + 0.035*"product" + 0.033*"strategy" + 0.027*"market" + 0.026*"incumbent" 142 | Topic: 8 143 | Words: 0.085*"software" + 0.062*"development" + 0.047*"engineer" + 0.047*"application" + 
0.036*"team" + 0.034*"design" + 0.026*"developer" + 0.024*"senior" + 0.022*"product" + 0.021*"project" 144 | Topic: 9 145 | Words: 0.084*"chief" + 0.083*"accountant" + 0.049*"accounting" + 0.029*"company" + 0.026*"translation" + 0.022*"incumbent" + 0.021*"lawyer" + 0.019*"news" + 0.019*"language" + 0.017*"translator" 146 | Topic: 10 147 | Words: 0.029*"management" + 0.026*"incumbent" + 0.026*"analysis" + 0.023*"data" + 0.022*"procedure" + 0.022*"business" + 0.020*"risk" + 0.019*"report" + 0.018*"policy" + 0.018*"finance" 148 | Topic: 11 149 | Words: 0.044*"programme" + 0.028*"office" + 0.026*"expert" + 0.017*"activity" + 0.016*"incumbent" + 0.016*"osce" + 0.015*"yerevan" + 0.014*"application" + 0.014*"area" + 0.014*"analyst" 150 | Topic: 12 151 | Words: 0.020*"communications" + 0.019*"incumbent" + 0.017*"relations" + 0.017*"information" + 0.016*"food" + 0.016*"border" + 0.014*"communication" + 0.012*"shop" + 0.012*"armenia" + 0.012*"public" 152 | Topic: 13 153 | Words: 0.038*"development" + 0.030*"project" + 0.030*"implementation" + 0.030*"activity" + 0.025*"program" + 0.020*"community" + 0.019*"management" + 0.018*"monitoring" + 0.017*"incumbent" + 0.015*"armenia" 154 | Topic: 14 155 | Words: 0.082*"customer" + 0.062*"service" + 0.022*"incumbent" + 0.018*"quality" + 0.016*"opportunity" + 0.014*"center" + 0.014*"support" + 0.013*"need" + 0.013*"intern" + 0.013*"services" 156 | Topic: 15 157 | Words: 0.166*"position" + 0.138*"candidate" + 0.062*"company" + 0.039*"highly" + 0.038*"cjsc" + 0.031*"person" + 0.022*"manager" + 0.017*"specialist" + 0.017*"team" + 0.016*"supervision" 158 | Topic: 16 159 | Words: 0.069*"developer" + 0.049*"team" + 0.047*"development" + 0.042*"company" + 0.036*"software" + 0.028*"experience" + 0.027*"position" + 0.023*"project" + 0.022*"highly" + 0.020*"product" 160 | Topic: 17 161 | Words: 0.054*"designer" + 0.036*"medium" + 0.035*"design" + 0.034*"website" + 0.031*"content" + 0.026*"hotel" + 0.013*"graphic" + 0.012*"material" + 
0.011*"picsart" + 0.009*"work" 162 | Topic: 18 163 | Words: 0.052*"office" + 0.049*"director" + 0.044*"management" + 0.043*"manager" + 0.034*"assistant" + 0.027*"supervision" + 0.026*"operations" + 0.022*"executive" + 0.022*"activity" + 0.022*"incumbent" 164 | Topic: 19 165 | Words: 0.030*"research" + 0.025*"armenia" + 0.017*"management" + 0.015*"program" + 0.015*"event" + 0.013*"individual" + 0.013*"tool" + 0.010*"staff" + 0.010*"system" + 0.009*"documentation" 166 | 167 | 168 | Job Description Topics (TF-IDF): 169 | Topic: 0 170 | Words: 0.033*"cjsc" + 0.023*"haypost" + 0.019*"credit" + 0.018*"position" + 0.016*"specialist" + 0.016*"self" + 0.015*"division" + 0.015*"skill" + 0.014*"bank" + 0.014*"internal" 171 | Topic: 1 172 | Words: 0.031*"medical" + 0.027*"representative" + 0.015*"education" + 0.014*"pharmaceutical" + 0.013*"product" + 0.013*"pharmacist" + 0.013*"region" + 0.012*"doctor" + 0.011*"telecommunication" + 0.011*"promotion" 173 | Topic: 2 174 | Words: 0.019*"cascade" + 0.015*"holding" + 0.013*"insurance" + 0.013*"candidate" + 0.013*"position" + 0.013*"hotel" + 0.012*"idea" + 0.012*"foundation" + 0.010*"general" + 0.010*"style" 175 | Topic: 3 176 | Words: 0.049*"accountant" + 0.048*"accounting" + 0.030*"chief" + 0.023*"company" + 0.020*"financial" + 0.017*"supervision" + 0.017*"director" + 0.017*"assistant" + 0.015*"department" + 0.015*"duty" 177 | Topic: 4 178 | Words: 0.026*"feature" + 0.021*"news" + 0.019*"sourcio" + 0.015*"language" + 0.014*"partner" + 0.013*"agency" + 0.012*"english" + 0.012*"repair" + 0.011*"product" + 0.011*"translation" 179 | Topic: 5 180 | Words: 0.026*"branch" + 0.025*"bank" + 0.023*"loan" + 0.020*"customer" + 0.018*"lending" + 0.015*"service" + 0.015*"converse" + 0.012*"kamurj" + 0.012*"incumbent" + 0.010*"cash" 181 | Topic: 6 182 | Words: 0.015*"secretary" + 0.010*"retail" + 0.010*"sponsorship" + 0.010*"translator" + 0.010*"language" + 0.009*"chain" + 0.009*"commercial" + 0.009*"baldi" + 0.009*"sponsor" + 0.008*"candidate" 
183 | Topic: 7 184 | Words: 0.021*"customer" + 0.013*"sales" + 0.012*"client" + 0.011*"phone" + 0.011*"service" + 0.010*"plant" + 0.009*"manager" + 0.008*"systrotech" + 0.007*"incumbent" + 0.007*"data" 185 | Topic: 8 186 | Words: 0.014*"shop" + 0.013*"customer" + 0.011*"driver" + 0.011*"engineering" + 0.011*"company" + 0.010*"team" + 0.010*"support" + 0.010*"joomag" + 0.009*"incumbent" + 0.008*"aspect" 187 | Topic: 9 188 | Words: 0.010*"credit" + 0.009*"project" + 0.009*"management" + 0.008*"risk" + 0.008*"procedure" + 0.008*"policy" + 0.007*"incumbent" + 0.007*"analysis" + 0.006*"activity" + 0.006*"bank" 189 | Topic: 10 190 | Words: 0.016*"manual" + 0.015*"trading" + 0.012*"expertise" + 0.011*"focus" + 0.010*"area" + 0.010*"service" + 0.010*"food" + 0.010*"water" + 0.009*"class" + 0.009*"volume" 191 | Topic: 11 192 | Words: 0.015*"mainly" + 0.014*"parts" + 0.013*"candidate" + 0.012*"armentel" + 0.011*"tourism" + 0.010*"company" + 0.010*"import" + 0.010*"territory" + 0.009*"flash" + 0.009*"vehicle" 193 | Topic: 12 194 | Words: 0.035*"developer" + 0.025*"desk" + 0.024*"application" + 0.024*"primarily" + 0.022*"president" + 0.020*"company" + 0.018*"android" + 0.017*"multiplatform" + 0.016*"monitis" + 0.016*"java" 195 | Topic: 13 196 | Words: 0.024*"network" + 0.024*"administrator" + 0.022*"system" + 0.017*"database" + 0.015*"security" + 0.014*"maintenance" + 0.011*"server" + 0.011*"installation" + 0.011*"mobile" + 0.011*"tech" 197 | Topic: 14 198 | Words: 0.022*"experience" + 0.020*"developer" + 0.019*"programming" + 0.019*"software" + 0.018*"development" + 0.016*"knowledge" + 0.016*"user" + 0.015*".net" + 0.013*"application" + 0.012*"highly" 199 | Topic: 15 200 | Words: 0.017*"tumo" + 0.013*"creative" + 0.012*"center" + 0.012*"store" + 0.009*"technology" + 0.009*"operation" + 0.009*"incumbent" + 0.008*"transmission" + 0.008*"stock" + 0.007*"mysql" 201 | Topic: 16 202 | Words: 0.062*"software" + 0.029*"developer" + 0.027*"engineer" + 0.024*"application" + 
0.022*"team" + 0.022*"product" + 0.021*"quality" + 0.021*"design" + 0.020*"senior" + 0.020*"project" 203 | Topic: 17 204 | Words: 0.055*"marketing" + 0.042*"sales" + 0.020*"company" + 0.020*"market" + 0.018*"manager" + 0.018*"business" + 0.016*"advertising" + 0.015*"brand" + 0.015*"strategy" + 0.013*"designer" 205 | Topic: 18 206 | Words: 0.018*"construction" + 0.010*"website" + 0.009*"warehouse" + 0.009*"university" + 0.009*"teacher" + 0.008*"teaching" + 0.008*"thinking" + 0.008*"estate" + 0.008*"french" + 0.007*"servicing" 207 | Topic: 19 208 | Words: 0.018*"project" + 0.015*"program" + 0.012*"armenia" + 0.010*"office" + 0.009*"implementation" + 0.009*"development" + 0.008*"activity" + 0.008*"support" + 0.008*"programme" + 0.008*"manager" 209 | 210 | 211 | Job Requirement Topics (BoW): 212 | Topic: 0 213 | Words: 0.030*"program" + 0.024*"project" + 0.023*"development" + 0.021*"management" + 0.017*"staff" + 0.016*"implementation" + 0.016*"policy" + 0.015*"programme" + 0.014*"activity" + 0.014*"donor" 214 | Topic: 1 215 | Words: 0.048*"test" + 0.048*"design" + 0.043*"software" + 0.035*"application" + 0.034*"development" + 0.025*"team" + 0.023*"testing" + 0.023*"code" + 0.018*"requirement" + 0.017*"documentation" 216 | Topic: 2 217 | Words: 0.020*"store" + 0.018*"control" + 0.018*"safety" + 0.017*"quality" + 0.015*"guest" + 0.014*"standard" + 0.013*"equipment" + 0.013*"food" + 0.012*"work" + 0.012*"service" 218 | Topic: 3 219 | Words: 0.057*"office" + 0.031*"meeting" + 0.030*"document" + 0.020*"correspondence" + 0.016*"support" + 0.016*"assist" + 0.015*"duty" + 0.015*"translation" + 0.013*"report" + 0.013*"english" 220 | Topic: 4 221 | Words: 0.032*"customer" + 0.029*"information" + 0.025*"call" + 0.020*"vehicle" + 0.019*"answer" + 0.017*"order" + 0.017*"duty" + 0.015*"staff" + 0.015*"inquiry" + 0.015*"perform" 222 | Topic: 5 223 | Words: 0.056*"training" + 0.032*"staff" + 0.031*"employee" + 0.016*"personnel" + 0.015*"process" + 0.015*"resource" + 0.015*"program" + 
0.015*"activity" + 0.015*"development" + 0.015*"recruitment" 224 | Topic: 6 225 | Words: 0.060*"design" + 0.036*"content" + 0.028*"site" + 0.026*"production" + 0.022*"website" + 0.019*"material" + 0.019*"company" + 0.012*"marketing" + 0.012*"page" + 0.011*"user" 226 | Topic: 7 227 | Words: 0.097*"project" + 0.028*"activity" + 0.023*"implementation" + 0.019*"report" + 0.014*"plan" + 0.013*"procurement" + 0.011*"work" + 0.010*"consultant" + 0.009*"evaluation" + 0.009*"training" 228 | Topic: 8 229 | Words: 0.037*"community" + 0.027*"sector" + 0.025*"child" + 0.023*"program" + 0.018*"support" + 0.017*"coordinator" + 0.017*"education" + 0.016*"marz" + 0.016*"capacity" + 0.015*"stakeholder" 230 | Topic: 9 231 | Words: 0.048*"medium" + 0.038*"event" + 0.030*"material" + 0.026*"activity" + 0.020*"information" + 0.019*"organization" + 0.017*"communication" + 0.015*"organize" + 0.014*"press" + 0.014*"website" 232 | Topic: 10 233 | Words: 0.056*"sales" + 0.048*"customer" + 0.044*"product" + 0.037*"market" + 0.036*"company" + 0.036*"marketing" + 0.022*"plan" + 0.020*"client" + 0.020*"business" + 0.018*"strategy" 234 | Topic: 11 235 | Words: 0.079*"data" + 0.049*"analysis" + 0.036*"report" + 0.021*"business" + 0.020*"process" + 0.017*"management" + 0.017*"reporting" + 0.017*"system" + 0.013*"information" + 0.012*"cost" 236 | Topic: 12 237 | Words: 0.072*"system" + 0.033*"network" + 0.025*"equipment" + 0.024*"database" + 0.023*"security" + 0.022*"support" + 0.021*"software" + 0.017*"information" + 0.017*"maintenance" + 0.016*"services" 238 | Topic: 13 239 | Words: 0.077*"branch" + 0.030*"service" + 0.027*"unit" + 0.021*"operations" + 0.018*"cash" + 0.017*"staff" + 0.016*"management" + 0.015*"customer" + 0.015*"performance" + 0.014*"within" 240 | Topic: 14 241 | Words: 0.041*"accounting" + 0.040*"report" + 0.023*"cash" + 0.022*"payment" + 0.022*"audit" + 0.022*"control" + 0.021*"company" + 0.020*"account" + 0.018*"system" + 0.017*"prepare" 242 | Topic: 15 243 | Words: 
0.077*"team" + 0.035*"business" + 0.035*"development" + 0.031*"member" + 0.028*"project" + 0.022*"process" + 0.020*"work" + 0.020*"requirement" + 0.019*"management" + 0.017*"quality" 244 | Topic: 16 245 | Words: 0.071*"credit" + 0.066*"loan" + 0.055*"client" + 0.018*"portfolio" + 0.017*"customer" + 0.017*"analysis" + 0.016*"card" + 0.016*"report" + 0.015*"application" + 0.014*"business" 246 | Topic: 17 247 | Words: 0.034*"bank" + 0.030*"contract" + 0.025*"company" + 0.022*"document" + 0.021*"customer" + 0.017*"risk" + 0.015*"process" + 0.013*"issue" + 0.012*"compliance" + 0.012*"information" 248 | 249 | 250 | Job Requirement Topics (TF-IDF): 251 | Topic: 0 252 | Words: 0.047*"test" + 0.025*"software" + 0.020*"testing" + 0.016*"design" + 0.014*"development" + 0.014*"defect" + 0.013*"case" + 0.013*"team" + 0.011*"requirement" + 0.010*"part" 253 | Topic: 1 254 | Words: 0.010*"carrier" + 0.009*"healthcare" + 0.008*"modeling" + 0.007*"oracle" + 0.007*"set" + 0.006*"trading" + 0.006*"layer" + 0.006*"voltage" + 0.006*"logic" + 0.006*"continue/" 255 | Topic: 2 256 | Words: 0.014*"laboratory" + 0.010*"leaflet" + 0.008*"cluster" + 0.008*"validation" + 0.007*"corner" + 0.007*"booklet" + 0.006*"tester" + 0.006*"orient" + 0.006*"run" + 0.006*"error" 257 | Topic: 3 258 | Words: 0.049*"application" + 0.036*"design" + 0.029*"software" + 0.021*"effectively" + 0.021*"team" + 0.020*"communicate" + 0.019*"feature" + 0.019*"documentation" + 0.019*"member" + 0.017*"develop" 259 | Topic: 4 260 | Words: 0.040*"credit" + 0.037*"loan" + 0.018*"branch" + 0.017*"client" + 0.016*"customer" + 0.011*"portfolio" + 0.011*"lending" + 0.010*"repayment" + 0.009*"committee" + 0.008*"risk" 261 | Topic: 5 262 | Words: 0.022*"sales" + 0.019*"customer" + 0.016*"market" + 0.015*"marketing" + 0.014*"product" + 0.013*"client" + 0.012*"company" + 0.009*"strategy" + 0.009*"business" + 0.008*"services" 263 | Topic: 6 264 | Words: 0.019*"accounting" + 0.011*"bank" + 0.011*"cash" + 0.010*"report" + 
0.009*"control" + 0.009*"company" + 0.009*"audit" + 0.008*"payment" + 0.008*"transactions" + 0.007*"account" 265 | Topic: 7 266 | Words: 0.018*"doctor" + 0.017*"store" + 0.014*"visit" + 0.013*"good" + 0.012*"pharmacist" + 0.012*"product" + 0.011*"sales" + 0.010*"customer" + 0.010*"company" + 0.009*"supplier" 267 | Topic: 8 268 | Words: 0.015*"compensation" + 0.011*"debt" + 0.010*"payables" + 0.009*"estate" + 0.008*"backend" + 0.008*"code" + 0.008*"debtor" + 0.007*"employ" + 0.007*"naming" + 0.006*"welfare" 269 | Topic: 9 270 | Words: 0.019*"design" + 0.014*"application" + 0.010*"user" + 0.010*"software" + 0.009*"code" + 0.009*"technology" + 0.008*"development" + 0.008*"system" + 0.008*"solution" + 0.008*"team" 271 | Topic: 10 272 | Words: 0.019*"verification" + 0.013*"hotel" + 0.011*"parts" + 0.010*"tour" + 0.010*"research" + 0.010*"plant" + 0.009*"application" + 0.008*"design" + 0.007*"guest" + 0.007*"disseminate" 273 | Topic: 11 274 | Words: 0.008*"choice" + 0.007*"appeal" + 0.007*"vault" + 0.007*"apis" + 0.007*"visual" + 0.007*"multi" + 0.007*"beginning" + 0.007*"british" + 0.007*"diagnose" + 0.006*"mentorship" 275 | Topic: 12 276 | Words: 0.013*"trainee" + 0.011*"transmission" + 0.010*"client/" + 0.008*"club" + 0.008*"involve" + 0.006*"scan" + 0.006*"easily" + 0.006*"complaint" + 0.006*"consumables" + 0.005*"saas" 277 | Topic: 13 278 | Words: 0.018*"correspondence" + 0.017*"office" + 0.017*"translation" + 0.016*"meeting" + 0.015*"call" + 0.014*"english" + 0.013*"document" + 0.011*"telephone" + 0.011*"language" + 0.011*"translate" 279 | Topic: 14 280 | Words: 0.016*"equipment" + 0.015*"system" + 0.015*"network" + 0.011*"maintenance" + 0.010*"security" + 0.009*"server" + 0.009*"computer" + 0.009*"data" + 0.008*"database" + 0.008*"hardware" 281 | Topic: 15 282 | Words: 0.020*"project" + 0.010*"program" + 0.009*"training" + 0.008*"implementation" + 0.008*"activity" + 0.007*"community" + 0.006*"staff" + 0.006*"development" + 0.006*"programme" + 0.006*"organization" 
283 | Topic: 16 284 | Words: 0.025*"medium" + 0.018*"website" + 0.015*"press" + 0.014*"material" + 0.013*"content" + 0.013*"news" + 0.010*"event" + 0.010*"article" + 0.009*"site" + 0.008*"release" 285 | Topic: 17 286 | Words: 0.032*"code" + 0.024*"read" + 0.019*"conformance" + 0.019*"coding" + 0.019*"software" + 0.018*"assure" + 0.017*"specification" + 0.017*"food" + 0.014*"source" + 0.014*"team" 287 | 288 | 289 | Required Qualification Topics (BoW): 290 | Topic: 0 291 | Words: 0.096*"knowledge" + 0.074*"experience" + 0.033*"language" + 0.029*"skill" + 0.026*"years" + 0.026*"development" + 0.024*"database" + 0.024*"plus" + 0.020*"java" + 0.019*"work" 292 | Topic: 1 293 | Words: 0.052*"finance" + 0.051*"skill" + 0.043*"banking" + 0.036*"economics" + 0.031*"ability" + 0.027*"experience" + 0.024*"analysis" + 0.022*"business" + 0.020*"bank" + 0.019*"team" 294 | Topic: 2 295 | Words: 0.163*"knowledge" + 0.079*"experience" + 0.064*"language" + 0.053*"work" + 0.046*"education" + 0.044*"years" + 0.035*"field" + 0.029*"least" + 0.027*"legislation" + 0.018*"banking" 296 | Topic: 3 297 | Words: 0.074*"security" + 0.031*"information" + 0.024*"operation" + 0.023*"equipment" + 0.023*"safety" + 0.021*"vehicle" + 0.021*"level" + 0.019*"care" + 0.019*"candidate" + 0.017*"applicant" 298 | Topic: 4 299 | Words: 0.068*"knowledge" + 0.055*"experience" + 0.049*"skill" + 0.034*"development" + 0.031*"software" + 0.029*"ability" + 0.027*"language" + 0.025*"degree" + 0.025*"science" + 0.024*"computer" 300 | Topic: 5 301 | Words: 0.085*"network" + 0.078*"knowledge" + 0.054*"system" + 0.043*"windows" + 0.034*"server" + 0.030*"experience" + 0.029*"administration" + 0.025*"technology" + 0.020*"microsoft" + 0.017*"protocol" 302 | Topic: 6 303 | Words: 0.098*"skill" + 0.032*"willingness" + 0.026*"excel" + 0.024*"excellent" + 0.022*"word" + 0.021*"bank" + 0.020*"environment" + 0.019*"organization" + 0.017*"hours" + 0.016*"world" 304 | Topic: 7 305 | Words: 0.139*"ability" + 0.022*"task" + 
0.021*"environment" + 0.018*"pressure" + 0.018*"effectively" + 0.015*"attention" + 0.015*"detail" + 0.013*"team" + 0.013*"deadline" + 0.013*"language" 306 | Topic: 8 307 | Words: 0.064*"experience" + 0.031*"skill" + 0.028*"degree" + 0.026*"management" + 0.026*"years" + 0.025*"knowledge" + 0.025*"project" + 0.024*"language" + 0.015*"field" + 0.013*"development" 308 | Topic: 9 309 | Words: 0.095*"skill" + 0.071*"knowledge" + 0.059*"language" + 0.040*"ability" + 0.039*"experience" + 0.036*"communication" + 0.028*"work" + 0.027*"computer" + 0.025*"degree" + 0.023*"office" 310 | Topic: 10 311 | Words: 0.082*"ability" + 0.037*"work" + 0.035*"knowledge" + 0.034*"construction" + 0.033*"license" + 0.031*"experience" + 0.031*"language" + 0.018*"engineering" + 0.017*"driver" + 0.016*"availability" 312 | Topic: 11 313 | Words: 0.104*"accounting" + 0.080*"knowledge" + 0.043*"experience" + 0.040*"finance" + 0.028*"software" + 0.028*"skill" + 0.027*"ability" + 0.020*"standard" + 0.019*"plus" + 0.017*"audit" 314 | Topic: 12 315 | Words: 0.078*"ability" + 0.046*"skill" + 0.033*"team" + 0.033*"design" + 0.020*"understanding" + 0.019*"personality" + 0.018*"communication" + 0.014*"experience" + 0.014*"well" + 0.013*"project" 316 | Topic: 13 317 | Words: 0.033*"student" + 0.031*"design" + 0.027*"engineering" + 0.027*"tool" + 0.027*"understanding" + 0.024*"electronics" + 0.018*"circuit" + 0.016*"device" + 0.016*"radio" + 0.015*"physics" 318 | Topic: 14 319 | Words: 0.055*"skill" + 0.054*"ability" + 0.051*"experience" + 0.024*"development" + 0.021*"training" + 0.019*"education" + 0.016*"knowledge" + 0.015*"community" + 0.013*"work" + 0.012*"willingness" 320 | Topic: 15 321 | Words: 0.094*"experience" + 0.045*"knowledge" + 0.029*"testing" + 0.027*"development" + 0.024*"ability" + 0.020*"years" + 0.019*"understanding" + 0.018*"tool" + 0.018*"skill" + 0.015*"software" 322 | Topic: 16 323 | Words: 0.077*"skill" + 0.050*"management" + 0.048*"experience" + 0.037*"business" + 0.033*"knowledge" 
+ 0.030*"ability" + 0.027*"marketing" + 0.021*"years" + 0.018*"degree" + 0.016*"communication" 324 | Topic: 17 325 | Words: 0.065*"adobe" + 0.063*"knowledge" + 0.059*"experience" + 0.055*"photoshop" + 0.037*"language" + 0.027*"illustrator" + 0.026*"corel" + 0.026*"draw" + 0.023*"flash" + 0.021*"html" 326 | 327 | 328 | Required Qualification Topics (TF-IDF): 329 | Topic: 0 330 | Words: 0.016*"b.s." + 0.013*"patient" + 0.013*"request" + 0.012*"enthusiasm" + 0.011*"career" + 0.009*"evidence" + 0.008*"care" + 0.007*"proactive" + 0.007*"device" + 0.007*"controller" 331 | Topic: 1 332 | Words: 0.023*"education" + 0.018*"personality" + 0.018*"field" + 0.016*"work" + 0.016*"sales" + 0.016*"sense" + 0.015*"pressure" + 0.015*"responsibility" + 0.015*"skill" + 0.014*"customer" 333 | Topic: 2 334 | Words: 0.036*"accounting" + 0.023*"banking" + 0.022*"finance" + 0.018*"legislation" + 0.014*"economics" + 0.014*"bank" + 0.010*"standard" + 0.009*"office" + 0.009*"least" + 0.009*"university" 335 | Topic: 3 336 | Words: 0.019*"together" + 0.018*"pharmaceutical/" + 0.012*"tourism" + 0.011*"privilege" + 0.009*"command" + 0.008*"applicant" + 0.008*"2014" + 0.008*"organizational" + 0.008*"literacy" + 0.008*"education" 337 | Topic: 4 338 | Words: 0.043*"adobe" + 0.036*"photoshop" + 0.024*"design" + 0.022*"draw" + 0.022*"corel" + 0.021*"illustrator" + 0.014*"flash" + 0.008*"video" + 0.008*"coreldraw" + 0.008*"website" 339 | Topic: 5 340 | Words: 0.017*"2005" + 0.016*"kind" + 0.014*"namely" + 0.012*"warehouse" + 0.011*"2012" + 0.011*"fulfillment" + 0.010*"exposure" + 0.010*"protection" + 0.009*"disaster" + 0.008*"operational" 341 | Topic: 6 342 | Words: 0.014*"mastery" + 0.013*"supervision" + 0.013*"without" + 0.011*"pharmacology" + 0.010*"starter" + 0.010*"task" + 0.010*"telecom" + 0.010*"passion" + 0.009*"professionalism" + 0.009*"workload" 343 | Topic: 7 344 | Words: 0.011*"linguistics" + 0.010*"office" + 0.009*"administration" + 0.009*"english" + 0.009*"package" + 0.009*"procurement" + 
0.008*"word" + 0.008*"excellent" + 0.007*"business" + 0.007*"field" 345 | Topic: 8 346 | Words: 0.027*"testing" + 0.014*"tool" + 0.013*"script" + 0.013*"test" + 0.013*"programming" + 0.012*"software" + 0.011*"development" + 0.010*"automation" + 0.010*"linux" + 0.010*"algorithm" 347 | Topic: 9 348 | Words: 0.018*"food" + 0.015*"ruby" + 0.013*"hotel" + 0.012*"sybase" + 0.012*"suse" + 0.011*"beverage" + 0.011*"restaurant" + 0.011*"study" + 0.010*"linq" + 0.010*"ubuntu" 349 | Topic: 10 350 | Words: 0.033*"network" + 0.019*"system" + 0.018*"windows" + 0.013*"server" + 0.013*"security" + 0.012*"hardware" + 0.012*"administration" + 0.010*"linux" + 0.009*"protocol" + 0.009*"maintenance" 351 | Topic: 11 352 | Words: 0.020*"chemistry" + 0.015*"template" + 0.013*"biology" + 0.008*"nutrition" + 0.008*"discussion" + 0.008*"undergraduate" + 0.008*"amadeus" + 0.008*"nationality" + 0.007*"religion" + 0.007*"gabriel" 353 | Topic: 12 354 | Words: 0.023*"integrity" + 0.017*"efficiency" + 0.017*"action" + 0.014*"commitment" + 0.013*"high" + 0.013*"accuracy" + 0.011*"energetic" + 0.011*"discipline" + 0.011*"person" + 0.010*"hands" 355 | Topic: 13 356 | Words: 0.011*"management" + 0.009*"project" + 0.009*"ability" + 0.006*"development" + 0.006*"training" + 0.006*"organization" + 0.006*"excellent" + 0.005*"skill" + 0.005*"business" + 0.005*"writing" 357 | Topic: 14 358 | Words: 0.028*"manner" + 0.027*"mail" + 0.023*"excel" + 0.022*"word" + 0.022*"internet" + 0.020*"task" + 0.017*"point" + 0.016*"power" + 0.015*"self" + 0.013*"plus" 359 | Topic: 15 360 | Words: 0.018*"development" + 0.014*"technology" + 0.014*"java" + 0.014*"database" + 0.014*"javascript" + 0.013*"programming" + 0.013*"framework" + 0.012*"server" + 0.012*".net" + 0.011*"design" 361 | Topic: 16 362 | Words: 0.013*"business" + 0.013*"market" + 0.013*"conflict" + 0.012*"sales" + 0.011*"stress" + 0.011*"situation" + 0.011*"telecommunication" + 0.011*"people" + 0.010*"marketing" + 0.009*"management" 363 | Topic: 17 364 | 
Words: 0.025*"license" + 0.019*"medicine" + 0.018*"driving" + 0.016*"engineering" + 0.016*"driver" + 0.013*"advantage" + 0.012*"construction" + 0.011*"education" + 0.011*"pharmacy" + 0.011*"pharmaceutical" 365 | 366 | ---------------------------------------------------------------- 367 | 368 | Example Test Data: "Investment Consulting Company is seeking a Chief Financial Officer. This position manages the company's fiscal and administrative functions, provides highly responsible and technically complex staff assistance to the Executive Director. The work performed requires a high level of technical proficiency in financial management and investment management, as well as management, supervisory and administrative skills." 369 | 370 | Noun Phrases... 371 | --- [Format: Noun Phrase -> Root Text] --- 372 | Investment Consulting Company -> Company 373 | a Chief Financial Officer -> Officer 374 | This position -> position 375 | the company's fiscal and administrative functions -> functions 376 | highly responsible and technically complex staff assistance -> assistance 377 | the Executive Director -> Director 378 | The work -> work 379 | a high level -> level 380 | technical proficiency -> proficiency 381 | financial management and investment management -> management 382 | management -> management 383 | skills -> skills 384 | 385 | 386 | Topics relevant to new document are (BoW): 387 | Score: 0.43358665704727173 Topic: 0.052*"office" + 0.049*"director" + 0.044*"management" + 0.043*"manager" + 0.034*"assistant" + 0.027*"supervision" + 0.026*"operations" + 0.022*"executive" + 0.022*"activity" + 0.022*"incumbent" 388 | Score: 0.1840825229883194 Topic: 0.029*"management" + 0.026*"incumbent" + 0.026*"analysis" + 0.023*"data" + 0.022*"procedure" + 0.022*"business" + 0.020*"risk" + 0.019*"report" + 0.018*"policy" + 0.018*"finance" 389 | Score: 0.16247783601284027 Topic: 0.166*"position" + 0.138*"candidate" + 0.062*"company" + 0.039*"highly" + 0.038*"cjsc" + 0.031*"person" + 
0.022*"manager" + 0.017*"specialist" + 0.017*"team" + 0.016*"supervision" 390 | 391 | 392 | Topics relevant to new document are (TF-IDF): 393 | Score: 0.43545523285865784 Topic: 0.049*"accountant" + 0.048*"accounting" + 0.030*"chief" + 0.023*"company" + 0.020*"financial" + 0.017*"supervision" + 0.017*"director" + 0.017*"assistant" + 0.015*"department" + 0.015*"duty" 394 | Score: 0.21969853341579437 Topic: 0.018*"project" + 0.015*"program" + 0.012*"armenia" + 0.010*"office" + 0.009*"implementation" + 0.009*"development" + 0.008*"activity" + 0.008*"support" + 0.008*"programme" + 0.008*"manager" 395 | Score: 0.17757698893547058 Topic: 0.026*"feature" + 0.021*"news" + 0.019*"sourcio" + 0.015*"language" + 0.014*"partner" + 0.013*"agency" + 0.012*"english" + 0.012*"repair" + 0.011*"product" + 0.011*"translation" --------------------------------------------------------------------------------