├── Notebooks
│   ├── data
│   │   └── resume
│   │       └── ChatGPT_dataScientist.pdf
│   ├── keys.env
│   └── resume_scanner.ipynb
├── README.md
├── Streamlit_App
│   ├── app.py
│   ├── app_constants.py
│   ├── app_display_results.py
│   ├── app_sidebar.py
│   ├── data
│   │   └── Images
│   │       ├── Education.png
│   │       ├── Language.png
│   │       ├── Leonardo_AI.jpg
│   │       ├── app.png
│   │       ├── contact_information.png
│   │       ├── scores.png
│   │       ├── top_3_strengths.png
│   │       └── work_experience.png
│   ├── keys.env
│   ├── llm_functions.py
│   ├── resume_analyzer.py
│   └── retrieval.py
└── requirements.txt
/Notebooks/data/resume/ChatGPT_dataScientist.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Notebooks/data/resume/ChatGPT_dataScientist.pdf
--------------------------------------------------------------------------------
/Notebooks/keys.env:
--------------------------------------------------------------------------------
1 | api_key_openai = "Your_API_key"
2 | api_key_google = "Your_API_key"
3 | api_key_cohere = "Your_API_key"
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 🔎 Resume scanner: 🚀 Leverage the power of LLMs to improve your resume
2 |
3 | ### 🚀 Build a Streamlit application powered by Langchain, OpenAI and Google Generative AI
4 |
5 |
6 | ![Resume scanner](Streamlit_App/data/Images/Leonardo_AI.jpg)
7 |
8 | *Image generated by Leonardo.ai*
9 |
10 | ### Table of Contents
11 |
12 | 1. [Project Overview](#overview)
13 | 2. [Installation](#installation)
14 | 3. [File Descriptions](#files)
15 | 4. [Instructions](#instructions)
16 | 5. [Screenshots](#screenshots)
17 |
18 | ## Project Overview <a name="overview"></a>
19 |
20 | The aim of this project is to build a web application in [Streamlit](https://streamlit.io/) that scans and improves a resume using instruction-tuned Large Language Models (LLMs).
21 |
22 | We leveraged the power of LLMs, specifically ChatGPT from [OpenAI](https://platform.openai.com/overview) and Gemini-pro from [Google](https://ai.google.dev/?hl=en), to extract, assess, and enhance resumes.
23 |
24 | We used [Langchain](https://python.langchain.com/docs/get_started/introduction), prompt engineering and retrieval-augmented generation (RAG) techniques to complete these steps.
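
At a high level, the app loads the resume PDF, splits it into chunks, embeds them into a FAISS index, and then formats section-specific prompt templates that are sent to the selected chat model. Below is a minimal, illustrative sketch of that flow; the module paths, splitter parameters and prompt text are assumptions for illustration, and the app's actual pipeline lives in `Streamlit_App/retrieval.py` and `Streamlit_App/resume_analyzer.py`.

```python
# Minimal sketch of the retrieval + prompting flow (assumes OPENAI_API_KEY is set).
from langchain_community.document_loaders import PDFMinerLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load and split the resume, then index the chunks in a FAISS vector store.
docs = PDFMinerLoader("Notebooks/data/resume/ChatGPT_dataScientist.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
retriever = FAISS.from_documents(chunks, OpenAIEmbeddings()).as_retriever()

# Format a section-specific template and invoke the chat model.
prompt = PromptTemplate.from_template(
    "For the following resume, output in {language} the work experience.\n\nResume: {text}"
)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)
print(llm.invoke(prompt.format(language="english", text=docs)).content)
```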
25 |
26 | ## Installation
27 |
28 | This project requires Python 3 and the following Python libraries:
29 |
30 | `streamlit`, `langchain`, `langchain-openai`, `langchain-google-genai`, `faiss-cpu`, `tiktoken`, `python-dotenv`, `pdfminer`, `markdown`
31 |
32 | The full list of requirements can be found in `requirements.txt`.
33 |
34 | ## File Descriptions <a name="files"></a>
35 |
36 | - **Streamlit_App** folder: contains the Streamlit application.
37 |
38 | - `requirements.txt`: contains the required packages for installation.
39 | - `keys.env`: Your OpenAI, Gemini, and Cohere API keys are stored here.
40 | - `llm_functions.py`: reads LLM API keys from keys.env and instantiates the LLM in Langchain.
41 | - `retrieval.py`: the script used to create the Langchain retrieval pipeline, including document loaders, embeddings, vector stores, and retrievers.
42 | - `app_constants.py`: contains templates for creating LLM prompts.
43 | - `app_sidebar.py`: the sidebar is where you can choose the LLM model and its parameters, such as temperature and top_p values, and enter your API keys.
44 | - `resume_analyzer.py`: this file contains the functions used to extract, assess, and improve each section of the resume using LLM. It is the **core** of the application.
45 | - `app_display_results.py`: the script used to display resume sections, assessments, scores, and improved texts.
46 | - `app.py`: the main script of the app; it ties the other modules together and runs the Streamlit application (see the sketch after this list).
47 |
48 | - **Notebooks** folder: contains the project's notebook.
49 |
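`app.py` wires these modules together roughly as follows (a condensed sketch of `Streamlit_App/app.py`, with error handling and some session-state details omitted; see the file itself for the full version):

```python
# Condensed sketch of the wiring in Streamlit_App/app.py.
import streamlit as st
from app_sidebar import sidebar
from llm_functions import instantiate_LLM_main, get_api_keys_from_local_env
from retrieval import retrieval_main
from resume_analyzer import resume_analyzer_main
from app_display_results import display_resume_analysis

openai_key, google_key, cohere_key = get_api_keys_from_local_env()  # read keys.env
sidebar(openai_key, google_key, cohere_key)                         # provider, model, parameters, API keys
st.session_state.uploaded_file = st.file_uploader("**Upload Resume**", type=["pdf"])

if st.button("Analyze resume"):
    retrieval_main()                                         # 1. build the Langchain retrieval
    llm = instantiate_LLM_main(temperature=0.0, top_p=0.95)  # 2. deterministic LLM for extraction
    llm_creative = instantiate_LLM_main(                     # 3. creative LLM for rewriting
        temperature=st.session_state.temperature, top_p=st.session_state.top_p
    )
    scanned_resume = resume_analyzer_main(                   # 4. extract, assess and improve sections
        llm=llm, llm_creative=llm_creative, documents=st.session_state.documents
    )
    display_resume_analysis(scanned_resume)                  # 5. display the results
```
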
50 | ## Instructions
51 |
52 | To run the app locally:
53 |
54 | 1. Create a virtual environment: `python -m venv virtualenv`
55 | 2. Activate the virtual environment:
56 |
57 | **Windows:** `.\virtualenv\Scripts\activate`
58 |
59 | **Linux:** `source virtualenv/bin/activate`
60 |
61 | 3. Install the required dependencies: `pip install -r requirements.txt`
62 | 4. Add your OpenAI, Gemini, and Cohere API keys to the `keys.env` file. You can get your API keys from their respective websites.
63 |
64 | > - **OpenAI** API key: [Get an API key](https://platform.openai.com/account/api-keys)
65 | > - **Google** API key: [Get an API key](https://makersuite.google.com/app/apikey)
66 | > - **Cohere** API key: [Get an API key](https://dashboard.cohere.com/api-keys)
67 |
68 | 5. Start the app: `streamlit run ./Streamlit_App/app.py`
69 | 6. Select the LLM provider (either OpenAI or Google Generative AI) from the sidebar. Then, choose a model (GPT-3.5, GPT-4 or Gemini-pro) and adjust its parameters.
70 | 7. Use the file uploader widget to upload your resume in PDF format.
71 | 8. 🚀 To analyze and improve your resume, simply click the 'Analyze resume' button located in the main panel.
72 |
73 | ## Screenshots
74 |
75 | Here is a screenshot of the application.
76 |
77 | ![Application screenshot](Streamlit_App/data/Images/app.png)
78 |
79 |
80 |
81 | The results of the resume analysis and improvement are shown below.
82 |
83 | First, the resume's overview, top 3 strengths, and top 3 weaknesses are displayed.
84 |
85 |
86 | ![Overview, top 3 strengths and weaknesses](Streamlit_App/data/Images/top_3_strengths.png)
87 |
88 |
89 | The scores are then displayed to give a general indication of the resume's quality.
90 | The resume is evaluated based on eight sections, each scored out of 100: contact information, summary, work experience, skills, education, language, projects, and certifications.
91 |
92 |
93 | ![Section scores](Streamlit_App/data/Images/scores.png)
94 |
95 |
96 | Finally, the analysis of each section is presented in an `st.expander`. For instance, here is how the work experience section is displayed.
97 |
98 |
99 | ![Work experience analysis](Streamlit_App/data/Images/work_experience.png)
100 |
101 |
--------------------------------------------------------------------------------
/Streamlit_App/app.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | from app_sidebar import sidebar
3 | from llm_functions import instantiate_LLM_main, get_api_keys_from_local_env
4 | from retrieval import retrieval_main
5 | from resume_analyzer import resume_analyzer_main
6 | from app_display_results import display_resume_analysis
7 |
8 |
9 | def main():
10 | """Analyze the uploaded resume."""
11 |
12 | if st.button("Analyze resume"):
13 | with st.spinner("Please wait..."):
14 | try:
15 | # 1. Create the Langchain retrieval
16 | retrieval_main()
17 |
18 | # 2. Instantiate a deterministic LLM with a temperature of 0.0.
19 | st.session_state.llm = instantiate_LLM_main(temperature=0.0, top_p=0.95)
20 |
21 | # 3. Instantiate LLM with temperature >0.1 for creativity.
22 | st.session_state.llm_creative = instantiate_LLM_main(
23 | temperature=st.session_state.temperature,
24 | top_p=st.session_state.top_p,
25 | )
26 |
27 | # 4. Analyze the resume
28 | st.session_state.SCANNED_RESUME = resume_analyzer_main(
29 | llm=st.session_state.llm,
30 | llm_creative=st.session_state.llm_creative,
31 | documents=st.session_state.documents,
32 | )
33 |
34 | # 5. Display results
35 | display_resume_analysis(st.session_state.SCANNED_RESUME)
36 |
37 | except Exception as e:
38 | st.error(f"An error occurred: {e}")
39 |
40 |
41 | if __name__ == "__main__":
42 | # 1. Set app configuration
43 | st.set_page_config(page_title="Resume Scanner", page_icon="🚀")
44 | st.title("🔎 Resume Scanner")
45 |
46 | # 2. Get API keys from local "keys.env" file
47 | openai_api_key, google_api_key, cohere_api_key = get_api_keys_from_local_env()
48 |
49 | # 3. Create the sidebar
50 | sidebar(openai_api_key, google_api_key, cohere_api_key)
51 |
52 | # 4. File uploader widget
53 | st.session_state.uploaded_file = st.file_uploader(
54 | label="**Upload Resume**",
55 | accept_multiple_files=False,
56 | type=(["pdf"]),
57 | )
58 |
59 | # 5. Analyze the uploaded resume
60 | main()
61 |
--------------------------------------------------------------------------------
/Streamlit_App/app_constants.py:
--------------------------------------------------------------------------------
1 | from pathlib import Path
2 | import os
3 |
4 | # 1. Constants
5 |
6 | list_LLM_providers = [":rainbow[**OpenAI**]", "**Google Generative AI**"]
7 |
8 | list_Assistant_Languages = [
9 | "english",
10 | "french",
11 | "spanish",
12 | "german",
13 | "russian",
14 | "chinese",
15 | "arabic",
16 | "portuguese",
17 | "italian",
18 | "Japanese",
19 | ]
20 |
21 | TMP_DIR = Path(__file__).resolve().parent.joinpath("data", "tmp")
22 |
23 |
24 | # 2. PROMPT TEMPLATES
25 |
26 | templates = {}
27 |
28 | # 2.1 Contact information Section
29 | templates[
30 | "Contact__information"
31 | ] = """Extract and evaluate the contact information. \
32 | Output a dictionary with the following keys:
33 | - candidate__name
34 | - candidate__title
35 | - candidate__location
36 | - candidate__email
37 | - candidate__phone
38 | - candidate__social_media: Extract a list of all social media profiles, blogs or websites.
39 | - evaluation__ContactInfo: Evaluate in {language} the contact information.
40 | - score__ContactInfo: Rate the contact information by giving a score (integer) from 0 to 100.
41 | """
42 |
43 | # 2.2. Summary Section
44 | templates[
45 | "CV__summary"
46 | ] = """Extract the summary and/or objective section. This is a separate section of the resume. \
47 | If the resume does not contain a summary and/or objective section, then simply write "unknown"."""
48 |
49 | # 2.3. WORK Experience Section
50 |
51 | templates[
52 | "Work__experience"
53 | ] = """Extract all work experiences. For each work experience:
54 | 1. Extract the job title.
55 | 2. Extract the company.
56 | 3. Extract the start date and output it in the following format: \
57 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
58 | 4. Extract the end date and output it in the following format: \
59 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
60 | 5. Create a dictionary with the following keys: job__title, job__company, job__start_date, job__end_date.
61 |
62 | Format your response as a list of dictionaries.
63 | """
64 |
65 | # 2.4. Projects Section
66 | templates[
67 | "CV__Projects"
68 | ] = """Include any side projects outside the work experience.
69 | For each project:
70 | 1. Extract the title of the project.
71 | 2. Extract the start date and output it in the following format: \
72 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
73 | 3. Extract the end date and output it in the following format: \
74 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
75 | 4. Create a dictionary with the following keys: project__title, project__start_date, project__end_date.
76 |
77 | Format your response as a list of dictionaries.
78 | """
79 |
80 | # 2.5. Education Section
81 | templates[
82 | "CV__Education"
83 | ] = """Extract all educational background and academic achievements.
84 | For each education achievement:
85 | 1. Extract the name of the college or the high school.
86 | 2. Extract the earned degree, including honors and achievements.
87 | 3. Extract the start date and output it in the following format: \
88 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
89 | 4. Extract the end date and output it in the following format: \
90 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
91 | 5. Create a dictionary with the following keys: edu__college, edu__degree, edu__start_date, edu__end_date.
92 |
93 | Format your response as a list of dictionaries.
94 | """
95 |
96 | templates[
97 | "Education__evaluation"
98 | ] = """Your task is to perform the following actions:
99 | 1. Rate the quality of the Education section by giving an integer score from 0 to 100.
100 | 2. Evaluate (in three sentences and in {language}) the quality of the Education section.
101 | 3. Format your response as a dictionary with the following keys: score__edu, evaluation__edu.
102 | """
103 |
104 | # 2.6. Skills
105 | templates[
106 | "candidate__skills"
107 | ] = """Extract the list of soft and hard skills from the skill section. Output a list.
108 | The skill section is a separate section.
109 | """
110 |
111 | templates[
112 | "Skills__evaluation"
113 | ] = """Your task is to perform the following actions:
114 | 1. Rate the quality of the Skills section by giving an integer score from 0 to 100.
115 | 2. Evaluate (in three sentences and in {language}) the quality of the Skills section.
116 | 3. Format your response as a dictionary with the following keys: score__skills, evaluation__skills.
117 | """
118 |
119 | # 2.7. Languages
120 | templates[
121 | "CV__Languages"
122 | ] = """Extract all the languages that the candidate can speak. For each language:
123 | 1. Extract the language.
124 | 2. Extract the fluency. If the fluency is not available, then simply write "unknown".
125 | 3. Create a dictionary with the following keys: spoken__language, language__fluency.
126 |
127 | Format your response as a list of dictionaries.
128 | """
129 |
130 | templates[
131 | "Languages__evaluation"
132 | ] = """ Your task is to perform the following actions:
133 | 1. Rate the quality of the language section by giving an integer score from 0 to 100.
134 | 2. Evaluate (in three sentences and in {language}) the quality of the language section.
135 | 3. Format your response as a dictionary with the following keys: score__language,evaluation__language.
136 | """
137 |
138 | # 2.8. Certifications
139 | templates[
140 | "CV__Certifications"
141 | ] = """Extraction of all certificates other than education background and academic achievements. \
142 | For each certificate:
143 | 1. Extract the title of the certification.
144 | 2. Extract the name of the organization or institution that issues the certification.
145 | 3. Extract the date of certification and output it in the following format: \
146 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
147 | 4. Extract the certification expiry date and output it in the following format: \
148 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month).
149 | 5. Extract any other information listed about the certification. If not found, then simply write "unknown".
150 | 6. Create a dictionary with the following keys: certif__title, certif__organization, certif__date, certif__expiry_date, certif__details.
151 |
152 | Format your response as a list of dictionaries.
153 | """
154 |
155 | templates[
156 | "Certif__evaluation"
157 | ] = """Your task is to perform the following actions:
158 | 1. Rate the certifications by giving an integer score from 0 to 100.
159 | 2. Evaluate (in three sentences and in {language}) the certifications and the quality of the text.
160 | 3. Format your response as a dictionary with the following keys: score__certif,evaluation__certif.
161 | """
162 |
163 |
164 | # 3. PROMPTS
165 |
166 | PROMPT_IMPROVE_SUMMARY = """You are given a summary and a resume, provided below and \
167 | separated by a line of dashes: the summary first, then the resume.
168 | 1. In {language}, evaluate the summary (format and content).
169 | 2. Rate the summary by giving an integer score from 0 to 100. \
170 | If the summary is "unknown", the score is 0.
171 | 3. In {language}, strengthen the summary. The summary should not exceed 5 sentences. \
172 | If the summary is "unknown", generate a strong summary in {language} with no more than 5 sentences. \
173 | Please include: years of experience, top skills and experiences, some of the biggest achievements, and finally an attractive objective.
174 | 4. Format your response as a dictionary with the following keys: evaluation__summary, score__summary, CV__summary_enhanced.
175 |
176 |
177 | {summary}
178 |
179 | ------
180 |
181 | {resume}
182 |
183 | """
184 |
185 | PROMPT_IMPROVE_WORK_EXPERIENCE = """You are given a work experience text delimited by triple backticks.
186 | 1. Rate the quality of the work experience text by giving an integer score from 0 to 100.
187 | 2. Suggest in {language} how to make the work experience text better and stronger.
188 | 3. Strengthen the work experience text to make it more appealing to a recruiter in {language}. \
189 | Provide additional details on responsibilities and quantify results for each bullet point. \
190 | Format your text as a string in {language}.
191 | 4. Format your response as a dictionary with the following keys: "Score__WorkExperience", "Comments__WorkExperience" and "Improvement__WorkExperience".
192 |
193 | Work experience text: ```{text}```
194 | """
195 |
196 | PROMPT_IMPROVE_PROJECT = """You are given a project text delimited by triple backticks.
197 | 1. Rate the quality of the project text by giving an integer score from 0 to 100.
198 | 2. Suggest in {language} how to make the project text better and stronger.
199 | 3. Strengthen the project text to make it more appealing to a recruiter in {language}, \
200 | including the problem, the approach taken, the tools used and quantifiable results. \
201 | Format your text as a string in {language}.
202 | 4. Format your response as a dictionary with the following keys: Score__project, Comments__project, Improvement__project.
203 |
204 | project text: ```{text}```
205 | """
206 |
207 | PROMPT_EVALUATE_RESUME = """You are given a resume delimited by triple backticks.
208 | 1. Provide an overview of the resume in {language}.
209 | 2. Provide a comprehensive analysis of the three main strengths of the resume in {language}. \
210 | Format the top 3 strengths as a string containing three bullet points.
211 | 3. Provide a comprehensive analysis of the three main weaknesses of the resume in {language}. \
212 | Format the top 3 weaknesses as a string containing three bullet points.
213 | 4. Format your response as a dictionary with the following keys: resume_cv_overview, top_3_strengths, top_3_weaknesses.
214 |
215 | The strengths and weaknesses lie in the format, style and content of the resume.
216 |
217 | Resume: ```{text}```
218 | """
219 |
--------------------------------------------------------------------------------
/Streamlit_App/app_display_results.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | import markdown
3 | from resume_analyzer import get_section_scores
4 |
5 |
6 | def custom_markdown(
7 | text,
8 | html_tag="p",
9 | bg_color="white",
10 | color="black",
11 | font_size=None,
12 | text_align="left",
13 | ):
14 | """Customise markdown by specifying custom background colour, text colour, font size, and text alignment.."""
15 |
16 | style = f'style="background-color:{bg_color};color:{color};font-size:{font_size}px; \
17 | text-align: {text_align};padding: 25px 25px 25px 25px;border-radius:2%;"'
18 |
19 | body = f"<{html_tag} {style}> {text} </{html_tag}>"
20 |
21 | st.markdown(body, unsafe_allow_html=True)
22 | st.write("")
23 |
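# Example usage (illustrative): render a centred block on a grey background.
#   custom_markdown("🎯 Score: 85/100", html_tag="h4", bg_color="#ededed", text_align="center")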
24 |
25 | def set_background_color(score):
26 | """Set background color based on score."""
27 | if score >= 80:
28 | bg_color = "#D4F1F4"
29 | elif score >= 60:
30 | bg_color = "#ededed"
31 | else:
32 | bg_color = "#fbcccd"
33 | return bg_color
34 |
35 |
36 | def format_object_to_string(object, separator="\n- "):
37 | """Convert object (e.g. list) to string."""
38 | if not isinstance(object, str):
39 | return separator + separator.join(object)
40 | else:
41 | return object
42 |
43 |
44 | def markdown_to_html(md_text):
45 | """Convert Markdown to html."""
46 | html_txt = (
47 | markdown.markdown(md_text.replace("\\n", "\n").replace("- ", "\n- "))
48 | .replace("\n", "")
49 | .replace('\\"', '"')
50 | )
51 | return html_txt
52 |
53 |
54 | def display_scores_in_columns(section_names: list, scores: list, column_width: list):
55 | """Display the scores of the sections in side-by-side columns.
56 | The column_width variable sets the width of the columns."""
57 | columns = st.columns(column_width)
58 | for i, column in enumerate(columns):
59 | with column:
60 | custom_markdown(
61 | text=f"{section_names[i]}
{scores[i]}",
62 | bg_color=set_background_color(scores[i]),
63 | text_align="center",
64 | )
65 |
66 |
67 | def display_section_results(
68 | expander_label: str,
69 | expander_header_fields: list,
70 | expander_header_links: list,
71 | score: int,
72 | section_original_text_header: str,
73 | section_original_text: list,
74 | original_text_bullet_points: bool,
75 | section_assessment,
76 | section_improved_text,
77 | ):
78 | if score > -1:
79 | expander_label += f"- 🎯 **{score}**/100"
80 | with st.expander(expander_label):
81 | st.write("")
82 |
83 | # 1. Display the header fields (for example, the company and dates of the work experience)
84 | if expander_header_fields is not None:
85 | for field in expander_header_fields:
86 | if not isinstance(field, list):
87 | st.markdown(field)
88 | else:
89 | # display fields in side-by-side columns.
90 | columns = st.columns(len(field))
91 | for i, column in enumerate(columns):
92 | with column:
93 | st.markdown(field[i])
94 |
95 | # 2. View the links (e.g. social media profiles, blogs and websites)
96 | if expander_header_links is not None:
97 | if not isinstance(expander_header_links, list):
98 | link = expander_header_links.strip().replace('"', "")
99 | if not link.startswith("http"):
100 | link = "https://" + link
101 | st.markdown(
102 | f"""🌐 {link}""",
103 | unsafe_allow_html=True,
104 | )
105 | else:
106 | for link in expander_header_links:
107 | if not link.startswith("http"):
108 | link = "https://" + link
109 | st.markdown(
110 | f"""🌐 {link}""",
111 | unsafe_allow_html=True,
112 | )
113 |
114 | # 3. View the original text
115 | if section_original_text_header is not None:
116 | st.write("")
117 | st.markdown(section_original_text_header)
118 | if section_original_text is not None:
119 | for text in section_original_text:
120 | if original_text_bullet_points:
121 | st.markdown(f"- {text}")
122 | else:
123 | st.markdown(text)
124 |
125 | # 4. Display the section score
126 | st.divider()
127 | custom_markdown(
128 | html_tag="h4",
129 | text=f"🎯 Score: {score}/100",
130 | )
131 |
132 | # 5. Display the assessment
133 | bg_color = set_background_color(score)
134 | assessment = markdown_to_html(format_object_to_string(section_assessment))
135 | custom_markdown(
136 | text=f"🔎 Assessment:
{assessment}",
137 | html_tag="div",
138 | bg_color=bg_color,
139 | )
140 |
141 | # 6. View the improved text
142 | if section_improved_text is not None:
143 | improved_text = markdown_to_html(
144 | format_object_to_string(section_improved_text)
145 | )
146 | custom_markdown(
147 | text=f"🚀 Improvement:
{improved_text}",
148 | html_tag="div",
149 | bg_color="#ededed",
150 | )
151 | st.write("")
152 |
153 |
154 | def display_assessment(score, section_assessment):
155 | """Display the section score and the assessment."""
156 | # 1. View section score
157 | custom_markdown(
158 | html_tag="h4",
159 | text=f"🎯 Score: {score}/100",
160 | )
161 | # 2. Display the assessment
162 | bg_color = set_background_color(score)
163 | assessment = markdown_to_html(format_object_to_string(section_assessment))
164 | custom_markdown(
165 | text=f"🔎 Assessment:
{assessment}",
166 | html_tag="div",
167 | bg_color=bg_color,
168 | )
169 | st.write("")
170 |
171 |
172 | def display_resume_analysis(SCANNED_RESUME):
173 | """Display the resume analysis."""
174 | try:
175 | ###############################################################
176 | # Overview, Top 3 strengths and Top 3 weaknesses
177 | ###############################################################
178 | st.divider()
179 | st.header("🎯 Overview and scores")
180 |
181 | list_task = ["Overview", "Top 3 strengths", "Top 3 weaknesses"]
182 | list_content = [
183 | SCANNED_RESUME["resume_cv_overview"],
184 | SCANNED_RESUME["top_3_strengths"],
185 | SCANNED_RESUME["top_3_weaknesses"],
186 | ]
187 | list_colors = ["#ededed", "#D4F1F4", "#fbcccd"]
188 |
189 | for i in range(3):
190 | st.write("")
191 | st.subheader(list_task[i])
192 | custom_markdown(
193 | html_tag="div",
194 | text=markdown_to_html(format_object_to_string(list_content[i])),
195 | bg_color=list_colors[i],
196 | )
197 |
198 | ###############################################################
199 | # Display scores
200 | ###############################################################
201 | st.write("")
202 | st.subheader("Scores over 100")
203 | st.write("")
204 |
205 | dict_scores = get_section_scores(SCANNED_RESUME)
206 |
207 | display_scores_in_columns(
208 | section_names=[
209 | "👤 Contact",
210 | "📋 Summary",
211 | "📋 Work Experience",
212 | "💪 Skills",
213 | ],
214 | scores=[
215 | dict_scores.get(key)
216 | for key in ["ContactInfo", "summary", "work_experience", "skills"]
217 | ],
218 | column_width=[2.25, 2.25, 2.75, 2.25],
219 | )
220 |
221 | display_scores_in_columns(
222 | section_names=[
223 | "🎓 Education",
224 | "🗣 Language",
225 | "📋 Projects",
226 | "🏅 Certifications",
227 | ],
228 | scores=[
229 | dict_scores.get(key)
230 | for key in ["education", "language", "projects", "certfication"]
231 | ],
232 | column_width=[2.5, 2.5, 2.5, 2.75],
233 | )
234 |
235 | ##################################################################################
236 | # Detailed analysis
237 | ##################################################################################
238 | st.divider()
239 | st.header("🔎 Detailed Analysis")
240 |
241 | # 1. Contact Information
242 |
243 | st.write("")
244 | st.subheader(f"Contact Information - 🎯 **{dict_scores['ContactInfo']}**/100")
245 | display_section_results(
246 | expander_label="🛈 Contact Information",
247 | expander_header_fields=[
248 | f"**👤 {SCANNED_RESUME['Contact__information']['candidate__name']}**",
249 | f"{SCANNED_RESUME['Contact__information']['candidate__title']}",
250 | "",
251 | [
252 | f"**📌 Location:** {SCANNED_RESUME['Contact__information']['candidate__location']}",
253 | f"**:telephone_receiver::** {SCANNED_RESUME['Contact__information']['candidate__phone']}",
254 | ],
255 | "",
256 | "**Email and Social media:**",
257 | f"**:e-mail:** {SCANNED_RESUME['Contact__information']['candidate__email']}",
258 | ],
259 | expander_header_links=SCANNED_RESUME["Contact__information"][
260 | "candidate__social_media"
261 | ],
262 | score=dict_scores["ContactInfo"],
263 | section_original_text_header=None,
264 | section_original_text=None,
265 | original_text_bullet_points=False,
266 | section_assessment=SCANNED_RESUME["Contact__information"][
267 | "evaluation__ContactInfo"
268 | ],
269 | section_improved_text=None,
270 | )
271 |
272 | # 2. Summary
273 |
274 | st.write("")
275 | st.write("")
276 | st.subheader(f"Summary - 🎯 **{dict_scores['summary']}**/100")
277 | display_section_results(
278 | expander_label="Summary",
279 | expander_header_fields=[],
280 | expander_header_links=None,
281 | score=dict_scores["summary"],
282 | section_original_text_header="**📋 Summary:**",
283 | section_original_text=[SCANNED_RESUME["CV__summary"]],
284 | original_text_bullet_points=False,
285 | section_assessment=SCANNED_RESUME["Summary__evaluation"][
286 | "evaluation__summary"
287 | ],
288 | section_improved_text=SCANNED_RESUME["Summary__evaluation"][
289 | "CV__summary_enhanced"
290 | ],
291 | )
292 |
293 | # 3. Work Experience
294 |
295 | st.write("")
296 | st.write("")
297 | st.subheader(f"work experience - 🎯 **{dict_scores['work_experience']}**/100")
298 |
299 | if len(SCANNED_RESUME["Work__experience"]) == 0:
300 | st.info("No work experience results.")
301 | else:
302 | for work_experience in SCANNED_RESUME["Work__experience"]:
303 | display_section_results(
304 | expander_label=f"{work_experience['job__title']}",
305 | expander_header_fields=[
306 | [
307 | f"**Company:**\n {work_experience['job__company']}",
308 | f"**📅**\n {work_experience['job__start_date']} - {work_experience['job__end_date']}",
309 | ]
310 | ],
311 | expander_header_links=None,
312 | score=work_experience["Score__WorkExperience"],
313 | section_original_text_header="**📋 Responsibilities:**",
314 | section_original_text=list(
315 | work_experience["work__duties"].values()
316 | ),
317 | original_text_bullet_points=True,
318 | section_assessment=work_experience["Comments__WorkExperience"],
319 | section_improved_text=work_experience[
320 | "Improvement__WorkExperience"
321 | ],
322 | )
323 |
324 | # 4. Skills
325 |
326 | st.write("")
327 | st.write("")
328 | st.subheader(f"Skills - 🎯 **{dict_scores['skills']}**/100")
329 | display_section_results(
330 | expander_label="💪 Skills",
331 | expander_header_fields=None,
332 | expander_header_links=None,
333 | score=dict_scores["skills"],
334 | section_original_text_header=None,
335 | section_original_text=[SCANNED_RESUME["candidate__skills"]],
336 | original_text_bullet_points=True,
337 | section_assessment=SCANNED_RESUME["Skills__evaluation"][
338 | "evaluation__skills"
339 | ],
340 | section_improved_text=None,
341 | )
342 |
343 | # 5. Education
344 |
345 | st.write("")
346 | st.write("")
347 | st.subheader(f"Education - 🎯 **{dict_scores['education']}**/100")
348 | with st.expander(f"🎓 Educational background and academic achievements."):
349 | st.write("")
350 | list_education = SCANNED_RESUME["CV__Education"]
351 | if not isinstance(list_education, list):
352 | st.markdown(f"- {list_education}")
353 | else:
354 | for edu in list_education:
355 | col1, col2 = st.columns([6, 4])
356 | with col1:
357 | st.markdown(f"**🎓 Degree:** {edu['edu__degree']}")
358 | with col2:
359 | st.markdown(
360 | f"**📅** {edu['edu__start_date']} - {edu['edu__end_date']}"
361 | )
362 | st.markdown(f"**🏛️** {edu['edu__college']}")
363 | st.divider()
364 |
365 | display_assessment(
366 | score=dict_scores["education"],
367 | section_assessment=SCANNED_RESUME["Education__evaluation"][
368 | "evaluation__edu"
369 | ],
370 | )
371 |
372 | # 6. Language (Optional section)
373 |
374 | st.divider()
375 | st.subheader(f"Language - 🎯 **{dict_scores['language']}**/100")
376 | languages = []
377 | for language in SCANNED_RESUME["CV__Languages"]:
378 | languages.append(
379 | f"**🗣 {language['spoken__language']}** : {language['language__fluency']}"
380 | )
381 | display_section_results(
382 | expander_label="🗣 Language",
383 | expander_header_fields=None,
384 | expander_header_links=None,
385 | score=dict_scores["language"],
386 | section_original_text_header=None,
387 | section_original_text=languages,
388 | original_text_bullet_points=False,
389 | section_assessment=SCANNED_RESUME["Languages__evaluation"][
390 | "evaluation__language"
391 | ],
392 | section_improved_text=None,
393 | )
394 |
395 | # 7. CERTIFICATIONS (optional section)
396 |
397 | st.write("")
398 | st.write("")
399 | st.subheader(f"Certifications - 🎯 **{dict_scores['certfication']}**/100")
400 | with st.expander("🏅 Certifications"):
401 | st.write("")
402 | list_certifs = SCANNED_RESUME["CV__Certifications"]
403 | if not isinstance(list_certifs, list):
404 | st.markdown(f"- {list_certifs}")
405 | else:
406 | for certif in list_certifs:
407 | col1, col2 = st.columns([6, 4])
408 | with col1:
409 | st.markdown(f"**🏅 Title:** {certif['certif__title']}")
410 | with col2:
411 | st.markdown(f"**📅** {certif['certif__date']} ")
412 | st.markdown(f"**🏛️** {certif['certif__organization']}")
413 |
414 | if certif["certif__expiry_date"].lower() != "unknown":
415 | st.markdown(
416 | f"**📅 Expiry date:** {certif['certif__expiry_date']}"
417 | )
418 | if certif["certif__details"].lower() != "unknown":
419 | st.write("")
420 | st.markdown(f"{certif['certif__details']}")
421 | st.divider()
422 |
423 | display_assessment(
424 | score=dict_scores["certfication"],
425 | section_assessment=SCANNED_RESUME["Certif__evaluation"][
426 | "evaluation__certif"
427 | ],
428 | )
429 |
430 | # 8. Projects (Optional section)
431 |
432 | st.write("")
433 | st.write("")
434 | st.subheader(f"Projects - 🎯 **{dict_scores['projects']}**/100")
435 | if len(SCANNED_RESUME["CV__Projects"]) == 0:
436 | st.info("No projects found.")
437 | else:
438 | for project in SCANNED_RESUME["CV__Projects"]:
439 | display_section_results(
440 | expander_label=f"{project['project__title']}",
441 | expander_header_fields=[
442 | f"**📅**\n {project['project__start_date']} - {project['project__end_date']}"
443 | ],
444 | expander_header_links=None,
445 | score=project["Score__project"],
446 | section_original_text_header="**📋 Project details:**",
447 | section_original_text=[project["project__description"]],
448 | original_text_bullet_points=True,
449 | section_assessment=project["Comments__project"],
450 | section_improved_text=project["Improvement__project"],
451 | )
452 |
453 | except Exception as exception:
454 | print(exception)
455 |
--------------------------------------------------------------------------------
/Streamlit_App/app_sidebar.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 |
3 | from app_constants import list_Assistant_Languages, list_LLM_providers
4 |
5 |
6 | def expander_model_parameters(
7 | LLM_provider="OpenAI",
8 | text_input_API_key="OpenAI API Key - [Get an API key](https://platform.openai.com/account/api-keys)",
9 | list_models=["gpt-3.5-turbo-0125", "gpt-3.5-turbo", "gpt-4-turbo-preview"],
10 | openai_api_key="",
11 | google_api_key="",
12 | ):
13 | """Add a text_input (for the API key) and a streamlit expander with models and parameters."""
14 |
15 | st.session_state.LLM_provider = LLM_provider
16 |
17 | if LLM_provider == "OpenAI":
18 | st.session_state.openai_api_key = st.text_input(
19 | text_input_API_key,
20 | value=openai_api_key,
21 | type="password",
22 | placeholder="insert your API key",
23 | )
24 |
25 | if LLM_provider == "Google":
26 | st.session_state.google_api_key = st.text_input(
27 | text_input_API_key,
28 | type="password",
29 | value=google_api_key,
30 | placeholder="insert your API key",
31 | )
32 |
33 | with st.expander("**Models and parameters**"):
34 | st.session_state.selected_model = st.selectbox(
35 | f"Choose {LLM_provider} model", list_models
36 | )
37 | # model parameters
38 | st.session_state.temperature = st.slider(
39 | "temperature",
40 | min_value=0.1,
41 | max_value=1.0,
42 | value=0.7,
43 | step=0.1,
44 | )
45 | st.session_state.top_p = st.slider(
46 | "top_p",
47 | min_value=0.1,
48 | max_value=1.0,
49 | value=0.95,
50 | step=0.05,
51 | )
52 |
53 |
54 | def sidebar(openai_api_key, google_api_key, cohere_api_key):
55 | """Create the sidebar."""
56 |
57 | with st.sidebar:
58 | st.caption(
59 | "🚀 A resume scanner powered by 🔗 Langchain, OpenAI and Google Generative AI"
60 | )
61 | st.write("")
62 |
63 | llm_chooser = st.radio(
64 | "Select provider",
65 | list_LLM_providers,
66 | captions=[
67 | "[OpenAI pricing page](https://openai.com/pricing)",
68 | "Rate limit: 60 requests per minute.",
69 | ],
70 | )
71 |
72 | st.divider()
73 | if llm_chooser == list_LLM_providers[0]:
74 | expander_model_parameters(
75 | LLM_provider="OpenAI",
76 | text_input_API_key="OpenAI API Key - [Get an API key](https://platform.openai.com/account/api-keys)",
77 | list_models=[
78 | "gpt-3.5-turbo-0125",
79 | "gpt-3.5-turbo",
80 | "gpt-4-turbo-preview",
81 | ],
82 | openai_api_key=openai_api_key,
83 | google_api_key=google_api_key,
84 | )
85 |
86 | if llm_chooser == list_LLM_providers[1]:
87 | expander_model_parameters(
88 | LLM_provider="Google",
89 | text_input_API_key="Google API Key - [Get an API key](https://makersuite.google.com/app/apikey)",
90 | list_models=["gemini-pro"],
91 | openai_api_key=openai_api_key,
92 | google_api_key=google_api_key,
93 | )
94 |
95 | # Cohere API Key
96 | st.write("")
97 | st.session_state.cohere_api_key = st.text_input(
98 | "Coher API Key - [Get an API key](https://dashboard.cohere.com/api-keys)",
99 | type="password",
100 | value=cohere_api_key,
101 | placeholder="insert your API key",
102 | )
103 |
104 | # Assistant language
105 | st.divider()
106 | st.session_state.assistant_language = st.selectbox(
107 | f"Assistant language", list_Assistant_Languages
108 | )
109 |
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/Education.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Education.png
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/Language.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Language.png
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/Leonardo_AI.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Leonardo_AI.jpg
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/app.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/app.png
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/contact_information.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/contact_information.png
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/scores.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/scores.png
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/top_3_strengths.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/top_3_strengths.png
--------------------------------------------------------------------------------
/Streamlit_App/data/Images/work_experience.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/work_experience.png
--------------------------------------------------------------------------------
/Streamlit_App/keys.env:
--------------------------------------------------------------------------------
1 | api_key_openai = "Your_API_key"
2 | api_key_google = "Your_API_key"
3 | api_key_cohere = "Your_API_key"
--------------------------------------------------------------------------------
/Streamlit_App/llm_functions.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 |
3 | # LLM: openai
4 | from langchain_openai import ChatOpenAI
5 |
6 | # LLM: google_genai
7 | from langchain_google_genai import ChatGoogleGenerativeAI
8 |
9 | # dotenv and os
10 | from dotenv import load_dotenv, find_dotenv
11 | import os
12 |
13 |
14 | def get_api_keys_from_local_env():
15 | """Get OpenAI, Gemini and Cohere API keys from local .env file"""
16 | try:
17 | found_dotenv = find_dotenv("keys.env", usecwd=True)
18 | load_dotenv(found_dotenv)
19 | try:
20 | openai_api_key = os.getenv("api_key_openai")
21 | except:
22 | openai_api_key = ""
23 | try:
24 | google_api_key = os.getenv("api_key_google")
25 | except:
26 | google_api_key = ""
27 | try:
28 | cohere_api_key = os.getenv("api_key_cohere")
29 | except:
30 | cohere_api_key = ""
31 | except Exception as e:
32 | print(e)
33 |
34 | return openai_api_key, google_api_key, cohere_api_key
35 |
36 |
37 | def instantiate_LLM(
38 | LLM_provider, api_key, temperature=0.5, top_p=0.95, model_name=None
39 | ):
40 | """Instantiate LLM in Langchain.
41 | Parameters:
42 | LLM_provider (str): the LLM provider; in ["OpenAI","Google"]
43 | model_name (str): in ["gpt-3.5-turbo", "gpt-3.5-turbo-0125", "gpt-4-turbo-preview","gemini-pro"].
44 | api_key (str): google_api_key or openai_api_key
45 | temperature (float): Range: 0.0 - 1.0; default = 0.5
46 | top_p (float): Range: 0.0 - 1.0; default = 0.95.
47 | """
48 | if LLM_provider == "OpenAI":
49 | llm = ChatOpenAI(
50 | api_key=api_key,
51 | model=model_name,
52 | temperature=temperature,
53 | model_kwargs={"top_p": top_p},
54 | )
55 | if LLM_provider == "Google":
56 | llm = ChatGoogleGenerativeAI(
57 | google_api_key=api_key,
58 | # model="gemini-pro",
59 | model=model_name,
60 | temperature=temperature,
61 | top_p=top_p,
62 | convert_system_message_to_human=True,
63 | )
64 |
65 | return llm
66 |
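# Example usage (illustrative values; a valid API key is required):
#   llm = instantiate_LLM("OpenAI", api_key=openai_api_key,
#                         temperature=0.0, top_p=0.95, model_name="gpt-3.5-turbo-0125")
#   llm = instantiate_LLM("Google", api_key=google_api_key,
#                         temperature=0.7, top_p=0.95, model_name="gemini-pro")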
67 |
68 | def instantiate_LLM_main(temperature, top_p):
69 | """Instantiate the selected LLM model."""
70 | try:
71 | if st.session_state.LLM_provider == "OpenAI":
72 | llm = instantiate_LLM(
73 | "OpenAI",
74 | api_key=st.session_state.openai_api_key,
75 | temperature=temperature,
76 | top_p=top_p,
77 | model_name=st.session_state.selected_model,
78 | )
79 | else:
80 | llm = instantiate_LLM(
81 | "Google",
82 | api_key=st.session_state.google_api_key,
83 | temperature=temperature,
84 | top_p=top_p,
85 | model_name=st.session_state.selected_model,
86 | )
87 | except Exception as e:
88 | st.error(f"An error occurred: {e}")
89 | llm = None
90 | return llm
91 |
--------------------------------------------------------------------------------
/Streamlit_App/resume_analyzer.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | import json, warnings
3 |
4 | warnings.filterwarnings("ignore", category=FutureWarning)
5 |
6 | import datetime
7 |
8 | from langchain.prompts import PromptTemplate
9 |
10 | from app_constants import (
11 | templates,
12 | PROMPT_IMPROVE_WORK_EXPERIENCE,
13 | PROMPT_IMPROVE_PROJECT,
14 | PROMPT_EVALUATE_RESUME,
15 | PROMPT_IMPROVE_SUMMARY,
16 | )
17 | import retrieval
18 |
19 |
20 | def create_prompt_template(resume_sections, language="english"):
21 | """create the promptTemplate.
22 | Parameters:
23 | resume_sections (list): List of CV sections from which information will be extracted.
24 | language (str): the language of the assistant, default="english".
25 | """
26 |
27 | # Create the Template
28 | template = f"""For the following resume, output in {language} the following information:\n\n"""
29 |
30 | for key in resume_sections:
31 | template += key + ": " + templates[key] + "\n\n"
32 |
33 | template += "For any requested information, if it is not found, output 'unknown' or ['unknown'] accordingly.\n\n"
34 | template += (
35 | """Format the final output as a json dictionary with the following keys: ("""
36 | )
37 |
38 | for key in resume_sections:
39 | template += "" + key + ", "
40 | template = template[:-2] + ")" # remove the last ", "
41 |
42 | template += """\n\nResume: {text}"""
43 |
44 | # Create the PromptTemplate
45 | prompt_template = PromptTemplate.from_template(template)
46 |
47 | return prompt_template
48 |
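# Example (illustrative): create_prompt_template(["CV__summary"]) produces a template that reads roughly
# "For the following resume, output in english the following information: CV__summary: <section template> ...
#  Format the final output as a json dictionary with the following keys: (CV__summary)  Resume: {text}".
# Only {text} (plus any {language} placeholders inside the section templates) is filled at invocation time.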
49 |
50 | def extract_from_text(text, start_tag, end_tag=None):
51 | """Use start and end tags to extract a substring from text.
52 | This helper function is used to parse the response content of the LLM in case 'json.loads' fails.
53 | """
54 | start_index = text.find(start_tag)
55 | if end_tag is None:
56 | extracted_txt = text[start_index + len(start_tag) :]
57 | else:
58 | end_index = text.find(end_tag)
59 | extracted_txt = text[start_index + len(start_tag) : end_index]
60 |
61 | return extracted_txt
62 |
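# Example (illustrative):
#   extract_from_text('"score": 85, "comment": "ok"', '"score": ', ', "comment":')  # -> '85'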
63 |
64 | def convert_text_to_list_of_dicts(text, dict_keys):
65 | """Convert text to a python list of dicts.
66 | Parameters:
67 | - text: string containing a list of dicts
68 | - dict_keys (list): the keys of the dictionary which will be returned.
69 | Output:
70 | - list_of_dicts (list): the list of dicts to return.
71 | """
72 | list_of_dicts = []
73 |
74 | if text != "":
75 | text_splitted = text.split("},\n")
76 | dict_keys.append(None)
77 |
78 | for i in range(len(text_splitted)):
79 | dict_i = {}
80 |
81 | for j in range(len(dict_keys) - 1):
82 | key_value = extract_from_text(
83 | text_splitted[i], f'"{dict_keys[j]}": ', f'"{dict_keys[j+1]}": '
84 | )
85 | key_value = key_value[: key_value.rfind(",\n")].strip()[1:-1]
86 | dict_i[dict_keys[j]] = key_value
87 |
88 | list_of_dicts.append(dict_i) # add the dict to the list.
89 |
90 | return list_of_dicts
91 |
92 |
93 | def get_current_time():
94 | current_time = (datetime.datetime.now()).strftime("%H:%M:%S")
95 | return current_time
96 |
97 |
98 | def invoke_LLM(
99 | llm,
100 | documents,
101 | resume_sections: list,
102 | info_message="",
103 | language="english",
104 | ):
105 | """Invoke LLM and get a response.
106 | Parameters:
107 | - llm: the LLM to call
108 | - documents: our Langchain Documents. Will be used to format the prompt_template.
109 | - resume_sections (list): List of resume sections to be parsed.
110 | - info_message (str): display an informational message.
111 | - language (str): Assistant language. Will be used to format the prompt_template.
112 |
113 | Output:
114 | - response_content (str): the content of the LLM response.
115 | - response_tokens_count (int): count of response tokens.
116 | """
117 |
118 | # 1. display the info message
119 | st.info(f"**{get_current_time()}** \t{info_message}")
120 | print(f"**{get_current_time()}** \t{info_message}")
121 |
122 | # 2. Create the promptTemplate.
123 | prompt_template = create_prompt_template(
124 | resume_sections,
125 | language=st.session_state.assistant_language,
126 | )
127 |
128 | # 3. Format promptTemplate with the full documents
129 | if language is not None:
130 | prompt = prompt_template.format_prompt(text=documents, language=language).text
131 | else:
132 | prompt = prompt_template.format_prompt(text=documents).text
133 |
134 | # 4. Invoke LLM
135 | response = llm.invoke(prompt)
136 |
137 | response_content = response.content[
138 | response.content.find("{") : response.content.rfind("}") + 1
139 | ]
140 | response_tokens_count = sum(retrieval.tiktoken_tokens([response_content]))
141 |
142 | return response_content, response_tokens_count
143 |
144 |
145 | def ResponseContent_Parser(
146 | response_content, list_fields, list_rfind, list_exclude_first_car
147 | ):
148 | """This is a function for parsing any response_content.
149 | Parameters:
150 | - response_content (str): the content of the LLM response we are going to parse.
151 | - list_fields (list): List of dictionary fields returned by this function.
152 | A field can be a dictionary. The key of the dict will not be parsed.
153 | Example: [{'Contact__information':['candidate__location','candidate__email','candidate__phone','candidate__social_media']},
154 | 'CV__summary']
155 | We will not parse the content for 'Contact__information'.
156 | - list_rfind (list): To parse the content of a field, we first extract the text between this field and the next field.
157 | Then, we trim the extracted text using Python's `rfind` string method, which returns the highest index in the text where the substring is found.
158 | - list_exclude_first_car (list): whether to exclude the first and last characters of each extracted value.
159 |
160 | Output:
161 | - INFORMATION_dict: dictionary, where fields are the keys and parsed texts are the values.
162 |
163 | """
164 |
165 | list_fields_detailed = (
166 | []
167 | ) # list of tuples. tuple = (field, extract info (boolean), parent field)
168 |
169 | for field in list_fields:
170 | if type(field) is dict:
171 | list_fields_detailed.append(
172 | (list(field.keys())[0], False, None)
173 | ) # We will not extract any value for the text between this tag and the next.
174 | for val in list(field.values())[0]:
175 | list_fields_detailed.append((val, True, list(field.keys())[0]))
176 | else:
177 | list_fields_detailed.append((field, True, None))
178 |
179 | list_fields_detailed.append((None, False, None))
180 |
181 | # Parse the response_content
182 | INFORMATION_dict = {}
183 |
184 | for i in range(len(list_fields_detailed) - 1):
185 | if list_fields_detailed[i][1] is False: # Extract info = False
186 | INFORMATION_dict[list_fields_detailed[i][0]] = {} # Initialize the dict
187 | if list_fields_detailed[i][1]:
188 | extracted_value = extract_from_text(
189 | response_content,
190 | f'"{list_fields_detailed[i][0]}": ',
191 | f'"{list_fields_detailed[i+1][0]}":',
192 | )
193 | extracted_value = extracted_value[
194 | : extracted_value.rfind(list_rfind[i])
195 | ].strip()
196 | if list_exclude_first_car[i]:
197 | extracted_value = extracted_value[1:-1].strip()
198 | if list_fields_detailed[i][2] is None:
199 | INFORMATION_dict[list_fields_detailed[i][0]] = extracted_value
200 | else:
201 | INFORMATION_dict[list_fields_detailed[i][2]][
202 | list_fields_detailed[i][0]
203 | ] = extracted_value
204 |
205 | return INFORMATION_dict
206 |
207 |
208 | def Extract_contact_information(llm, documents):
209 | """Extract Contact Information: Name, Title, Location, Email, Phone number and Social media profiles."""
210 |
211 | try:
212 | response_content, response_tokens_count = invoke_LLM(
213 | llm,
214 | documents,
215 | resume_sections=["Contact__information"],
216 | info_message="Extract and evaluate contact information...",
217 | language=st.session_state.assistant_language,
218 | )
219 |
220 | try:
221 | # Load response_content to json dictionary
222 | CONTACT_INFORMATION = json.loads(response_content, strict=False)
223 | except Exception as e:
224 | print("[ERROR] json.loads returns error:", e)
225 | print("\n[INFO] Parse response content...\n")
226 |
227 | list_fields = [
228 | {
229 | "Contact__information": [
230 | "candidate__name",
231 | "candidate__title",
232 | "candidate__location",
233 | "candidate__email",
234 | "candidate__phone",
235 | "candidate__social_media",
236 | "evaluation__ContactInfo",
237 | "score__ContactInfo",
238 | ]
239 | }
240 | ]
241 | list_rfind = [",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", "}\n"]
242 | list_exclude_first_car = [
243 | True,
244 | True,
245 | True,
246 | True,
247 | True,
248 | True,
249 | False,
250 | True,
251 | False,
252 | ]
253 | CONTACT_INFORMATION = ResponseContent_Parser(
254 | response_content, list_fields, list_rfind, list_exclude_first_car
255 | )
256 | # convert score to int
257 | try:
258 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] = int(
259 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"]
260 | )
261 | except:
262 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] = -1
263 |
264 | except Exception as exception:
265 | print(f"[Error] {exception}")
266 | CONTACT_INFORMATION = {
267 | "Contact__information": {
268 | "candidate__name": "unknown",
269 | "candidate__title": "unknown",
270 | "candidate__location": "unknown",
271 | "candidate__email": "unknown",
272 | "candidate__phone": "unknown",
273 | "candidate__social_media": "unknown",
274 | "evaluation__ContactInfo": "unknown",
275 | "score__ContactInfo": -1,
276 | }
277 | }
278 |
279 | return CONTACT_INFORMATION
280 |
281 |
282 | def Extract_Evaluate_Summary(llm, documents):
283 | """Extract, evaluate and strengthen the summary."""
284 |
285 | ######################################
286 | # 1. Extract the summary
287 | ######################################
288 | try:
289 | response_content, response_tokens_count = invoke_LLM(
290 | llm,
291 | documents,
292 | resume_sections=["CV__summary"],
293 | info_message="Extract and evaluate the Summary....",
294 | language=st.session_state.assistant_language,
295 | )
296 | try:
297 | # Load response_content to json dictionary
298 | SUMMARY_SECTION = json.loads(response_content, strict=False)
299 | except Exception as e:
300 | print("[ERROR] json.loads returns error:", e)
301 | print("\n[INFO] Parse response content...\n")
302 |
303 | list_fields = ["CV__summary"]
304 | list_rfind = ["}\n"]
305 | list_exclude_first_car = [True]
306 |
307 | SUMMARY_SECTION = ResponseContent_Parser(
308 | response_content, list_fields, list_rfind, list_exclude_first_car
309 | )
310 |
311 | except Exception as exception:
312 | print(f"[Error] {exception}")
313 | SUMMARY_SECTION = {"CV__summary": "unknown"}
314 |
315 | ######################################
316 | # 2. Evaluate the summary
317 | ######################################
318 |
319 | try:
320 | prompt_template = PromptTemplate.from_template(PROMPT_IMPROVE_SUMMARY)
321 |
322 | prompt = prompt_template.format_prompt(
323 | resume=documents,
324 | language=st.session_state.assistant_language,
325 | summary=SUMMARY_SECTION["CV__summary"],
326 | ).text
327 |
328 | # Invoke LLM
329 | response = llm.invoke(prompt)
330 | response_content = response.content[
331 | response.content.find("{") : response.content.rfind("}") + 1
332 | ]
333 |
334 | try:
335 | SUMMARY_EVAL = {}
336 | SUMMARY_EVAL["Summary__evaluation"] = json.loads(
337 | response_content, strict=False
338 | )
339 | except Exception as e:
340 | print("[ERROR] json.loads returns error:", e)
341 | print("\n[INFO] Parse response content...\n")
342 |
343 | list_fields = [
344 | "evaluation__summary",
345 | "score__summary",
346 | "CV__summary_enhanced",
347 | ]
348 | list_rfind = [",\n", ",\n", "}\n"]
349 | list_exclude_first_car = [True, False, True]
350 | SUMMARY_EVAL["Summary__evaluation"] = ResponseContent_Parser(
351 | response_content, list_fields, list_rfind, list_exclude_first_car
352 | )
353 | # convert score to int
354 | try:
355 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] = int(
356 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"]
357 | )
358 | except:
359 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] = -1
360 |
361 | except Exception as e:
362 | print(e)
363 | SUMMARY_EVAL = {
364 | "Summary__evaluation": {
365 | "evaluation__summary": "unknown",
366 | "score__summary": -1,
367 | "CV__summary_enhanced": "unknown",
368 | }
369 | }
370 |
371 | SUMMARY_EVAL["CV__summary"] = SUMMARY_SECTION["CV__summary"]
372 |
373 | return SUMMARY_EVAL
374 |
375 |
376 | def Extract_Education_Language(llm, documents):
377 | """Extract and evaluate education and language sections."""
378 |
379 | try:
380 | response_content, response_tokens_count = invoke_LLM(
381 | llm,
382 | documents,
383 | resume_sections=[
384 | "CV__Education",
385 | "Education__evaluation",
386 | "CV__Languages",
387 | "Languages__evaluation",
388 | ],
389 | info_message="Extract and evaluate education and language sections...",
390 | language=st.session_state.assistant_language,
391 | )
392 |
393 | try:
394 | # Load response_content to json dictionary
395 | Education_Language_sections = json.loads(response_content, strict=False)
396 | except Exception as e:
397 | print("[ERROR] json.loads returns error:", e)
398 | print("\n[INFO] Parse response content...\n")
399 |
400 | list_fields = [
401 | "CV__Education",
402 | {"Education__evaluation": ["score__edu", "evaluation__edu"]},
403 | "CV__Languages",
404 | {"Languages__evaluation": ["score__language", "evaluation__language"]},
405 | ]
406 |
407 | list_rfind = [",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", "\n"]
408 | list_exclude_first_car = [True, True, False, True, True, True, False, True]
409 |
410 | Education_Language_sections = ResponseContent_Parser(
411 | response_content, list_fields, list_rfind, list_exclude_first_car
412 | )
413 |
414 | # Convert scores to int
415 | try:
416 | Education_Language_sections["Education__evaluation"]["score__edu"] = (
417 | int(
418 | Education_Language_sections["Education__evaluation"][
419 | "score__edu"
420 | ]
421 | )
422 | )
423 | except:
424 | Education_Language_sections["Education__evaluation"]["score__edu"] = -1
425 |
426 | try:
427 | Education_Language_sections["Languages__evaluation"][
428 | "score__language"
429 | ] = int(
430 | Education_Language_sections["Languages__evaluation"][
431 | "score__language"
432 | ]
433 | )
434 | except:
435 | Education_Language_sections["Languages__evaluation"][
436 | "score__language"
437 | ] = -1
438 |
439 | # Split languages and educational texts into a Python list of dict
440 | languages = Education_Language_sections["CV__Languages"]
441 | Education_Language_sections["CV__Languages"] = (
442 | convert_text_to_list_of_dicts(
443 | text=languages[
444 | languages.find("[") + 1 : languages.rfind("]")
445 | ].strip(),
446 | dict_keys=["spoken__language", "language__fluency"],
447 | )
448 | )
449 | education = Education_Language_sections["CV__Education"]
450 | Education_Language_sections["CV__Education"] = (
451 | convert_text_to_list_of_dicts(
452 | text=education[
453 | education.find("[") + 1 : education.rfind("]")
454 | ].strip(),
455 | dict_keys=[
456 | "edu__college",
457 | "edu__degree",
458 | "edu__start_date",
459 | "edu__end_date",
460 | ],
461 | )
462 | )
463 | except Exception as exception:
464 | print(exception)
465 | Education_Language_sections = {
466 | "CV__Education": [],
467 | "Education__evaluation": {"score__edu": -1, "evaluation__edu": "unknown"},
468 | "CV__Languages": [],
469 | "Languages__evaluation": {
470 | "score__language": -1,
471 | "evaluation__language": "unknown",
472 | },
473 | }
474 |
475 | return Education_Language_sections
476 |
477 |
478 | def Extract_Skills_and_Certifications(llm, documents):
479 | """Extract skills and certifications and evaluate these sections."""
480 |
481 | try:
482 | response_content, response_tokens_count = invoke_LLM(
483 | llm,
484 | documents,
485 | resume_sections=[
486 | "candidate__skills",
487 | "Skills__evaluation",
488 | "CV__Certifications",
489 | "Certif__evaluation",
490 | ],
491 | info_message="Extract and evaluate the skills and certifications...",
492 | language=st.session_state.assistant_language,
493 | )
494 |
495 | try:
496 | # Load response_content to json dictionary
497 | SKILLS_and_CERTIF = json.loads(response_content, strict=False)
498 | except Exception as e:
499 | print("[ERROR] json.loads returns error:", e)
500 | print("\n[INFO] Parse response content...\n")
501 |
502 | skills = extract_from_text(
503 | response_content, '"candidate__skills": ', '"Skills__evaluation":'
504 | )
505 | skills = skills.replace("\n ", "\n").replace("],\n", "").replace("[\n", "")
506 | score_skills = extract_from_text(
507 | response_content, '"score__skills": ', '"evaluation__skills":'
508 | )
509 | evaluation_skills = extract_from_text(
510 | response_content, '"evaluation__skills": ', '"CV__Certifications":'
511 | )
512 |
513 | certif_text = extract_from_text(
514 | response_content, '"CV__Certifications": ', '"Certif__evaluation":'
515 | )
516 | certif_score = extract_from_text(
517 | response_content, '"score__certif": ', '"evaluation__certif":'
518 | )
519 | certif_eval = extract_from_text(
520 | response_content, '"evaluation__certif": ', None
521 | )
522 |
523 | # Create the dictionary
524 | SKILLS_and_CERTIF = {}
525 | SKILLS_and_CERTIF["candidate__skills"] = [
526 | skill.strip()[1:-1] for skill in skills.split(",\n")
527 | ]
528 | try:
529 | score_skills_int = int(score_skills[0 : score_skills.rfind(",\n")])
530 | except:
531 | score_skills_int = -1
532 | SKILLS_and_CERTIF["Skills__evaluation"] = {
533 | "score__skills": score_skills_int,
534 | "evaluation__skills": evaluation_skills[
535 | : evaluation_skills.rfind("}\n")
536 | ].strip()[1:-1],
537 | }
538 |
539 | # Convert certificate text to list of dictionaries
540 | list_certifs = convert_text_to_list_of_dicts(
541 | text=certif_text[
542 | certif_text.find("[") + 1 : certif_text.rfind("]")
543 |             ].strip(),
544 | dict_keys=[
545 | "certif__title",
546 | "certif__organization",
547 | "certif__date",
548 | "certif__expiry_date",
549 | "certif__details",
550 | ],
551 | )
552 | SKILLS_and_CERTIF["CV__Certifications"] = list_certifs
553 | try:
554 | certif_score_int = int(certif_score[0 : certif_score.rfind(",\n")])
555 | except:
556 | certif_score_int = -1
557 | SKILLS_and_CERTIF["Certif__evaluation"] = {
558 | "score__certif": certif_score_int,
559 | "evaluation__certif": certif_eval[: certif_eval.rfind("}\n")].strip()[
560 | 1:-1
561 | ],
562 | }
563 |
564 | except Exception as exception:
565 | SKILLS_and_CERTIF = {
566 | "candidate__skills": [],
567 | "Skills__evaluation": {
568 | "score__skills": -1,
569 | "evaluation__skills": "unknown",
570 | },
571 | "CV__Certifications": [],
572 | "Certif__evaluation": {
573 | "score__certif": -1,
574 | "evaluation__certif": "unknown",
575 | },
576 | }
577 | print(exception)
578 |
579 | return SKILLS_and_CERTIF
580 |
581 |
582 | def Extract_PROFESSIONAL_EXPERIENCE(llm, documents):
583 | """Extract list of work experience and projects."""
584 |
585 | try:
586 | response_content, response_tokens_count = invoke_LLM(
587 | llm,
588 | documents,
589 | resume_sections=["Work__experience", "CV__Projects"],
590 | info_message="Extract list of work experience and projects...",
591 | language=st.session_state.assistant_language,
592 | )
593 |
594 | try:
595 | # Load response_content to json dictionary
596 | PROFESSIONAL_EXPERIENCE = json.loads(response_content, strict=False)
597 | except Exception as e:
598 | print("[ERROR] json.loads returns error:", e)
599 | print("\n[INFO] Parse response content...\n")
600 |
601 | work_experiences = extract_from_text(
602 | response_content, '"Work__experience": ', '"CV__Projects":'
603 | )
604 | projects = extract_from_text(response_content, '"CV__Projects": ', None)
605 |
606 | # Create the dictionary
607 | PROFESSIONAL_EXPERIENCE = {}
608 | PROFESSIONAL_EXPERIENCE["Work__experience"] = convert_text_to_list_of_dicts(
609 | text=work_experiences[
610 | work_experiences.find("[") + 1 : work_experiences.rfind("]")
611 | ].strip()[1:-1],
612 | dict_keys=[
613 | "job__title",
614 | "job__company",
615 | "job__start_date",
616 | "job__end_date",
617 | ],
618 | )
619 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = convert_text_to_list_of_dicts(
620 | text=projects[projects.find("[") + 1 : projects.rfind("]")].strip()[
621 | 1:-1
622 | ],
623 | dict_keys=[
624 | "project__title",
625 | "project__start_date",
626 | "project__end_date",
627 | ],
628 | )
629 |         # Exclude 'unknown' projects and work experiences (iterate over copies, since removing items while iterating skips elements)
630 |         try:
631 |             for work_experience in list(PROFESSIONAL_EXPERIENCE["Work__experience"]):
632 |                 if work_experience["job__title"] == "unknown":
633 |                     PROFESSIONAL_EXPERIENCE["Work__experience"].remove(work_experience)
634 |         except Exception as e:
635 |             print(e)
636 |         try:
637 |             for project in list(PROFESSIONAL_EXPERIENCE["CV__Projects"]):
638 |                 if project["project__title"] == "unknown":
639 |                     PROFESSIONAL_EXPERIENCE["CV__Projects"].remove(project)
640 |         except Exception as e:
641 |             print(e)
642 |
643 | except Exception as exception:
644 | PROFESSIONAL_EXPERIENCE = {"Work__experience": [], "CV__Projects": []}
645 | print(exception)
646 |
647 | return PROFESSIONAL_EXPERIENCE
648 |
649 |
650 | def get_relevant_documents(query, documents):
651 |     """Retrieve the most relevant documents from the Langchain documents using the CohereRerank retriever."""
652 |
653 | # 1.1. Retrieve documents using the CohereRerank retriever
654 |
655 | retrieved_docs = st.session_state.retriever.get_relevant_documents(query)
656 |
657 | # 1.2. Keep only relevant documents where relevance_score >= (max(relevance_scores) - 0.1)
658 |
659 | relevance_scores = [
660 | retrieved_docs[j].metadata["relevance_score"]
661 | for j in range(len(retrieved_docs))
662 | ]
663 | max_relevance_score = max(relevance_scores)
664 | threshold = max_relevance_score - 0.1
665 |
666 | relevant_doc_ids = []
667 |
668 | for j in range(len(retrieved_docs)):
669 |
670 | # keep relevant documents with (relevance_score >= threshold)
671 |
672 | if retrieved_docs[j].metadata["relevance_score"] >= threshold:
673 | # Append the retrieved document
674 | relevant_doc_ids.append(retrieved_docs[j].metadata["doc_number"])
675 |
676 |     # Also append the document that follows the most relevant one, as relevant information may be split across two documents.
677 | relevant_doc_ids.append(min(relevant_doc_ids[0] + 1, len(documents) - 1))
678 |
679 | # Sort document ids
680 | relevant_doc_ids = sorted(set(relevant_doc_ids))
681 |
682 | # Get the most relevant documents
683 | relevant_documents = [documents[k] for k in relevant_doc_ids]
684 |
685 | return relevant_documents
686 |
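687 | # Illustration of the thresholding above (hypothetical relevance scores):
688 | # if the reranker returns relevance_scores = [0.92, 0.88, 0.55], the threshold is
689 | # 0.92 - 0.1 = 0.82, so only the first two documents are kept, plus the document
690 | # that immediately follows the most relevant one (doc_number + 1, capped at the last document).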
687 |
688 | def Extract_Job_Responsibilities(llm, documents, PROFESSIONAL_EXPERIENCE):
689 | """Extract job responsibilities for each job in PROFESSIONAL_EXPERIENCE."""
690 |
691 | st.info(f"**{get_current_time()}** \tExtract work experience responsibilities...")
692 | print(f"**{get_current_time()}** \tExtract work experience responsibilities...")
693 |
694 | for i in range(len(PROFESSIONAL_EXPERIENCE["Work__experience"])):
695 | try:
696 | Work_experience_i = PROFESSIONAL_EXPERIENCE["Work__experience"][i]
697 |
698 | # 1. Extract relevant documents
699 | query = f"""Extract from the resume delimited by triple backticks \
700 | all the duties and responsibilities of the following work experience: \
701 | (title = '{Work_experience_i['job__title']}'"""
702 | if str(Work_experience_i["job__company"]) != "unknown":
703 | query += f" and company = '{Work_experience_i['job__company']}'"
704 | if str(Work_experience_i["job__start_date"]) != "unknown":
705 | query += f" and start date = '{Work_experience_i['job__start_date']}'"
706 | if str(Work_experience_i["job__end_date"]) != "unknown":
707 | query += f" and end date = '{Work_experience_i['job__end_date']}'"
708 | query += ")\n"
709 |
710 | try:
711 | relevant_documents = get_relevant_documents(query, documents)
712 | except Exception as err:
713 | st.error(f"get_relevant_documents error: {err}")
714 | relevant_documents = documents
715 |
716 | # 2. Invoke LLM
717 |
718 | prompt = (
719 | query
720 | + f"""Output the duties in a json dictionary with the following keys (__duty_id__,__duty__). \
721 | Use this format: "1":"duty","2":"another duty".
722 | Resume:\n\n ```{relevant_documents}```"""
723 | )
724 | response = llm.invoke(prompt)
725 |
726 | # 3. Convert the response content to json dict and update work_experience
727 | response_content = response.content[
728 | response.content.find("{") : response.content.rfind("}") + 1
729 | ]
730 |
731 | try:
732 | Work_experience_i["work__duties"] = json.loads(
733 | response_content, strict=False
734 | ) # Convert the response content to a json dict
735 | except Exception as e:
736 | print("\njson.loads returns error:", e, "\n\n")
737 | print("\n[INFO] Parse response content...\n")
738 |
739 | Work_experience_i["work__duties"] = {}
740 | list_duties = (
741 | response_content[
742 | response_content.find("{") + 1 : response_content.rfind("}")
743 | ]
744 | .strip()
745 | .split(",\n")
746 | )
747 |
748 | for j in range(len(list_duties)):
749 | try:
750 | Work_experience_i["work__duties"][f"{j+1}"] = (
751 | list_duties[j].split('":')[1].strip()[1:-1]
752 | )
753 | except:
754 | Work_experience_i["work__duties"][f"{j+1}"] = "unknown"
755 |
756 | except Exception as exception:
757 | Work_experience_i["work__duties"] = {}
758 | print(exception)
759 |
760 | return PROFESSIONAL_EXPERIENCE
761 |
762 |
763 | def Extract_Project_Details(llm, documents, PROFESSIONAL_EXPERIENCE):
764 | """Extract project details for each project in PROFESSIONAL_EXPERIENCE."""
765 |
766 | st.info(f"**{get_current_time()}** \tExtract project details...")
767 | print(f"**{get_current_time()}** \tExtract project details...")
768 |
769 | for i in range(len(PROFESSIONAL_EXPERIENCE["CV__Projects"])):
770 | try:
771 | project_i = PROFESSIONAL_EXPERIENCE["CV__Projects"][i]
772 |
773 | # 1. Extract relevant documents
774 | query = f"""Extract from the resume (delimited by triple backticks) what is listed about the following project: \
775 | (project title = '{project_i['project__title']}'"""
776 | if str(project_i["project__start_date"]) != "unknown":
777 | query += f" and start date = '{project_i['project__start_date']}'"
778 | if str(project_i["project__end_date"]) != "unknown":
779 | query += f" and end date = '{project_i['project__end_date']}'"
780 | query += ")"
781 |
782 | try:
783 | relevant_documents = get_relevant_documents(query, documents)
784 | except Exception as err:
785 | st.error(f"get_relevant_documents error: {err}")
786 | relevant_documents = documents
787 |
788 | # 2. Invoke LLM
789 |
790 | prompt = (
791 | query
792 | + f"""Format the extracted text into a string (with bullet points).
793 | Resume:\n\n ```{relevant_documents}```"""
794 | )
795 |
796 | response = llm.invoke(prompt)
797 |
798 | response_content = response.content
799 | project_i["project__description"] = response_content
800 |
801 | except Exception as exception:
802 | project_i["project__description"] = "unknown"
803 | print(exception)
804 |
805 | return PROFESSIONAL_EXPERIENCE
806 |
807 |
808 | ###############################################################################
809 | # Improve Work Experience and Project texts
810 | ###############################################################################
811 |
812 |
813 | def improve_text_quality(PROMPT, text_to_improve, llm, language):
814 |     """Invoke the LLM to improve the text quality."""
815 |     query = PROMPT.format(text=text_to_improve, language=language)
816 | response = llm.invoke(query)
817 | return response
818 |
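819 | # `PROMPT` is expected to be a template string with `{text}` and `{language}`
820 | # placeholders. A hypothetical sketch (the real templates are defined in app_constants.py):
821 | # PROMPT_EXAMPLE = "Rewrite the following text in {language} using strong action verbs:\n{text}"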
819 |
820 | def improve_work_experience(WORK_EXPERIENCE: list, llm):
821 | """Improve each bullet point in the work experience responsibilities."""
822 |
823 | message = f"**{get_current_time()}** \tImprove the quality of the work experience section..."
824 | st.info(message)
825 | print(message)
826 |
827 |     # Call the LLM for each work experience to get better, stronger text.
828 | for i in range(len(WORK_EXPERIENCE)):
829 | try:
830 | WORK_EXPERIENCE_i = WORK_EXPERIENCE[i]
831 |
832 | # 1. Convert the responsibilities from dict to string
833 |
834 | text_duties = ""
835 | for duty in list(WORK_EXPERIENCE_i["work__duties"].values()):
836 |                 text_duties += "- " + duty + "\n"  # one bullet point per line
837 | # 2. Call LLM
838 |
839 | response = improve_text_quality(
840 | PROMPT_IMPROVE_WORK_EXPERIENCE,
841 | text_duties,
842 | llm,
843 | st.session_state.assistant_language,
844 | )
845 | response_content = response.content
846 |
847 | # 3. Convert response content to json dict with keys:
848 | # ('Score__WorkExperience','Comments__WorkExperience','Improvement__WorkExperience')
849 |
850 | response_content = response_content[
851 | response_content.find("{") : response_content.rfind("}") + 1
852 | ]
853 |
854 | try:
855 | list_fields = [
856 | "Score__WorkExperience",
857 | "Comments__WorkExperience",
858 | "Improvement__WorkExperience",
859 | ]
860 | list_rfind = [",\n", ",\n", "\n"]
861 | list_exclude_first_car = [False, True, True]
862 | response_content_dict = ResponseContent_Parser(
863 | response_content, list_fields, list_rfind, list_exclude_first_car
864 | )
865 | try:
866 | response_content_dict["Score__WorkExperience"] = int(
867 | response_content_dict["Score__WorkExperience"]
868 | )
869 | except:
870 | response_content_dict["Score__WorkExperience"] = -1
871 |
872 | except Exception as e:
873 | response_content_dict = {
874 | "Score__WorkExperience": -1,
875 | "Comments__WorkExperience": "",
876 | "Improvement__WorkExperience": "",
877 | }
878 | print(e)
879 | st.error(e)
880 |
881 |             # 4. Update the work experience entry: add the score, comments and improvement keys.
882 |
883 | WORK_EXPERIENCE_i["Score__WorkExperience"] = response_content_dict[
884 | "Score__WorkExperience"
885 | ]
886 | WORK_EXPERIENCE_i["Comments__WorkExperience"] = response_content_dict[
887 | "Comments__WorkExperience"
888 | ]
889 | WORK_EXPERIENCE_i["Improvement__WorkExperience"] = response_content_dict[
890 | "Improvement__WorkExperience"
891 | ]
892 |
893 | except Exception as exception:
894 | st.error(exception)
895 | print(exception)
896 | WORK_EXPERIENCE_i["Score__WorkExperience"] = -1
897 | WORK_EXPERIENCE_i["Comments__WorkExperience"] = ""
898 | WORK_EXPERIENCE_i["Improvement__WorkExperience"] = ""
899 |
900 | return WORK_EXPERIENCE
901 |
902 |
903 | def improve_projects(PROJECTS: list, llm):
904 | """Improve project text with LLM."""
905 |
906 | st.info(f"**{get_current_time()}** \tImprove the quality of the project section...")
907 | print(f"**{get_current_time()}** \tImprove the quality of the project section...")
908 |
909 | for i in range(len(PROJECTS)):
910 | try:
911 | PROJECT_i = PROJECTS[i] # the ith project.
912 |
913 |             # 1. Call the LLM to improve the quality of the project text
914 | response = improve_text_quality(
915 | PROMPT_IMPROVE_PROJECT,
916 | PROJECT_i["project__title"] + "\n" + PROJECT_i["project__description"],
917 | llm,
918 | st.session_state.assistant_language,
919 | )
920 | response_content = response.content
921 |
922 | # 2. Convert response content to json dict with keys:
923 | # ('Score__project','Comments__project','Improvement__project')
924 |
925 | response_content = response_content[
926 | response_content.find("{") : response_content.rfind("}") + 1
927 | ]
928 |
929 | try:
930 | list_fields = [
931 | "Score__project",
932 | "Comments__project",
933 | "Improvement__project",
934 | ]
935 | list_rfind = [",\n", ",\n", "\n"]
936 | list_exclude_first_car = [False, True, True]
937 |
938 | response_content_dict = ResponseContent_Parser(
939 | response_content, list_fields, list_rfind, list_exclude_first_car
940 | )
941 | try:
942 | response_content_dict["Score__project"] = int(
943 | response_content_dict["Score__project"]
944 | )
945 | except:
946 | response_content_dict["Score__project"] = -1
947 |
948 | except Exception as e:
949 | response_content_dict = {
950 | "Score__project": -1,
951 | "Comments__project": "",
952 | "Improvement__project": "",
953 | }
954 | print(e)
955 |
956 | # 3. Update PROJECTS
957 | PROJECT_i["Score__project"] = response_content_dict["Score__project"]
958 | PROJECT_i["Comments__project"] = response_content_dict["Comments__project"]
959 | PROJECT_i["Improvement__project"] = response_content_dict[
960 | "Improvement__project"
961 | ]
962 |
963 | except Exception as exception:
964 | print(exception)
965 |
966 | PROJECT_i["Score__project"] = -1
967 | PROJECT_i["Comments__project"] = ""
968 | PROJECT_i["Improvement__project"] = ""
969 |
970 | return PROJECTS
971 |
972 |
973 | ###############################################################################
974 | # Evaluate the Resume
975 | ###############################################################################
976 |
977 |
978 | def Evaluate_the_Resume(llm, documents):
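979 |     """Evaluate the resume: outline it and analyse its top 3 strengths and top 3 weaknesses."""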
979 | try:
980 | st.info(
981 | f"**{get_current_time()}** \tEvaluate, outline and analyse \
982 | the resume's top 3 strengths and top 3 weaknesses..."
983 | )
984 | print(
985 | f"**{get_current_time()}** \tEvaluate, outline and analyse \
986 | the resume's top 3 strengths and top 3 weaknesses..."
987 | )
988 |
989 | prompt_template = PromptTemplate.from_template(PROMPT_EVALUATE_RESUME)
990 | prompt = prompt_template.format_prompt(
991 | text=documents, language=st.session_state.assistant_language
992 | ).text
993 |
994 | # Invoke LLM
995 | response = llm.invoke(prompt)
996 | response_content = response.content[
997 | response.content.find("{") : response.content.rfind("}") + 1
998 | ]
999 | try:
1000 | RESUME_EVALUATION = json.loads(response_content)
1001 | except Exception as e:
1002 | print("[ERROR] json.loads returns error:", e)
1003 | print("\n[INFO] Parse response content...\n")
1004 |
1005 | list_fields = ["resume_cv_overview", "top_3_strengths", "top_3_weaknesses"]
1006 | list_rfind = [",\n", ",\n", "\n"]
1007 | list_exclude_first_car = [True, True, True]
1008 | RESUME_EVALUATION = ResponseContent_Parser(
1009 | response_content, list_fields, list_rfind, list_exclude_first_car
1010 | )
1011 |
1012 | except Exception as error:
1013 | RESUME_EVALUATION = {
1014 | "resume_cv_overview": "unknown",
1015 | "top_3_strengths": "unknown",
1016 | "top_3_weaknesses": "unknown",
1017 | }
1018 |         print(f"An error occurred: {error}")
1019 |
1020 | return RESUME_EVALUATION
1021 |
1022 |
1023 | def get_section_scores(SCANNED_RESUME):
1024 |     """Return a dictionary with the scores of all resume sections (summary, skills, ...)."""
1025 | dict_scores = {}
1026 |
1027 |     # Contact info, summary, skills, education, languages, certification
1028 | dict_scores["ContactInfo"] = max(
1029 | -1, SCANNED_RESUME["Contact__information"]["score__ContactInfo"]
1030 | )
1031 | dict_scores["summary"] = max(
1032 | -1, SCANNED_RESUME["Summary__evaluation"]["score__summary"]
1033 | )
1034 | dict_scores["skills"] = max(
1035 | -1, SCANNED_RESUME["Skills__evaluation"]["score__skills"]
1036 | )
1037 | dict_scores["education"] = max(
1038 | -1, SCANNED_RESUME["Education__evaluation"]["score__edu"]
1039 | )
1040 | dict_scores["language"] = max(
1041 | -1, SCANNED_RESUME["Languages__evaluation"]["score__language"]
1042 | )
1043 |
1044 | dict_scores["certfication"] = max(
1045 | -1, SCANNED_RESUME["Certif__evaluation"]["score__certif"]
1046 | )
1047 |
1048 | # Work__experience: The score is the average of the scores of all the work experiences.
1049 | scores = []
1050 | for work_experience in SCANNED_RESUME["Work__experience"]:
1051 | score = work_experience["Score__WorkExperience"]
1052 | if score > -1:
1053 | scores.append(score)
1054 | try:
1055 | dict_scores["work_experience"] = int(sum(scores) / len(scores))
1056 | except:
1057 | dict_scores["work_experience"] = 0
1058 |
1059 | # Projects: The score is the average of the scores of all projects.
1060 | scores = []
1061 | for project in SCANNED_RESUME["CV__Projects"]:
1062 | score = project["Score__project"]
1063 | if score > -1:
1064 | scores.append(score)
1065 | try:
1066 | dict_scores["projects"] = int(sum(scores) / len(scores))
1067 | except:
1068 | dict_scores["projects"] = 0
1069 |
1070 | return dict_scores
1071 |
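1072 | # Illustrative shape of the returned dictionary (hypothetical values):
1073 | # {"ContactInfo": 8, "summary": 7, "skills": 9, "education": 8, "language": 6,
1074 | #  "certfication": 7, "work_experience": 8, "projects": 7}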
1072 |
1073 | ###############################################################################
1074 | # Put it all together
1075 | ###############################################################################
1076 |
1077 |
1078 | def resume_analyzer_main(llm, llm_creative, documents):
1079 | """Put it all together: Extract, evaluate and improve all resume sections.
1080 | Save the final results in a dictionary.
1081 | """
1082 | # 1. Extract Contact information: Name, Title, Location, Email,...
1083 | CONTACT_INFORMATION = Extract_contact_information(llm, documents)
1084 |
1085 | # 2. Extract, evaluate and improve the Summary
1086 | Summary_SECTION = Extract_Evaluate_Summary(llm, documents)
1087 |
1088 | # 3. Extract and evaluate education and language sections.
1089 | Education_Language_sections = Extract_Education_Language(llm, documents)
1090 |
1091 | # 4. Extract and evaluate the SKILLS.
1092 | SKILLS_and_CERTIF = Extract_Skills_and_Certifications(llm, documents)
1093 |
1094 | # 5. Extract Work Experience and Projects.
1095 | PROFESSIONAL_EXPERIENCE = Extract_PROFESSIONAL_EXPERIENCE(llm, documents)
1096 |
1097 | # 6. EXTRACT WORK EXPERIENCE RESPONSIBILITIES.
1098 | PROFESSIONAL_EXPERIENCE = Extract_Job_Responsibilities(
1099 | llm, documents, PROFESSIONAL_EXPERIENCE
1100 | )
1101 |
1102 | # 7. EXTRACT PROJECT DETAILS.
1103 | PROFESSIONAL_EXPERIENCE = Extract_Project_Details(
1104 | llm, documents, PROFESSIONAL_EXPERIENCE
1105 | )
1106 |
1107 | # 8. Improve the quality of the work experience section.
1108 | PROFESSIONAL_EXPERIENCE["Work__experience"] = improve_work_experience(
1109 | WORK_EXPERIENCE=PROFESSIONAL_EXPERIENCE["Work__experience"], llm=llm_creative
1110 | )
1111 |
1112 | # 9. Improve the quality of the project section.
1113 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = improve_projects(
1114 | PROJECTS=PROFESSIONAL_EXPERIENCE["CV__Projects"], llm=llm_creative
1115 | )
1116 |
1117 | # 10. Evaluate the Resume
1118 | RESUME_EVALUATION = Evaluate_the_Resume(llm_creative, documents)
1119 |
1120 | # 11. Put it all together: create the SCANNED_RESUME dictionary
1121 | SCANNED_RESUME = {}
1122 | for dictionary in [
1123 | CONTACT_INFORMATION,
1124 | Summary_SECTION,
1125 | Education_Language_sections,
1126 | SKILLS_and_CERTIF,
1127 | PROFESSIONAL_EXPERIENCE,
1128 | RESUME_EVALUATION,
1129 | ]:
1130 | SCANNED_RESUME.update(dictionary)
1131 |
1132 | # 12. Save the Scanned resume
1133 | try:
1134 | now = (datetime.datetime.now()).strftime("%Y%m%d_%H%M%S")
1135 | file_name = "results_" + now
1136 | with open(f"./data/{file_name}.json", "w") as fp:
1137 | json.dump(SCANNED_RESUME, fp)
1138 | except:
1139 | pass
1140 |
1141 | return SCANNED_RESUME
1142 |
--------------------------------------------------------------------------------
/Streamlit_App/retrieval.py:
--------------------------------------------------------------------------------
1 | # Streamlit
2 | import streamlit as st
3 |
4 | # document loader
5 | from langchain_community.document_loaders import PDFMinerLoader
6 |
7 | # text_splitter
8 | from langchain.text_splitter import RecursiveCharacterTextSplitter
9 |
10 | # Cohere reranker
11 | from langchain.retrievers import ContextualCompressionRetriever
12 | from langchain.retrievers.document_compressors import CohereRerank
13 | from langchain_community.llms import Cohere
14 |
15 | # Embeddings
16 | from langchain_openai import OpenAIEmbeddings
17 | from langchain_google_genai import GoogleGenerativeAIEmbeddings
18 |
19 | # FAISS vector database
20 | from langchain_community.vectorstores import FAISS
21 |
22 | # Other libraries
23 | import os, glob, datetime
24 | from pathlib import Path
25 | import tiktoken
26 | import warnings
27 |
28 | warnings.filterwarnings("ignore", category=FutureWarning)
29 |
30 |
31 | # Data Directories: where temp files and vectorstores will be saved
32 | from app_constants import TMP_DIR
33 |
34 |
35 | def langchain_document_loader(file_path):
36 | """Load and split a PDF file in Langchain.
37 | Parameters:
38 | - file_path (str): path of the file.
39 | Output:
40 | - documents: list of Langchain Documents."""
41 |
42 | if file_path.endswith(".pdf"):
43 | loader = PDFMinerLoader(file_path=file_path)
44 |     else:
45 |         st.error("You can only upload .pdf files!")
46 |         st.stop()  # stop here: `loader` is only defined for .pdf files
47 | # 1. Load and split documents
48 | documents = loader.load_and_split()
49 |
50 | # 2. Update the metadata: add document number to metadata
51 | for i in range(len(documents)):
52 | documents[i].metadata = {
53 | "source": documents[i].metadata["source"],
54 | "doc_number": i,
55 | }
56 |
57 | return documents
58 |
59 |
60 | def delete_temp_files():
61 |     """Delete temp files from TMP_DIR."""
62 | files = glob.glob(TMP_DIR.as_posix() + "/*")
63 | for f in files:
64 | try:
65 | os.remove(f)
66 | except:
67 | pass
68 |
69 |
70 | def save_uploaded_file(uploaded_file):
71 | """Save the uploaded file (output of the Streamlit File Uploader widget) to TMP_DIR."""
72 |
73 | temp_file_path = ""
74 | try:
75 | temp_file_path = os.path.join(TMP_DIR.as_posix(), uploaded_file.name)
76 | with open(temp_file_path, "wb") as temp_file:
77 | temp_file.write(uploaded_file.read())
78 | return temp_file_path
79 | except Exception as error:
80 |         st.error(f"An error occurred: {error}")
81 |
82 | return temp_file_path
83 |
84 |
85 | def tiktoken_tokens(documents, model="gpt-3.5-turbo-0125"):
86 |     """Use tiktoken (the tokeniser for OpenAI models) to return a list of token lengths per document."""
87 |
88 | # Get the encoding used by the model.
89 | encoding = tiktoken.encoding_for_model(model)
90 |
91 | # Calculate the token length of documents
92 | tokens_length = [len(encoding.encode(doc)) for doc in documents]
93 |
94 | return tokens_length
95 |
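96 | # Usage sketch: pass a list of strings (e.g. the page_content of each document)
97 | # to check that the prompts will fit in the model's context window:
98 | # tokens_length = tiktoken_tokens([doc.page_content for doc in documents])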
96 |
97 | def select_embeddings_model(LLM_service="OpenAI"):
98 | """Select the Embeddings model: OpenAIEmbeddings or GoogleGenerativeAIEmbeddings."""
99 |
100 | if LLM_service == "OpenAI":
101 | embeddings = OpenAIEmbeddings(api_key=st.session_state.openai_api_key)
102 |
103 |     elif LLM_service == "Google":
104 | embeddings = GoogleGenerativeAIEmbeddings(
105 | model="models/embedding-001", google_api_key=st.session_state.google_api_key
106 | )
107 |
108 | return embeddings
109 |
110 |
111 | def create_vectorstore(embeddings, documents):
112 | """Create a Faiss vector database."""
113 | vector_store = FAISS.from_documents(documents=documents, embedding=embeddings)
114 |
115 | return vector_store
116 |
117 |
118 | def Vectorstore_backed_retriever(
119 | vectorstore, search_type="similarity", k=4, score_threshold=None
120 | ):
121 |     """Create a vectorstore-backed retriever.
122 | Parameters:
123 | search_type: Defines the type of search that the Retriever should perform.
124 | Can be "similarity" (default), "mmr", or "similarity_score_threshold"
125 | k: number of documents to return (Default: 4)
126 | score_threshold: Minimum relevance threshold for similarity_score_threshold (default=None)
127 | """
128 | search_kwargs = {}
129 | if k is not None:
130 | search_kwargs["k"] = k
131 | if score_threshold is not None:
132 | search_kwargs["score_threshold"] = score_threshold
133 |
134 | retriever = vectorstore.as_retriever(
135 | search_type=search_type, search_kwargs=search_kwargs
136 | )
137 | return retriever
138 |
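139 | # Usage sketch (assuming `vector_store` is the FAISS store created above):
140 | # retriever = Vectorstore_backed_retriever(
141 | #     vector_store, search_type="similarity_score_threshold", k=4, score_threshold=0.5
142 | # )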
139 |
140 | def CohereRerank_retriever(
141 | base_retriever, cohere_api_key, cohere_model="rerank-multilingual-v2.0", top_n=4
142 | ):
143 | """Build a ContextualCompressionRetriever using Cohere Rerank endpoint to reorder the results based on relevance.
144 | Parameters:
145 | base_retriever: a Vectorstore-backed retriever
146 | cohere_api_key: the Cohere API key
147 | cohere_model: The Cohere model can be either 'rerank-english-v2.0' or 'rerank-multilingual-v2.0', with the latter being the default.
148 | top_n: top n results returned by Cohere rerank, default = 4.
149 | """
150 |
151 | compressor = CohereRerank(
152 | cohere_api_key=cohere_api_key, model=cohere_model, top_n=top_n
153 | )
154 |
155 | retriever_Cohere = ContextualCompressionRetriever(
156 | base_compressor=compressor, base_retriever=base_retriever
157 | )
158 | return retriever_Cohere
159 |
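160 | # Usage sketch: wrap a vectorstore-backed retriever with the Cohere reranker
161 | # (assumes `vector_store` and a Cohere API key are available):
162 | # base_retriever = Vectorstore_backed_retriever(vector_store, "similarity", k=4)
163 | # retriever = CohereRerank_retriever(base_retriever, "YOUR_COHERE_API_KEY", top_n=2)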
160 |
161 | def retrieval_main():
162 |     """Create the Langchain retrieval pipeline: load the uploaded resume with a document loader,
163 |     embed the text to create a numerical representation, store the embeddings in a FAISS vector database,
164 |     and build a CohereRerank retriever to find the most relevant documents.
165 |     """
166 |
167 | # 1. Delete old temp files from TMP directory.
168 |     delete_temp_files()
169 |
170 | if st.session_state.uploaded_file is not None:
171 | # 2. Save uploaded_file to TMP directory.
172 | saved_file_path = save_uploaded_file(st.session_state.uploaded_file)
173 |
174 | # 3. Load documents with Langchain loaders
175 | documents = langchain_document_loader(saved_file_path)
176 | st.session_state.documents = documents
177 |
178 | # 4. Embeddings
179 | embeddings = select_embeddings_model(st.session_state.LLM_provider)
180 |
181 | # 5. Create a Faiss vector database
182 | try:
183 | st.session_state.vector_store = create_vectorstore(
184 | embeddings=embeddings, documents=documents
185 | )
186 |
187 | # 6. Create CohereRerank retriever
188 | base_retriever = Vectorstore_backed_retriever(
189 | st.session_state.vector_store, "similarity", k=min(4, len(documents))
190 | )
191 | st.session_state.retriever = CohereRerank_retriever(
192 | base_retriever=base_retriever,
193 | cohere_api_key=st.session_state.cohere_api_key,
194 | cohere_model="rerank-multilingual-v2.0",
195 | top_n=min(2, len(documents)),
196 | )
197 | except Exception as error:
198 |             st.error(f"An error occurred:\n {error}")
199 |
200 | else:
201 | st.error("Please upload a resume!")
202 | st.stop()
203 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.9.3
2 | aiosignal==1.3.1
3 | altair==5.2.0
4 | annotated-types==0.6.0
5 | anyio==4.3.0
6 | async-timeout==4.0.3
7 | attrs==23.2.0
8 | backoff==2.2.1
9 | blinker==1.7.0
10 | cachetools==5.3.3
11 | certifi==2024.2.2
12 | cffi==1.16.0
13 | charset-normalizer==3.3.2
14 | click==8.1.7
15 | cohere==4.56
16 | colorama==0.4.6
17 | cryptography==42.0.5
18 | dataclasses-json==0.6.4
19 | distro==1.9.0
20 | exceptiongroup==1.2.0
21 | faiss-cpu==1.8.0
22 | fastavro==1.9.4
23 | frozenlist==1.4.1
24 | gitdb==4.0.11
25 | GitPython==3.1.42
26 | google-ai-generativelanguage==0.4.0
27 | google-api-core==2.17.1
28 | google-auth==2.28.2
29 | google-generativeai==0.3.2
30 | googleapis-common-protos==1.63.0
31 | greenlet==3.0.3
32 | grpcio==1.62.1
33 | grpcio-status==1.62.1
34 | h11==0.14.0
35 | httpcore==1.0.4
36 | httpx==0.27.0
37 | idna==3.6
38 | importlib-metadata==6.11.0
39 | Jinja2==3.1.3
40 | jsonpatch==1.33
41 | jsonpointer==2.4
42 | jsonschema==4.21.1
43 | jsonschema-specifications==2023.12.1
44 | langchain==0.1.12
45 | langchain-community==0.0.28
46 | langchain-core==0.1.32
47 | langchain-google-genai==0.0.6
48 | langchain-openai==0.0.2.post1
49 | langchain-text-splitters==0.0.1
50 | langsmith==0.1.28
51 | Markdown==3.6
52 | markdown-it-py==3.0.0
53 | MarkupSafe==2.1.5
54 | marshmallow==3.21.1
55 | mdurl==0.1.2
56 | multidict==6.0.5
57 | mypy-extensions==1.0.0
58 | numpy==1.26.4
59 | openai==1.14.1
60 | orjson==3.9.15
61 | packaging==23.2
62 | pandas==2.2.1
63 | pdfminer.six==20231228
64 | pillow==10.2.0
65 | proto-plus==1.23.0
66 | protobuf==4.25.3
67 | pyarrow==15.0.2
68 | pyasn1==0.5.1
69 | pyasn1-modules==0.3.0
70 | pycparser==2.21
71 | pydantic==2.6.4
72 | pydantic_core==2.16.3
73 | pydeck==0.8.1b0
74 | Pygments==2.17.2
75 | python-dateutil==2.9.0.post0
76 | python-dotenv==1.0.1
77 | pytz==2024.1
78 | PyYAML==6.0.1
79 | referencing==0.34.0
80 | regex==2023.12.25
81 | requests==2.31.0
82 | rich==13.7.1
83 | rpds-py==0.18.0
84 | rsa==4.9
85 | six==1.16.0
86 | smmap==5.0.1
87 | sniffio==1.3.1
88 | SQLAlchemy==2.0.28
89 | streamlit==1.28.0
90 | tenacity==8.2.3
91 | tiktoken==0.5.2
92 | toml==0.10.2
93 | toolz==0.12.1
94 | tornado==6.4
95 | tqdm==4.66.2
96 | typing-inspect==0.9.0
97 | typing_extensions==4.10.0
98 | tzdata==2024.1
99 | tzlocal==5.2
100 | urllib3==2.2.1
101 | validators==0.22.0
102 | watchdog==4.0.0
103 | yarl==1.9.4
104 | zipp==3.18.1
105 |
--------------------------------------------------------------------------------