├── Notebooks ├── data │ └── resume │ │ └── ChatGPT_dataScientist.pdf ├── keys.env └── resume_scanner.ipynb ├── README.md ├── Streamlit_App ├── app.py ├── app_constants.py ├── app_display_results.py ├── app_sidebar.py ├── data │ └── Images │ │ ├── Education.png │ │ ├── Language.png │ │ ├── Leonardo_AI.jpg │ │ ├── app.png │ │ ├── contact_information.png │ │ ├── scores.png │ │ ├── top_3_strengths.png │ │ └── work_experience.png ├── keys.env ├── llm_functions.py ├── resume_analyzer.py └── retrieval.py └── requirements.txt /Notebooks/data/resume/ChatGPT_dataScientist.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Notebooks/data/resume/ChatGPT_dataScientist.pdf -------------------------------------------------------------------------------- /Notebooks/keys.env: -------------------------------------------------------------------------------- 1 | api_key_openai = "Your_API_key" 2 | api_key_google = "Your_API_key" 3 | api_key_cohere = "Your_API_key" -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 🔎 Resume scanner: 🚀 Leverage the power of LLM to improve your resume 2 | 3 | ### 🚀 Build a Streamlit application powered by Langchain, OpenAI and Google Generative AI 4 | 5 |
6 | 7 |
Image generated by Leonardo.ai
8 |
9 | 10 | ### Table of Contents 11 | 12 | 1. [Project Overview](#project-overview) 13 | 2. [Installation](#installation) 14 | 3. [File Descriptions](#file-descriptions) 15 | 4. [Instructions](#instructions) 16 | 5. [Screenshots](#screenshots) 17 | 18 | ## Project Overview 19 | 20 | The aim of this project is to build a web application in [Streamlit](https://streamlit.io/) that scans and improves a resume using instruction-tuned Large Language Models (LLMs). 21 | 22 | We leveraged the power of LLMs, specifically ChatGPT from [OpenAI](https://platform.openai.com/overview) and Gemini-pro from [Google](https://ai.google.dev/?hl=en), to extract, assess, and enhance resumes. 23 | 24 | We used [Langchain](https://python.langchain.com/docs/get_started/introduction), prompt engineering, and retrieval-augmented generation (RAG) techniques to complete these steps. 25 | 26 | ## Installation 27 | 28 | This project requires Python 3 and the following Python libraries: 29 | 30 | `streamlit`, `langchain`, `langchain-openai`, `langchain-google-genai`, `faiss-cpu`, `tiktoken`, `python-dotenv`, `pdfminer`, `markdown` 31 | 32 | The full list of requirements can be found in `requirements.txt`. 33 | 34 | ## File Descriptions 35 | 36 | - **Streamlit_App** folder: contains the Streamlit application. 37 | 38 | - `requirements.txt`: contains the required packages for installation. 39 | - `keys.env`: your OpenAI, Gemini, and Cohere API keys are stored here. 40 | - `llm_functions.py`: reads the LLM API keys from `keys.env` and instantiates the LLMs in Langchain. 41 | - `retrieval.py`: the script used to create the Langchain retrieval chain, including document loaders, embeddings, vector stores, and retrievers. 42 | - `app_constants.py`: contains templates for creating LLM prompts. 43 | - `app_sidebar.py`: the sidebar, where you can choose the LLM model and its parameters, such as temperature and top_p values, and enter your API keys.
44 | - `resume_analyzer.py`: this file contains the functions used to extract, assess, and improve each section of the resume using LLMs. It is the **core** of the application. 45 | - `app_display_results.py`: the script used to display resume sections, assessments, scores, and improved texts. 46 | - `app.py`: the main script of the app. It ties the other scripts together and is used to run the Streamlit application. 47 | 48 | - **Notebooks** folder: contains the project's notebook. 49 | 50 | ## Instructions 51 | 52 | To run the app locally: 53 | 54 | 1. Create a virtual environment: `python -m venv virtualenv` 55 | 2. Activate the virtual environment: 56 | 57 | **Windows:** `.\virtualenv\Scripts\activate` 58 | 59 | **Linux:** `source virtualenv/bin/activate` 60 | 61 | 3. Install the required dependencies: `pip install -r requirements.txt` 62 | 4. Add your OpenAI, Gemini, and Cohere API keys to the `keys.env` file. You can get your API keys from their respective websites. 63 | 64 | > - **OpenAI** API key: [Get an API key](https://platform.openai.com/account/api-keys) 65 | > - **Google** API key: [Get an API key](https://makersuite.google.com/app/apikey) 66 | > - **Cohere** API key: [Get an API key](https://dashboard.cohere.com/api-keys) 67 | 68 | 5. Start the app: `streamlit run ./Streamlit_App/app.py` 69 | 6. Select the LLM provider (either OpenAI or Google Generative AI) from the sidebar. Then, choose a model (GPT-3.5, GPT-4, or Gemini-pro) and adjust its parameters. 70 | 7. Use the file uploader widget to upload your resume in PDF format. 71 | 8. 🚀 To analyze and improve your resume, simply click the 'Analyze resume' button located in the main panel. 72 | 73 | ## Screenshots 74 | 75 | Here is a screenshot of the application. 76 | 77 | <div align="center">
78 | 79 |
80 |
81 | The results of the resume analysis and improvement are shown below. 82 | 83 | First, the resume's overview, top 3 strengths, and top 3 weaknesses are displayed. 84 | 85 |
86 | 87 |
88 |
89 | The scores are then displayed to give a general indication of the resume's quality. 90 | The resume is evaluated based on eight sections, each scored out of 100: contact information, summary, work experience, skills, education, language, projects, and certifications. 91 | 92 |
93 | 94 |
95 |
96 | Finally, the analysis of each section is presented in a st.expander. For instance, here is how the work experience is displayed. 97 | 98 |
99 | 100 |
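The score badges shown above are color-coded by a simple threshold rule. As a minimal, standalone sketch of that rule (the function name and hex values mirror `set_background_color` in `Streamlit_App/app_display_results.py`; everything else here is illustrative):

```python
def set_background_color(score: int) -> str:
    """Map a section score (0-100) to the badge background color."""
    if score >= 80:
        return "#D4F1F4"  # light blue: strong section
    elif score >= 60:
        return "#ededed"  # light grey: acceptable section
    else:
        return "#fbcccd"  # light red: section needs work


# A score of 85 gets the "strong section" background:
print(set_background_color(85))  # #D4F1F4
```

The same three-way split is applied to every one of the eight section scores, so a quick glance at the badge colors tells you which sections need the most work.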
101 | -------------------------------------------------------------------------------- /Streamlit_App/app.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from app_sidebar import sidebar 3 | from llm_functions import instantiate_LLM_main, get_api_keys_from_local_env 4 | from retrieval import retrieval_main 5 | from resume_analyzer import resume_analyzer_main 6 | from app_display_results import display_resume_analysis 7 | 8 | 9 | def main(): 10 | """Analyze the uploaded resume.""" 11 | 12 | if st.button("Analyze resume"): 13 | with st.spinner("Please wait..."): 14 | try: 15 | # 1. Create the Langchain retrieval 16 | retrieval_main() 17 | 18 | # 2. Instantiate a deterministic LLM with a temperature of 0.0. 19 | st.session_state.llm = instantiate_LLM_main(temperature=0.0, top_p=0.95) 20 | 21 | # 3. Instantiate LLM with temperature >0.1 for creativity. 22 | st.session_state.llm_creative = instantiate_LLM_main( 23 | temperature=st.session_state.temperature, 24 | top_p=st.session_state.top_p, 25 | ) 26 | 27 | # 4. Analyze the resume 28 | st.session_state.SCANNED_RESUME = resume_analyzer_main( 29 | llm=st.session_state.llm, 30 | llm_creative=st.session_state.llm_creative, 31 | documents=st.session_state.documents, 32 | ) 33 | 34 | # 5. Display results 35 | display_resume_analysis(st.session_state.SCANNED_RESUME) 36 | 37 | except Exception as e: 38 | st.error(f"An error occurred: {e}") 39 | 40 | 41 | if __name__ == "__main__": 42 | # 1. Set app configuration 43 | st.set_page_config(page_title="Resume Scanner", page_icon="🚀") 44 | st.title("🔎 Resume Scanner") 45 | 46 | # 2. Get API keys from local "keys.env" file 47 | openai_api_key, google_api_key, cohere_api_key = get_api_keys_from_local_env() 48 | 49 | # 3. Create the sidebar 50 | sidebar(openai_api_key, google_api_key, cohere_api_key) 51 | 52 | # 4.
File uploader widget 53 | st.session_state.uploaded_file = st.file_uploader( 54 | label="**Upload Resume**", 55 | accept_multiple_files=False, 56 | type=(["pdf"]), 57 | ) 58 | 59 | # 5. Analyze the uploaded resume 60 | main() 61 | -------------------------------------------------------------------------------- /Streamlit_App/app_constants.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | import os 3 | 4 | # 1. Constants 5 | 6 | list_LLM_providers = [":rainbow[**OpenAI**]", "**Google Generative AI**"] 7 | 8 | list_Assistant_Languages = [ 9 | "english", 10 | "french", 11 | "spanish", 12 | "german", 13 | "russian", 14 | "chinese", 15 | "arabic", 16 | "portuguese", 17 | "italian", 18 | "japanese", 19 | ] 20 | 21 | TMP_DIR = Path(__file__).resolve().parent.joinpath("data", "tmp") 22 | 23 | 24 | # 2. PROMPT TEMPLATES 25 | 26 | templates = {} 27 | 28 | # 2.1 Contact information Section 29 | templates[ 30 | "Contact__information" 31 | ] = """Extract and evaluate the contact information. \ 32 | Output a dictionary with the following keys: 33 | - candidate__name 34 | - candidate__title 35 | - candidate__location 36 | - candidate__email 37 | - candidate__phone 38 | - candidate__social_media: Extract a list of all social media profiles, blogs or websites. 39 | - evaluation__ContactInfo: Evaluate in {language} the contact information. 40 | - score__ContactInfo: Rate the contact information by giving a score (integer) from 0 to 100. 41 | """ 42 | 43 | # 2.2. Summary Section 44 | templates[ 45 | "CV__summary" 46 | ] = """Extract the summary and/or objective section. This is a separate section of the resume. \ 47 | If the resume does not contain a summary and/or objective section, then simply write "unknown".""" 48 | 49 | # 2.3. WORK Experience Section 50 | 51 | templates[ 52 | "Work__experience" 53 | ] = """Extract all work experiences. For each work experience: 54 | 1. Extract the job title. 55 | 2.
Extract the company. 56 | 3. Extract the start date and output it in the following format: \ 57 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 58 | 4. Extract the end date and output it in the following format: \ 59 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 60 | 5. Create a dictionary with the following keys: job__title, job__company, job__start_date, job__end_date. 61 | 62 | Format your response as a list of dictionaries. 63 | """ 64 | 65 | # 2.4. Projects Section 66 | templates[ 67 | "CV__Projects" 68 | ] = """Include any side projects outside the work experience. 69 | For each project: 70 | 1. Extract the title of the project. 71 | 2. Extract the start date and output it in the following format: \ 72 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 73 | 3. Extract the end date and output it in the following format: \ 74 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 75 | 4. Create a dictionary with the following keys: project__title, project__start_date, project__end_date. 76 | 77 | Format your response as a list of dictionaries. 78 | """ 79 | 80 | # 2.5. Education Section 81 | templates[ 82 | "CV__Education" 83 | ] = """Extract all educational background and academic achievements. 84 | For each education achievement: 85 | 1. Extract the name of the college or the high school. 86 | 2. Extract the earned degree. Honors and achievements are included. 87 | 3. Extract the start date and output it in the following format: \ 88 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 89 | 4. Extract the end date and output it in the following format: \ 90 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 91 | 5. Create a dictionary with the following keys: edu__college, edu__degree, edu__start_date, edu__end_date. 
92 | 93 | Format your response as a list of dictionaries. 94 | """ 95 | 96 | templates[ 97 | "Education__evaluation" 98 | ] = """Your task is to perform the following actions: 99 | 1. Rate the quality of the Education section by giving an integer score from 0 to 100. 100 | 2. Evaluate (in three sentences and in {language}) the quality of the Education section. 101 | 3. Format your response as a dictionary with the following keys: score__edu, evaluation__edu. 102 | """ 103 | 104 | # 2.6. Skills 105 | templates[ 106 | "candidate__skills" 107 | ] = """Extract the list of soft and hard skills from the skill section. Output a list. 108 | The skill section is a separate section. 109 | """ 110 | 111 | templates[ 112 | "Skills__evaluation" 113 | ] = """Your task is to perform the following actions: 114 | 1. Rate the quality of the Skills section by giving an integer score from 0 to 100. 115 | 2. Evaluate (in three sentences and in {language}) the quality of the Skills section. 116 | 3. Format your response as a dictionary with the following keys: score__skills, evaluation__skills. 117 | """ 118 | 119 | # 2.7. Languages 120 | templates[ 121 | "CV__Languages" 122 | ] = """Extract all the languages that the candidate can speak. For each language: 123 | 1. Extract the language. 124 | 2. Extract the fluency. If the fluency is not available, then simply write "unknown". 125 | 3. Create a dictionary with the following keys: spoken__language, language__fluency. 126 | 127 | Format your response as a list of dictionaries. 128 | """ 129 | 130 | templates[ 131 | "Languages__evaluation" 132 | ] = """ Your task is to perform the following actions: 133 | 1. Rate the quality of the language section by giving an integer score from 0 to 100. 134 | 2. Evaluate (in three sentences and in {language}) the quality of the language section. 135 | 3. Format your response as a dictionary with the following keys: score__language,evaluation__language. 136 | """ 137 | 138 | # 2.8. 
Certifications 139 | templates[ 140 | "CV__Certifications" 141 | ] = """Extract all certificates other than educational background and academic achievements. \ 142 | For each certificate: 143 | 1. Extract the title of the certification. 144 | 2. Extract the name of the organization or institution that issues the certification. 145 | 3. Extract the date of certification and output it in the following format: \ 146 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 147 | 4. Extract the certification expiry date and output it in the following format: \ 148 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 149 | 5. Extract any other information listed about the certification. If not found, then simply write "unknown". 150 | 6. Create a dictionary with the following keys: certif__title, certif__organization, certif__date, certif__expiry_date, certif__details. 151 | 152 | Format your response as a list of dictionaries. 153 | """ 154 | 155 | templates[ 156 | "Certif__evaluation" 157 | ] = """Your task is to perform the following actions: 158 | 1. Rate the certifications by giving an integer score from 0 to 100. 159 | 2. Evaluate (in three sentences and in {language}) the certifications and the quality of the text. 160 | 3. Format your response as a dictionary with the following keys: score__certif, evaluation__certif. 161 | """ 162 | 163 | 164 | # 3. PROMPTS 165 | 166 | PROMPT_IMPROVE_SUMMARY = """You are given a resume (delimited by <resume></resume>) \ 167 | and a summary (delimited by <summary></summary>). 168 | 1. In {language}, evaluate the summary (format and content). 169 | 2. Rate the summary by giving an integer score from 0 to 100. \ 170 | If the summary is "unknown", the score is 0. 171 | 3. In {language}, strengthen the summary. The summary should not exceed 5 sentences. \ 172 | If the summary is "unknown", generate a strong summary in {language} with no more than 5 sentences.
\ 173 | Please include: years of experience, top skills and experiences, some of the biggest achievements, and finally an attractive objective. 174 | 4. Format your response as a dictionary with the following keys: evaluation__summary, score__summary, CV__summary_enhanced. 175 | 176 | <summary> 177 | {summary} 178 | </summary> 179 | ------ 180 | <resume> 181 | {resume} 182 | </resume> 183 | """ 184 | 185 | PROMPT_IMPROVE_WORK_EXPERIENCE = """You are given a work experience text delimited by triple backticks. 186 | 1. Rate the quality of the work experience text by giving an integer score from 0 to 100. 187 | 2. Suggest in {language} how to make the work experience text better and stronger. 188 | 3. Strengthen the work experience text to make it more appealing to a recruiter in {language}. \ 189 | Provide additional details on responsibilities and quantify results for each bullet point. \ 190 | Format your text as a string in {language}. 191 | 4. Format your response as a dictionary with the following keys: "Score__WorkExperience", "Comments__WorkExperience" and "Improvement__WorkExperience". 192 | 193 | Work experience text: ```{text}``` 194 | """ 195 | 196 | PROMPT_IMPROVE_PROJECT = """You are given a project text delimited by triple backticks. 197 | 1. Rate the quality of the project text by giving an integer score from 0 to 100. 198 | 2. Suggest in {language} how to make the project text better and stronger. 199 | 3. Strengthen the project text to make it more appealing to a recruiter in {language}, \ 200 | including the problem, the approach taken, the tools used and quantifiable results. \ 201 | Format your text as a string in {language}. 202 | 4. Format your response as a dictionary with the following keys: Score__project, Comments__project, Improvement__project. 203 | 204 | Project text: ```{text}``` 205 | """ 206 | 207 | PROMPT_EVALUATE_RESUME = """You are given a resume delimited by triple backticks. 208 | 1. Provide an overview of the resume in {language}. 209 | 2.
Provide a comprehensive analysis of the three main strengths of the resume in {language}. \ 210 | Format the top 3 strengths as a string containing three bullet points. 211 | 3. Provide a comprehensive analysis of the three main weaknesses of the resume in {language}. \ 212 | Format the top 3 weaknesses as a string containing three bullet points. 213 | 4. Format your response as a dictionary with the following keys: resume_cv_overview, top_3_strengths, top_3_weaknesses. 214 | 215 | The strengths and weaknesses lie in the format, style and content of the resume. 216 | 217 | Resume: ```{text}``` 218 | """ 219 | -------------------------------------------------------------------------------- /Streamlit_App/app_display_results.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import markdown 3 | from resume_analyzer import get_section_scores 4 | 5 | 6 | def custom_markdown( 7 | text, 8 | html_tag="p", 9 | bg_color="white", 10 | color="black", 11 | font_size=None, 12 | text_align="left", 13 | ): 14 | """Customise markdown by specifying a custom background colour, text colour, font size, and text alignment.""" 15 | 16 | style = f'style="background-color:{bg_color};color:{color};font-size:{font_size}px; \ 17 | text-align: {text_align};padding: 25px 25px 25px 25px;border-radius:2%;"' 18 | 19 | body = f"<{html_tag} {style}> {text} </{html_tag}>" 20 | 21 | st.markdown(body, unsafe_allow_html=True) 22 | st.write("") 23 | 24 | 25 | def set_background_color(score): 26 | """Set background color based on score.""" 27 | if score >= 80: 28 | bg_color = "#D4F1F4" 29 | elif score >= 60: 30 | bg_color = "#ededed" 31 | else: 32 | bg_color = "#fbcccd" 33 | return bg_color 34 | 35 | 36 | def format_object_to_string(object, separator="\n- "): 37 | """Convert object (e.g.
list) to string.""" 38 | if not isinstance(object, str): 39 | return separator + separator.join(object) 40 | else: 41 | return object 42 | 43 | 44 | def markdown_to_html(md_text): 45 | """Convert Markdown to html.""" 46 | html_txt = ( 47 | markdown.markdown(md_text.replace("\\n", "\n").replace("- ", "\n- ")) 48 | .replace("\n", "") 49 | .replace('\\"', '"') 50 | ) 51 | return html_txt 52 | 53 | 54 | def display_scores_in_columns(section_names: list, scores: list, column_width: list): 55 | """Display the scores of the sections in side-by-side columns. 56 | The column_width variable sets the width of the columns.""" 57 | columns = st.columns(column_width) 58 | for i, column in enumerate(columns): 59 | with column: 60 | custom_markdown( 61 | text=f"{section_names[i]}

<br> <br> {scores[i]} <br>
", 62 | bg_color=set_background_color(scores[i]), 63 | text_align="center", 64 | ) 65 | 66 | 67 | def display_section_results( 68 | expander_label: str, 69 | expander_header_fields: list, 70 | expander_header_links: list, 71 | score: int, 72 | section_original_text_header: str, 73 | section_original_text: list, 74 | original_text_bullet_points: bool, 75 | section_assessment, 76 | section_improved_text, 77 | ): 78 | if score > -1: 79 | expander_label += f"- 🎯 **{score}**/100" 80 | with st.expander(expander_label): 81 | st.write("") 82 | 83 | # 1. Display the header fields (for example, the company and dates of the work experience) 84 | if expander_header_fields is not None: 85 | for field in expander_header_fields: 86 | if not isinstance(field, list): 87 | st.markdown(field) 88 | else: 89 | # display fields in side-by-side columns. 90 | columns = st.columns(len(field)) 91 | for i, column in enumerate(columns): 92 | with column: 93 | st.markdown(field[i]) 94 | 95 | # 2. View the links (examle social media blogs and web sites) 96 | if expander_header_links is not None: 97 | if not isinstance(expander_header_links, list): 98 | link = expander_header_links.strip().replace('"', "") 99 | if not link.startswith("http"): 100 | link = "https://" + link 101 | st.markdown( 102 | f"""🌐 {link}""", 103 | unsafe_allow_html=True, 104 | ) 105 | else: 106 | for link in expander_header_links: 107 | if not link.startswith("http"): 108 | link = "https://" + link 109 | st.markdown( 110 | f"""🌐 {link}""", 111 | unsafe_allow_html=True, 112 | ) 113 | 114 | # 3. View the original text 115 | if section_original_text_header is not None: 116 | st.write("") 117 | st.markdown(section_original_text_header) 118 | if section_original_text is not None: 119 | for text in section_original_text: 120 | if original_text_bullet_points: 121 | st.markdown(f"- {text}") 122 | else: 123 | st.markdown(text) 124 | 125 | # 4. 
Display the section score 126 | st.divider() 127 | custom_markdown( 128 | html_tag="h4", 129 | text=f"🎯 Score: {score}/100", 130 | ) 131 | 132 | # 5. Display the assessment 133 | bg_color = set_background_color(score) 134 | assessment = markdown_to_html(format_object_to_string(section_assessment)) 135 | custom_markdown( 136 | text=f"🔎 Assessment: <br> <br>

{assessment}", 137 | html_tag="div", 138 | bg_color=bg_color, 139 | ) 140 | 141 | # 6. View the improved text 142 | if section_improved_text is not None: 143 | improved_text = markdown_to_html( 144 | format_object_to_string(section_improved_text) 145 | ) 146 | custom_markdown( 147 | text=f"🚀 Improvement: <br> <br>

{improved_text}", 148 | html_tag="div", 149 | bg_color="#ededed", 150 | ) 151 | st.write("") 152 | 153 | 154 | def display_assessment(score, section_assessment): 155 | """Display the section score and the assessment.""" 156 | # 1. View section score 157 | custom_markdown( 158 | html_tag="h4", 159 | text=f"🎯 Score: {score}/100", 160 | ) 161 | # 2. Display the assessment 162 | bg_color = set_background_color(score) 163 | assessment = markdown_to_html(format_object_to_string(section_assessment)) 164 | custom_markdown( 165 | text=f"🔎 Assessment: <br> <br>

{assessment}", 166 | html_tag="div", 167 | bg_color=bg_color, 168 | ) 169 | st.write("") 170 | 171 | 172 | def display_resume_analysis(SCANNED_RESUME): 173 | """Display the resume analysis.""" 174 | try: 175 | ############################################################### 176 | # Overview, Top 3 strengths and Top 3 weaknesses 177 | ############################################################### 178 | st.divider() 179 | st.header("🎯 Overview and scores") 180 | 181 | list_task = ["Overview", "Top 3 strengths", "Top 3 weaknesses"] 182 | list_content = [ 183 | SCANNED_RESUME["resume_cv_overview"], 184 | SCANNED_RESUME["top_3_strengths"], 185 | SCANNED_RESUME["top_3_weaknesses"], 186 | ] 187 | list_colors = ["#ededed", "#D4F1F4", "#fbcccd"] 188 | 189 | for i in range(3): 190 | st.write("") 191 | st.subheader(list_task[i]) 192 | custom_markdown( 193 | html_tag="div", 194 | text=markdown_to_html(format_object_to_string(list_content[i])), 195 | bg_color=list_colors[i], 196 | ) 197 | 198 | ############################################################### 199 | # Display scores 200 | ############################################################### 201 | st.write("") 202 | st.subheader("Scores over 100") 203 | st.write("") 204 | 205 | dict_scores = get_section_scores(SCANNED_RESUME) 206 | 207 | display_scores_in_columns( 208 | section_names=[ 209 | "👤 Contact", 210 | "📋 Summary", 211 | "📋 Work Experience", 212 | "💪 Skills", 213 | ], 214 | scores=[ 215 | dict_scores.get(key) 216 | for key in ["ContactInfo", "summary", "work_experience", "skills"] 217 | ], 218 | column_width=[2.25, 2.25, 2.75, 2.25], 219 | ) 220 | 221 | display_scores_in_columns( 222 | section_names=[ 223 | "🎓 Education", 224 | "🗣 Language", 225 | "📋 Projects", 226 | "🏅 Certifications", 227 | ], 228 | scores=[ 229 | dict_scores.get(key) 230 | for key in ["education", "language", "projects", "certfication"] 231 | ], 232 | column_width=[2.5, 2.5, 2.5, 2.75], 233 | ) 234 | 235 | 
################################################################################## 236 | # Detailed analysis 237 | ################################################################################## 238 | st.divider() 239 | st.header("🔎 Detailed Analysis") 240 | 241 | # 1. Contact Information 242 | 243 | st.write("") 244 | st.subheader(f"Contact Information - 🎯 **{dict_scores['ContactInfo']}**/100") 245 | display_section_results( 246 | expander_label="🛈 Contact Information", 247 | expander_header_fields=[ 248 | f"**👤 {SCANNED_RESUME['Contact__information']['candidate__name']}**", 249 | f"{SCANNED_RESUME['Contact__information']['candidate__title']}", 250 | "", 251 | [ 252 | f"**📌 Location:** {SCANNED_RESUME['Contact__information']['candidate__location']}", 253 | f"**:telephone_receiver::** {SCANNED_RESUME['Contact__information']['candidate__phone']}", 254 | ], 255 | "", 256 | "**Email and Social media:**", 257 | f"**:e-mail:** {SCANNED_RESUME['Contact__information']['candidate__email']}", 258 | ], 259 | expander_header_links=SCANNED_RESUME["Contact__information"][ 260 | "candidate__social_media" 261 | ], 262 | score=dict_scores["ContactInfo"], 263 | section_original_text_header=None, 264 | section_original_text=None, 265 | original_text_bullet_points=False, 266 | section_assessment=SCANNED_RESUME["Contact__information"][ 267 | "evaluation__ContactInfo" 268 | ], 269 | section_improved_text=None, 270 | ) 271 | 272 | # 2. 
Summary 273 | 274 | st.write("") 275 | st.write("") 276 | st.subheader(f"Summary - 🎯 **{dict_scores['summary']}**/100") 277 | display_section_results( 278 | expander_label="Summary", 279 | expander_header_fields=[], 280 | expander_header_links=None, 281 | score=dict_scores["summary"], 282 | section_original_text_header="**📋 Summary:**", 283 | section_original_text=[SCANNED_RESUME["CV__summary"]], 284 | original_text_bullet_points=False, 285 | section_assessment=SCANNED_RESUME["Summary__evaluation"][ 286 | "evaluation__summary" 287 | ], 288 | section_improved_text=SCANNED_RESUME["Summary__evaluation"][ 289 | "CV__summary_enhanced" 290 | ], 291 | ) 292 | 293 | # 3. Work Experience 294 | 295 | st.write("") 296 | st.write("") 297 | st.subheader(f"Work experience - 🎯 **{dict_scores['work_experience']}**/100") 298 | 299 | if len(SCANNED_RESUME["Work__experience"]) == 0: 300 | st.info("No work experience results.") 301 | else: 302 | for work_experience in SCANNED_RESUME["Work__experience"]: 303 | display_section_results( 304 | expander_label=f"{work_experience['job__title']}", 305 | expander_header_fields=[ 306 | [ 307 | f"**Company:**\n {work_experience['job__company']}", 308 | f"**📅**\n {work_experience['job__start_date']} - {work_experience['job__end_date']}", 309 | ] 310 | ], 311 | expander_header_links=None, 312 | score=work_experience["Score__WorkExperience"], 313 | section_original_text_header="**📋 Responsibilities:**", 314 | section_original_text=list( 315 | work_experience["work__duties"].values() 316 | ), 317 | original_text_bullet_points=True, 318 | section_assessment=work_experience["Comments__WorkExperience"], 319 | section_improved_text=work_experience[ 320 | "Improvement__WorkExperience" 321 | ], 322 | ) 323 | 324 | # 4.
Skills 325 | 326 | st.write("") 327 | st.write("") 328 | st.subheader(f"Skills - 🎯 **{dict_scores['skills']}**/100") 329 | display_section_results( 330 | expander_label="💪 Skills", 331 | expander_header_fields=None, 332 | expander_header_links=None, 333 | score=dict_scores["skills"], 334 | section_original_text_header=None, 335 | section_original_text=[SCANNED_RESUME["candidate__skills"]], 336 | original_text_bullet_points=True, 337 | section_assessment=SCANNED_RESUME["Skills__evaluation"][ 338 | "evaluation__skills" 339 | ], 340 | section_improved_text=None, 341 | ) 342 | 343 | # 5. Education 344 | 345 | st.write("") 346 | st.write("") 347 | st.subheader(f"Education - 🎯 **{dict_scores['education']}**/100") 348 | with st.expander(f"🎓 Educational background and academic achievements."): 349 | st.write("") 350 | list_education = SCANNED_RESUME["CV__Education"] 351 | if not isinstance(list_education, list): 352 | st.markdown(f"- {list_education}") 353 | else: 354 | for edu in list_education: 355 | col1, col2 = st.columns([6, 4]) 356 | with col1: 357 | st.markdown(f"**🎓 Degree:** {edu['edu__degree']}") 358 | with col2: 359 | st.markdown( 360 | f"**📅** {edu['edu__start_date']} - {edu['edu__end_date']}" 361 | ) 362 | st.markdown(f"**🏛️** {edu['edu__college']}") 363 | st.divider() 364 | 365 | display_assessment( 366 | score=dict_scores["education"], 367 | section_assessment=SCANNED_RESUME["Education__evaluation"][ 368 | "evaluation__edu" 369 | ], 370 | ) 371 | 372 | # 6. 
Language (Optional section) 373 | 374 | st.divider() 375 | st.subheader(f"Language - 🎯 **{dict_scores['language']}**/100") 376 | languages = [] 377 | for language in SCANNED_RESUME["CV__Languages"]: 378 | languages.append( 379 | f"**🗣 {language['spoken__language']}** : {language['language__fluency']}" 380 | ) 381 | display_section_results( 382 | expander_label="🗣 Language", 383 | expander_header_fields=None, 384 | expander_header_links=None, 385 | score=dict_scores["language"], 386 | section_original_text_header=None, 387 | section_original_text=languages, 388 | original_text_bullet_points=False, 389 | section_assessment=SCANNED_RESUME["Languages__evaluation"][ 390 | "evaluation__language" 391 | ], 392 | section_improved_text=None, 393 | ) 394 | 395 | # 7. CERTIFICATIONS (optional section) 396 | 397 | st.write("") 398 | st.write("") 399 | st.subheader(f"Certifications - 🎯 **{dict_scores['certfication']}**/100") 400 | with st.expander("🏅 Certifications"): 401 | st.write("") 402 | list_certifs = SCANNED_RESUME["CV__Certifications"] 403 | if not isinstance(list_certifs, list): 404 | st.markdown(f"- {list_certifs}") 405 | else: 406 | for certif in list_certifs: 407 | col1, col2 = st.columns([6, 4]) 408 | with col1: 409 | st.markdown(f"**🏅 Title:** {certif['certif__title']}") 410 | with col2: 411 | st.markdown(f"**📅** {certif['certif__date']} ") 412 | st.markdown(f"**🏛️** {certif['certif__organization']}") 413 | 414 | if certif["certif__expiry_date"].lower() != "unknown": 415 | st.markdown( 416 | f"**📅 Expiry date:** {certif['certif__expiry_date']}" 417 | ) 418 | if certif["certif__details"].lower() != "unknown": 419 | st.write("") 420 | st.markdown(f"{certif['certif__details']}") 421 | st.divider() 422 | 423 | display_assessment( 424 | score=dict_scores["certfication"], 425 | section_assessment=SCANNED_RESUME["Certif__evaluation"][ 426 | "evaluation__certif" 427 | ], 428 | ) 429 | 430 | # 8. 
Projects (Optional section) 431 | 432 | st.write("") 433 | st.write("") 434 | st.subheader(f"Projects - 🎯 **{dict_scores['projects']}**/100") 435 | if len(SCANNED_RESUME["CV__Projects"]) == 0: 436 | st.info("No projects found.") 437 | else: 438 | for project in SCANNED_RESUME["CV__Projects"]: 439 | display_section_results( 440 | expander_label=f"{project['project__title']}", 441 | expander_header_fields=[ 442 | f"**📅**\n {project['project__start_date']} - {project['project__end_date']}" 443 | ], 444 | expander_header_links=None, 445 | score=project["Score__project"], 446 | section_original_text_header="**📋 Project details:**", 447 | section_original_text=[project["project__description"]], 448 | original_text_bullet_points=True, 449 | section_assessment=project["Comments__project"], 450 | section_improved_text=project["Improvement__project"], 451 | ) 452 | 453 | except Exception as exception: 454 | print(exception) 455 | -------------------------------------------------------------------------------- /Streamlit_App/app_sidebar.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | 3 | from app_constants import list_Assistant_Languages, list_LLM_providers 4 | 5 | 6 | def expander_model_parameters( 7 | LLM_provider="OpenAI", 8 | text_input_API_key="OpenAI API Key - [Get an API key](https://platform.openai.com/account/api-keys)", 9 | list_models=["gpt-3.5-turbo-0125", "gpt-3.5-turbo", "gpt-4-turbo-preview"], 10 | openai_api_key="", 11 | google_api_key="", 12 | ): 13 | """Add a text_input (for the API key) and a streamlit expander with models and parameters.""" 14 | 15 | st.session_state.LLM_provider = LLM_provider 16 | 17 | if LLM_provider == "OpenAI": 18 | st.session_state.openai_api_key = st.text_input( 19 | text_input_API_key, 20 | value=openai_api_key, 21 | type="password", 22 | placeholder="insert your API key", 23 | ) 24 | 25 | if LLM_provider == "Google": 26 | st.session_state.google_api_key = 
st.text_input( 27 | text_input_API_key, 28 | type="password", 29 | value=google_api_key, 30 | placeholder="insert your API key", 31 | ) 32 | 33 | with st.expander("**Models and parameters**"): 34 | st.session_state.selected_model = st.selectbox( 35 | f"Choose {LLM_provider} model", list_models 36 | ) 37 | # model parameters 38 | st.session_state.temperature = st.slider( 39 | "temperature", 40 | min_value=0.1, 41 | max_value=1.0, 42 | value=0.7, 43 | step=0.1, 44 | ) 45 | st.session_state.top_p = st.slider( 46 | "top_p", 47 | min_value=0.1, 48 | max_value=1.0, 49 | value=0.95, 50 | step=0.05, 51 | ) 52 | 53 | 54 | def sidebar(openai_api_key, google_api_key, cohere_api_key): 55 | """Create the sidebar.""" 56 | 57 | with st.sidebar: 58 | st.caption( 59 | "🚀 A resume scanner powered by 🔗 Langchain, OpenAI and Google Generative AI" 60 | ) 61 | st.write("") 62 | 63 | llm_chooser = st.radio( 64 | "Select provider", 65 | list_LLM_providers, 66 | captions=[ 67 | "[OpenAI pricing page](https://openai.com/pricing)", 68 | "Rate limit: 60 requests per minute.", 69 | ], 70 | ) 71 | 72 | st.divider() 73 | if llm_chooser == list_LLM_providers[0]: 74 | expander_model_parameters( 75 | LLM_provider="OpenAI", 76 | text_input_API_key="OpenAI API Key - [Get an API key](https://platform.openai.com/account/api-keys)", 77 | list_models=[ 78 | "gpt-3.5-turbo-0125", 79 | "gpt-3.5-turbo", 80 | "gpt-4-turbo-preview", 81 | ], 82 | openai_api_key=openai_api_key, 83 | google_api_key=google_api_key, 84 | ) 85 | 86 | if llm_chooser == list_LLM_providers[1]: 87 | expander_model_parameters( 88 | LLM_provider="Google", 89 | text_input_API_key="Google API Key - [Get an API key](https://makersuite.google.com/app/apikey)", 90 | list_models=["gemini-pro"], 91 | openai_api_key=openai_api_key, 92 | google_api_key=google_api_key, 93 | ) 94 | 95 | # Cohere API Key 96 | st.write("") 97 | st.session_state.cohere_api_key = st.text_input( 98 | "Cohere API Key - [Get an API 
key](https://dashboard.cohere.com/api-keys)", 99 | type="password", 100 | value=cohere_api_key, 101 | placeholder="insert your API key", 102 | ) 103 | 104 | # Assistant language 105 | st.divider() 106 | st.session_state.assistant_language = st.selectbox( 107 | f"Assistant language", list_Assistant_Languages 108 | ) 109 | -------------------------------------------------------------------------------- /Streamlit_App/data/Images/Education.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Education.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/Language.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Language.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/Leonardo_AI.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Leonardo_AI.jpg -------------------------------------------------------------------------------- /Streamlit_App/data/Images/app.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/app.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/contact_information.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/contact_information.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/scores.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/scores.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/top_3_strengths.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/top_3_strengths.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/work_experience.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/work_experience.png -------------------------------------------------------------------------------- /Streamlit_App/keys.env: -------------------------------------------------------------------------------- 1 | api_key_openai = "Your_API_key" 2 | api_key_google = "Your_API_key" 3 | api_key_cohere = "Your_API_key" -------------------------------------------------------------------------------- /Streamlit_App/llm_functions.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | 3 | # LLM: openai 4 | from langchain_openai import ChatOpenAI 5 | 6 | # LLM: google_genai 7 | from langchain_google_genai import ChatGoogleGenerativeAI 8 | 9 | # dotenv and os 10 | from dotenv import load_dotenv, find_dotenv 11 | import os 12 | 13 | 14 | 
def get_api_keys_from_local_env(): 15 | """Get OpenAI, Gemini and Cohere API keys from the local .env file""" 16 | try: 17 | found_dotenv = find_dotenv("keys.env", usecwd=True) 18 | load_dotenv(found_dotenv) 19 | try: 20 | openai_api_key = os.getenv("api_key_openai") 21 | except: 22 | openai_api_key = "" 23 | try: 24 | google_api_key = os.getenv("api_key_google") 25 | except: 26 | google_api_key = "" 27 | try: 28 | cohere_api_key = os.getenv("api_key_cohere") 29 | except: 30 | cohere_api_key = "" 31 | except Exception as e: 32 | print(e) 33 | openai_api_key = google_api_key = cohere_api_key = "" # fall back to empty keys so the return below never raises NameError 34 | return openai_api_key, google_api_key, cohere_api_key 35 | 36 | 37 | def instantiate_LLM( 38 | LLM_provider, api_key, temperature=0.5, top_p=0.95, model_name=None 39 | ): 40 | """Instantiate LLM in Langchain. 41 | Parameters: 42 | LLM_provider (str): the LLM provider; in ["OpenAI","Google"] 43 | model_name (str): in ["gpt-3.5-turbo", "gpt-3.5-turbo-0125", "gpt-4-turbo-preview","gemini-pro"]. 44 | api_key (str): google_api_key or openai_api_key 45 | temperature (float): Range: 0.0 - 1.0; default = 0.5 46 | top_p (float): Range: 0.0 - 1.0; default = 0.95. 
47 | """ 48 | if LLM_provider == "OpenAI": 49 | llm = ChatOpenAI( 50 | api_key=api_key, 51 | model=model_name, 52 | temperature=temperature, 53 | model_kwargs={"top_p": top_p}, 54 | ) 55 | if LLM_provider == "Google": 56 | llm = ChatGoogleGenerativeAI( 57 | google_api_key=api_key, 58 | # model="gemini-pro", 59 | model=model_name, 60 | temperature=temperature, 61 | top_p=top_p, 62 | convert_system_message_to_human=True, 63 | ) 64 | 65 | return llm 66 | 67 | 68 | def instantiate_LLM_main(temperature, top_p): 69 | """Instantiate the selected LLM model.""" 70 | try: 71 | if st.session_state.LLM_provider == "OpenAI": 72 | llm = instantiate_LLM( 73 | "OpenAI", 74 | api_key=st.session_state.openai_api_key, 75 | temperature=temperature, 76 | top_p=top_p, 77 | model_name=st.session_state.selected_model, 78 | ) 79 | else: 80 | llm = instantiate_LLM( 81 | "Google", 82 | api_key=st.session_state.google_api_key, 83 | temperature=temperature, 84 | top_p=top_p, 85 | model_name=st.session_state.selected_model, 86 | ) 87 | except Exception as e: 88 | st.error(f"An error occurred: {e}") 89 | llm = None 90 | return llm 91 | -------------------------------------------------------------------------------- /Streamlit_App/resume_analyzer.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import json, warnings 3 | 4 | warnings.filterwarnings("ignore", category=FutureWarning) 5 | 6 | import datetime 7 | 8 | from langchain.prompts import PromptTemplate 9 | 10 | from app_constants import ( 11 | templates, 12 | PROMPT_IMPROVE_WORK_EXPERIENCE, 13 | PROMPT_IMPROVE_PROJECT, 14 | PROMPT_EVALUATE_RESUME, 15 | PROMPT_IMPROVE_SUMMARY, 16 | ) 17 | import retrieval 18 | 19 | 20 | def create_prompt_template(resume_sections, language="english"): 21 | """Create the PromptTemplate. 22 | Parameters: 23 | resume_sections (list): List of CV sections from which information will be extracted. 
24 | language (str): the language of the assistant, default="english". 25 | """ 26 | 27 | # Create the Template 28 | template = f"""For the following resume, output in {language} the following information:\n\n""" 29 | 30 | for key in resume_sections: 31 | template += key + ": " + templates[key] + "\n\n" 32 | 33 | template += "For any requested information, if it is not found, output 'unknown' or ['unknown'] accordingly.\n\n" 34 | template += ( 35 | """Format the final output as a json dictionary with the following keys: (""" 36 | ) 37 | 38 | for key in resume_sections: 39 | template += "" + key + ", " 40 | template = template[:-2] + ")" # remove the last ", " 41 | 42 | template += """\n\nResume: {text}""" 43 | 44 | # Create the PromptTemplate 45 | prompt_template = PromptTemplate.from_template(template) 46 | 47 | return prompt_template 48 | 49 | 50 | def extract_from_text(text, start_tag, end_tag=None): 51 | """Use start and end tags to extract a substring from text. 52 | This helper function is used to parse the response content of the LLM in case 'json.loads' fails. 53 | """ 54 | start_index = text.find(start_tag) 55 | if end_tag is None: 56 | extracted_txt = text[start_index + len(start_tag) :] 57 | else: 58 | end_index = text.find(end_tag) 59 | extracted_txt = text[start_index + len(start_tag) : end_index] 60 | 61 | return extracted_txt 62 | 63 | 64 | def convert_text_to_list_of_dicts(text, dict_keys): 65 | """Convert text to a python list of dicts. 66 | Parameters: 67 | - text: string containing a list of dicts 68 | - dict_keys (list): the keys of the dictionary which will be returned. 69 | Output: 70 | - list_of_dicts (list): the list of dicts to return. 
71 | """ 72 | list_of_dicts = [] 73 | 74 | if text != "": 75 | text_splitted = text.split("},\n") 76 | dict_keys.append(None) 77 | 78 | for i in range(len(text_splitted)): 79 | dict_i = {} 80 | 81 | for j in range(len(dict_keys) - 1): 82 | key_value = extract_from_text( 83 | text_splitted[i], f'"{dict_keys[j]}": ', f'"{dict_keys[j+1]}": ' 84 | ) 85 | key_value = key_value[: key_value.rfind(",\n")].strip()[1:-1] 86 | dict_i[dict_keys[j]] = key_value 87 | 88 | list_of_dicts.append(dict_i) # add the dict to the list. 89 | 90 | return list_of_dicts 91 | 92 | 93 | def get_current_time(): 94 | current_time = (datetime.datetime.now()).strftime("%H:%M:%S") 95 | return current_time 96 | 97 | 98 | def invoke_LLM( 99 | llm, 100 | documents, 101 | resume_sections: list, 102 | info_message="", 103 | language="english", 104 | ): 105 | """Invoke LLM and get a response. 106 | Parameters: 107 | - llm: the LLM to call 108 | - documents: our Langchain Documents. Will be used to format the prompt_template. 109 | - resume_sections (list): List of resume sections to be parsed. 110 | - info_message (str): display an informational message. 111 | - language (str): Assistant language. Will be used to format the prompt_template. 112 | 113 | Output: 114 | - response_content (str): the content of the LLM response. 115 | - response_tokens_count (int): count of response tokens. 116 | """ 117 | 118 | # 1. display the info message 119 | st.info(f"**{get_current_time()}** \t{info_message}") 120 | print(f"**{get_current_time()}** \t{info_message}") 121 | 122 | # 2. Create the promptTemplate. 123 | prompt_template = create_prompt_template( 124 | resume_sections, 125 | language=st.session_state.assistant_language, 126 | ) 127 | 128 | # 3. Format promptTemplate with the full documents 129 | if language is not None: 130 | prompt = prompt_template.format_prompt(text=documents, language=language).text 131 | else: 132 | prompt = prompt_template.format_prompt(text=documents).text 133 | 134 | # 4. 
Invoke LLM 135 | response = llm.invoke(prompt) 136 | 137 | response_content = response.content[ 138 | response.content.find("{") : response.content.rfind("}") + 1 139 | ] 140 | response_tokens_count = sum(retrieval.tiktoken_tokens([response_content])) 141 | 142 | return response_content, response_tokens_count 143 | 144 | 145 | def ResponseContent_Parser( 146 | response_content, list_fields, list_rfind, list_exclude_first_car 147 | ): 148 | """Parse any response_content. 149 | Parameters: 150 | - response_content (str): the content of the LLM response we are going to parse. 151 | - list_fields (list): List of dictionary fields returned by this function. 152 | A field can be a dictionary. The key of the dict will not be parsed. 153 | Example: [{'Contact__information':['candidate__location','candidate__email','candidate__phone','candidate__social_media']}, 154 | 'CV__summary'] 155 | We will not parse the content for 'Contact__information'. 156 | - list_rfind (list): To parse the content of a field, first we will extract the text between this field and the next field. 157 | Then, extract text using Python's `rfind` method, which returns the highest index in the text where the substring is found. 158 | - list_exclude_first_car (list): Whether to exclude the first and last characters. 159 | 160 | Output: 161 | - INFORMATION_dict: dictionary, where fields are the keys and parsed texts are the values. 162 | 163 | """ 164 | 165 | list_fields_detailed = ( 166 | [] 167 | ) # list of tuples. tuple = (field, extract info (boolean), parent field) 168 | 169 | for field in list_fields: 170 | if type(field) is dict: 171 | list_fields_detailed.append( 172 | (list(field.keys())[0], False, None) 173 | ) # We will not extract any value for the text between this tag and the next. 
174 | for val in list(field.values())[0]: 175 | list_fields_detailed.append((val, True, list(field.keys())[0])) 176 | else: 177 | list_fields_detailed.append((field, True, None)) 178 | 179 | list_fields_detailed.append((None, False, None)) 180 | 181 | # Parse the response_content 182 | INFORMATION_dict = {} 183 | 184 | for i in range(len(list_fields_detailed) - 1): 185 | if list_fields_detailed[i][1] is False: # Extract info = False 186 | INFORMATION_dict[list_fields_detailed[i][0]] = {} # Initialize the dict 187 | if list_fields_detailed[i][1]: 188 | extracted_value = extract_from_text( 189 | response_content, 190 | f'"{list_fields_detailed[i][0]}": ', 191 | f'"{list_fields_detailed[i+1][0]}":', 192 | ) 193 | extracted_value = extracted_value[ 194 | : extracted_value.rfind(list_rfind[i]) 195 | ].strip() 196 | if list_exclude_first_car[i]: 197 | extracted_value = extracted_value[1:-1].strip() 198 | if list_fields_detailed[i][2] is None: 199 | INFORMATION_dict[list_fields_detailed[i][0]] = extracted_value 200 | else: 201 | INFORMATION_dict[list_fields_detailed[i][2]][ 202 | list_fields_detailed[i][0] 203 | ] = extracted_value 204 | 205 | return INFORMATION_dict 206 | 207 | 208 | def Extract_contact_information(llm, documents): 209 | """Extract Contact Information: Name, Title, Location, Email, Phone number and Social media profiles.""" 210 | 211 | try: 212 | response_content, response_tokens_count = invoke_LLM( 213 | llm, 214 | documents, 215 | resume_sections=["Contact__information"], 216 | info_message="Extract and evaluate contact information...", 217 | language=st.session_state.assistant_language, 218 | ) 219 | 220 | try: 221 | # Load response_content to json dictionary 222 | CONTACT_INFORMATION = json.loads(response_content, strict=False) 223 | except Exception as e: 224 | print("[ERROR] json.loads returns error:", e) 225 | print("\n[INFO] Parse response content...\n") 226 | 227 | list_fields = [ 228 | { 229 | "Contact__information": [ 230 | "candidate__name", 
231 | "candidate__title", 232 | "candidate__location", 233 | "candidate__email", 234 | "candidate__phone", 235 | "candidate__social_media", 236 | "evaluation__ContactInfo", 237 | "score__ContactInfo", 238 | ] 239 | } 240 | ] 241 | list_rfind = [",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", "}\n"] 242 | list_exclude_first_car = [ 243 | True, 244 | True, 245 | True, 246 | True, 247 | True, 248 | True, 249 | False, 250 | True, 251 | False, 252 | ] 253 | CONTACT_INFORMATION = ResponseContent_Parser( 254 | response_content, list_fields, list_rfind, list_exclude_first_car 255 | ) 256 | # convert score to int 257 | try: 258 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] = int( 259 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] 260 | ) 261 | except: 262 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] = -1 263 | 264 | except Exception as exception: 265 | print(f"[Error] {exception}") 266 | CONTACT_INFORMATION = { 267 | "Contact__information": { 268 | "candidate__name": "unknown", 269 | "candidate__title": "unknown", 270 | "candidate__location": "unknown", 271 | "candidate__email": "unknown", 272 | "candidate__phone": "unknown", 273 | "candidate__social_media": "unknown", 274 | "evaluation__ContactInfo": "unknown", 275 | "score__ContactInfo": -1, 276 | } 277 | } 278 | 279 | return CONTACT_INFORMATION 280 | 281 | 282 | def Extract_Evaluate_Summary(llm, documents): 283 | """Extract, evaluate and strengthen the summary.""" 284 | 285 | ###################################### 286 | # 1. 
Extract the summary 287 | ###################################### 288 | try: 289 | response_content, response_tokens_count = invoke_LLM( 290 | llm, 291 | documents, 292 | resume_sections=["CV__summary"], 293 | info_message="Extract and evaluate the Summary....", 294 | language=st.session_state.assistant_language, 295 | ) 296 | try: 297 | # Load response_content to json dictionary 298 | SUMMARY_SECTION = json.loads(response_content, strict=False) 299 | except Exception as e: 300 | print("[ERROR] json.loads returns error:", e) 301 | print("\n[INFO] Parse response content...\n") 302 | 303 | list_fields = ["CV__summary"] 304 | list_rfind = ["}\n"] 305 | list_exclude_first_car = [True] 306 | 307 | SUMMARY_SECTION = ResponseContent_Parser( 308 | response_content, list_fields, list_rfind, list_exclude_first_car 309 | ) 310 | 311 | except Exception as exception: 312 | print(f"[Error] {exception}") 313 | SUMMARY_SECTION = {"CV__summary": "unknown"} 314 | 315 | ###################################### 316 | # 2. 
Evaluate the summary 317 | ###################################### 318 | 319 | try: 320 | prompt_template = PromptTemplate.from_template(PROMPT_IMPROVE_SUMMARY) 321 | 322 | prompt = prompt_template.format_prompt( 323 | resume=documents, 324 | language=st.session_state.assistant_language, 325 | summary=SUMMARY_SECTION["CV__summary"], 326 | ).text 327 | 328 | # Invoke LLM 329 | response = llm.invoke(prompt) 330 | response_content = response.content[ 331 | response.content.find("{") : response.content.rfind("}") + 1 332 | ] 333 | 334 | try: 335 | SUMMARY_EVAL = {} 336 | SUMMARY_EVAL["Summary__evaluation"] = json.loads( 337 | response_content, strict=False 338 | ) 339 | except Exception as e: 340 | print("[ERROR] json.loads returns error:", e) 341 | print("\n[INFO] Parse response content...\n") 342 | 343 | list_fields = [ 344 | "evaluation__summary", 345 | "score__summary", 346 | "CV__summary_enhanced", 347 | ] 348 | list_rfind = [",\n", ",\n", "}\n"] 349 | list_exclude_first_car = [True, False, True] 350 | SUMMARY_EVAL["Summary__evaluation"] = ResponseContent_Parser( 351 | response_content, list_fields, list_rfind, list_exclude_first_car 352 | ) 353 | # convert score to int 354 | try: 355 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] = int( 356 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] 357 | ) 358 | except: 359 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] = -1 360 | 361 | except Exception as e: 362 | print(e) 363 | SUMMARY_EVAL = { 364 | "Summary__evaluation": { 365 | "evaluation__summary": "unknown", 366 | "score__summary": -1, 367 | "CV__summary_enhanced": "unknown", 368 | } 369 | } 370 | 371 | SUMMARY_EVAL["CV__summary"] = SUMMARY_SECTION["CV__summary"] 372 | 373 | return SUMMARY_EVAL 374 | 375 | 376 | def Extract_Education_Language(llm, documents): 377 | """Extract and evaluate education and language sections.""" 378 | 379 | try: 380 | response_content, response_tokens_count = invoke_LLM( 381 | llm, 382 | documents, 383 | 
resume_sections=[ 384 | "CV__Education", 385 | "Education__evaluation", 386 | "CV__Languages", 387 | "Languages__evaluation", 388 | ], 389 | info_message="Extract and evaluate education and language sections...", 390 | language=st.session_state.assistant_language, 391 | ) 392 | 393 | try: 394 | # Load response_content to json dictionary 395 | Education_Language_sections = json.loads(response_content, strict=False) 396 | except Exception as e: 397 | print("[ERROR] json.loads returns error:", e) 398 | print("\n[INFO] Parse response content...\n") 399 | 400 | list_fields = [ 401 | "CV__Education", 402 | {"Education__evaluation": ["score__edu", "evaluation__edu"]}, 403 | "CV__Languages", 404 | {"Languages__evaluation": ["score__language", "evaluation__language"]}, 405 | ] 406 | 407 | list_rfind = [",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", "\n"] 408 | list_exclude_first_car = [True, True, False, True, True, True, False, True] 409 | 410 | Education_Language_sections = ResponseContent_Parser( 411 | response_content, list_fields, list_rfind, list_exclude_first_car 412 | ) 413 | 414 | # Convert scores to int 415 | try: 416 | Education_Language_sections["Education__evaluation"]["score__edu"] = ( 417 | int( 418 | Education_Language_sections["Education__evaluation"][ 419 | "score__edu" 420 | ] 421 | ) 422 | ) 423 | except: 424 | Education_Language_sections["Education__evaluation"]["score__edu"] = -1 425 | 426 | try: 427 | Education_Language_sections["Languages__evaluation"][ 428 | "score__language" 429 | ] = int( 430 | Education_Language_sections["Languages__evaluation"][ 431 | "score__language" 432 | ] 433 | ) 434 | except: 435 | Education_Language_sections["Languages__evaluation"][ 436 | "score__language" 437 | ] = -1 438 | 439 | # Split languages and educational texts into a Python list of dict 440 | languages = Education_Language_sections["CV__Languages"] 441 | Education_Language_sections["CV__Languages"] = ( 442 | convert_text_to_list_of_dicts( 443 | 
text=languages[ 444 | languages.find("[") + 1 : languages.rfind("]") 445 | ].strip(), 446 | dict_keys=["spoken__language", "language__fluency"], 447 | ) 448 | ) 449 | education = Education_Language_sections["CV__Education"] 450 | Education_Language_sections["CV__Education"] = ( 451 | convert_text_to_list_of_dicts( 452 | text=education[ 453 | education.find("[") + 1 : education.rfind("]") 454 | ].strip(), 455 | dict_keys=[ 456 | "edu__college", 457 | "edu__degree", 458 | "edu__start_date", 459 | "edu__end_date", 460 | ], 461 | ) 462 | ) 463 | except Exception as exception: 464 | print(exception) 465 | Education_Language_sections = { 466 | "CV__Education": [], 467 | "Education__evaluation": {"score__edu": -1, "evaluation__edu": "unknown"}, 468 | "CV__Languages": [], 469 | "Languages__evaluation": { 470 | "score__language": -1, 471 | "evaluation__language": "unknown", 472 | }, 473 | } 474 | 475 | return Education_Language_sections 476 | 477 | 478 | def Extract_Skills_and_Certifications(llm, documents): 479 | """Extract skills and certifications and evaluate these sections.""" 480 | 481 | try: 482 | response_content, response_tokens_count = invoke_LLM( 483 | llm, 484 | documents, 485 | resume_sections=[ 486 | "candidate__skills", 487 | "Skills__evaluation", 488 | "CV__Certifications", 489 | "Certif__evaluation", 490 | ], 491 | info_message="Extract and evaluate the skills and certifications...", 492 | language=st.session_state.assistant_language, 493 | ) 494 | 495 | try: 496 | # Load response_content to json dictionary 497 | SKILLS_and_CERTIF = json.loads(response_content, strict=False) 498 | except Exception as e: 499 | print("[ERROR] json.loads returns error:", e) 500 | print("\n[INFO] Parse response content...\n") 501 | 502 | skills = extract_from_text( 503 | response_content, '"candidate__skills": ', '"Skills__evaluation":' 504 | ) 505 | skills = skills.replace("\n ", "\n").replace("],\n", "").replace("[\n", "") 506 | score_skills = extract_from_text( 507 | 
response_content, '"score__skills": ', '"evaluation__skills":' 508 | ) 509 | evaluation_skills = extract_from_text( 510 | response_content, '"evaluation__skills": ', '"CV__Certifications":' 511 | ) 512 | 513 | certif_text = extract_from_text( 514 | response_content, '"CV__Certifications": ', '"Certif__evaluation":' 515 | ) 516 | certif_score = extract_from_text( 517 | response_content, '"score__certif": ', '"evaluation__certif":' 518 | ) 519 | certif_eval = extract_from_text( 520 | response_content, '"evaluation__certif": ', None 521 | ) 522 | 523 | # Create the dictionary 524 | SKILLS_and_CERTIF = {} 525 | SKILLS_and_CERTIF["candidate__skills"] = [ 526 | skill.strip()[1:-1] for skill in skills.split(",\n") 527 | ] 528 | try: 529 | score_skills_int = int(score_skills[0 : score_skills.rfind(",\n")]) 530 | except: 531 | score_skills_int = -1 532 | SKILLS_and_CERTIF["Skills__evaluation"] = { 533 | "score__skills": score_skills_int, 534 | "evaluation__skills": evaluation_skills[ 535 | : evaluation_skills.rfind("}\n") 536 | ].strip()[1:-1], 537 | } 538 | 539 | # Convert certificate text to list of dictionaries 540 | list_certifs = convert_text_to_list_of_dicts( 541 | text=certif_text[ 542 | certif_text.find("[") + 1 : certif_text.rfind("]") 543 | ].strip(), # .strip()[1:-1] 544 | dict_keys=[ 545 | "certif__title", 546 | "certif__organization", 547 | "certif__date", 548 | "certif__expiry_date", 549 | "certif__details", 550 | ], 551 | ) 552 | SKILLS_and_CERTIF["CV__Certifications"] = list_certifs 553 | try: 554 | certif_score_int = int(certif_score[0 : certif_score.rfind(",\n")]) 555 | except: 556 | certif_score_int = -1 557 | SKILLS_and_CERTIF["Certif__evaluation"] = { 558 | "score__certif": certif_score_int, 559 | "evaluation__certif": certif_eval[: certif_eval.rfind("}\n")].strip()[ 560 | 1:-1 561 | ], 562 | } 563 | 564 | except Exception as exception: 565 | SKILLS_and_CERTIF = { 566 | "candidate__skills": [], 567 | "Skills__evaluation": { 568 | "score__skills": -1, 
569 | "evaluation__skills": "unknown", 570 | }, 571 | "CV__Certifications": [], 572 | "Certif__evaluation": { 573 | "score__certif": -1, 574 | "evaluation__certif": "unknown", 575 | }, 576 | } 577 | print(exception) 578 | 579 | return SKILLS_and_CERTIF 580 | 581 | 582 | def Extract_PROFESSIONAL_EXPERIENCE(llm, documents): 583 | """Extract list of work experience and projects.""" 584 | 585 | try: 586 | response_content, response_tokens_count = invoke_LLM( 587 | llm, 588 | documents, 589 | resume_sections=["Work__experience", "CV__Projects"], 590 | info_message="Extract list of work experience and projects...", 591 | language=st.session_state.assistant_language, 592 | ) 593 | 594 | try: 595 | # Load response_content to json dictionary 596 | PROFESSIONAL_EXPERIENCE = json.loads(response_content, strict=False) 597 | except Exception as e: 598 | print("[ERROR] json.loads returns error:", e) 599 | print("\n[INFO] Parse response content...\n") 600 | 601 | work_experiences = extract_from_text( 602 | response_content, '"Work__experience": ', '"CV__Projects":' 603 | ) 604 | projects = extract_from_text(response_content, '"CV__Projects": ', None) 605 | 606 | # Create the dictionary 607 | PROFESSIONAL_EXPERIENCE = {} 608 | PROFESSIONAL_EXPERIENCE["Work__experience"] = convert_text_to_list_of_dicts( 609 | text=work_experiences[ 610 | work_experiences.find("[") + 1 : work_experiences.rfind("]") 611 | ].strip()[1:-1], 612 | dict_keys=[ 613 | "job__title", 614 | "job__company", 615 | "job__start_date", 616 | "job__end_date", 617 | ], 618 | ) 619 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = convert_text_to_list_of_dicts( 620 | text=projects[projects.find("[") + 1 : projects.rfind("]")].strip()[ 621 | 1:-1 622 | ], 623 | dict_keys=[ 624 | "project__title", 625 | "project__start_date", 626 | "project__end_date", 627 | ], 628 | ) 629 | # Exclude 'unknown' projects and work experiences (rebuild the lists: calling .remove() while iterating skips elements) 630 | try: 631 | PROFESSIONAL_EXPERIENCE["Work__experience"] = [ 632 | w for w in PROFESSIONAL_EXPERIENCE["Work__experience"] if w["job__title"] != "unknown" 633 | ] 634 | except Exception as e: 635 | print(e) 636 | try: 637 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = [ 638 | p for p in PROFESSIONAL_EXPERIENCE["CV__Projects"] if p["project__title"] != "unknown" 639 | ] 640 | except Exception as e: 641 | print(e) 642 | 643 | except Exception as exception: 644 | PROFESSIONAL_EXPERIENCE = {"Work__experience": [], "CV__Projects": []} 645 | print(exception) 646 | 647 | return PROFESSIONAL_EXPERIENCE 648 | 649 | 650 | def get_relevant_documents(query, documents): 651 | """Retrieve the most relevant documents from the Langchain documents using the CohereRerank retriever.""" 652 | 653 | # 1.1. Retrieve documents using the CohereRerank retriever 654 | 655 | retrieved_docs = st.session_state.retriever.get_relevant_documents(query) 656 | 657 | # 1.2. Keep only relevant documents where relevance_score >= (max(relevance_scores) - 0.1) 658 | 659 | relevance_scores = [ 660 | retrieved_docs[j].metadata["relevance_score"] 661 | for j in range(len(retrieved_docs)) 662 | ] 663 | max_relevance_score = max(relevance_scores) 664 | threshold = max_relevance_score - 0.1 665 | 666 | relevant_doc_ids = [] 667 | 668 | for j in range(len(retrieved_docs)): 669 | 670 | # keep relevant documents with (relevance_score >= threshold) 671 | 672 | if retrieved_docs[j].metadata["relevance_score"] >= threshold: 673 | # Append the retrieved document 674 | relevant_doc_ids.append(retrieved_docs[j].metadata["doc_number"]) 675 | 676 | # Append the next document to the most relevant document, as relevant information may be split between two documents. 
677 | relevant_doc_ids.append(min(relevant_doc_ids[0] + 1, len(documents) - 1)) 678 | 679 | # Sort document ids 680 | relevant_doc_ids = sorted(set(relevant_doc_ids)) 681 | 682 | # Get the most relevant documents 683 | relevant_documents = [documents[k] for k in relevant_doc_ids] 684 | 685 | return relevant_documents 686 | 687 | 688 | def Extract_Job_Responsibilities(llm, documents, PROFESSIONAL_EXPERIENCE): 689 | """Extract job responsibilities for each job in PROFESSIONAL_EXPERIENCE.""" 690 | 691 | st.info(f"**{get_current_time()}** \tExtract work experience responsibilities...") 692 | print(f"**{get_current_time()}** \tExtract work experience responsibilities...") 693 | 694 | for i in range(len(PROFESSIONAL_EXPERIENCE["Work__experience"])): 695 | try: 696 | Work_experience_i = PROFESSIONAL_EXPERIENCE["Work__experience"][i] 697 | 698 | # 1. Extract relevant documents 699 | query = f"""Extract from the resume delimited by triple backticks \ 700 | all the duties and responsibilities of the following work experience: \ 701 | (title = '{Work_experience_i['job__title']}'""" 702 | if str(Work_experience_i["job__company"]) != "unknown": 703 | query += f" and company = '{Work_experience_i['job__company']}'" 704 | if str(Work_experience_i["job__start_date"]) != "unknown": 705 | query += f" and start date = '{Work_experience_i['job__start_date']}'" 706 | if str(Work_experience_i["job__end_date"]) != "unknown": 707 | query += f" and end date = '{Work_experience_i['job__end_date']}'" 708 | query += ")\n" 709 | 710 | try: 711 | relevant_documents = get_relevant_documents(query, documents) 712 | except Exception as err: 713 | st.error(f"get_relevant_documents error: {err}") 714 | relevant_documents = documents 715 | 716 | # 2. Invoke LLM 717 | 718 | prompt = ( 719 | query 720 | + f"""Output the duties in a json dictionary with the following keys (__duty_id__,__duty__). \ 721 | Use this format: "1":"duty","2":"another duty". 
722 | Resume:\n\n ```{relevant_documents}```""" 723 | ) 724 | response = llm.invoke(prompt) 725 | 726 | # 3. Convert the response content to json dict and update work_experience 727 | response_content = response.content[ 728 | response.content.find("{") : response.content.rfind("}") + 1 729 | ] 730 | 731 | try: 732 | Work_experience_i["work__duties"] = json.loads( 733 | response_content, strict=False 734 | ) # Convert the response content to a json dict 735 | except Exception as e: 736 | print("\njson.loads returns error:", e, "\n\n") 737 | print("\n[INFO] Parse response content...\n") 738 | 739 | Work_experience_i["work__duties"] = {} 740 | list_duties = ( 741 | response_content[ 742 | response_content.find("{") + 1 : response_content.rfind("}") 743 | ] 744 | .strip() 745 | .split(",\n") 746 | ) 747 | 748 | for j in range(len(list_duties)): 749 | try: 750 | Work_experience_i["work__duties"][f"{j+1}"] = ( 751 | list_duties[j].split('":')[1].strip()[1:-1] 752 | ) 753 | except: 754 | Work_experience_i["work__duties"][f"{j+1}"] = "unknown" 755 | 756 | except Exception as exception: 757 | Work_experience_i["work__duties"] = {} 758 | print(exception) 759 | 760 | return PROFESSIONAL_EXPERIENCE 761 | 762 | 763 | def Extract_Project_Details(llm, documents, PROFESSIONAL_EXPERIENCE): 764 | """Extract project details for each project in PROFESSIONAL_EXPERIENCE.""" 765 | 766 | st.info(f"**{get_current_time()}** \tExtract project details...") 767 | print(f"**{get_current_time()}** \tExtract project details...") 768 | 769 | for i in range(len(PROFESSIONAL_EXPERIENCE["CV__Projects"])): 770 | try: 771 | project_i = PROFESSIONAL_EXPERIENCE["CV__Projects"][i] 772 | 773 | # 1. 
Extract relevant documents 774 | query = f"""Extract from the resume (delimited by triple backticks) what is listed about the following project: \ 775 | (project title = '{project_i['project__title']}'""" 776 | if str(project_i["project__start_date"]) != "unknown": 777 | query += f" and start date = '{project_i['project__start_date']}'" 778 | if str(project_i["project__end_date"]) != "unknown": 779 | query += f" and end date = '{project_i['project__end_date']}'" 780 | query += ")" 781 | 782 | try: 783 | relevant_documents = get_relevant_documents(query, documents) 784 | except Exception as err: 785 | st.error(f"get_relevant_documents error: {err}") 786 | relevant_documents = documents 787 | 788 | # 2. Invoke LLM 789 | 790 | prompt = ( 791 | query 792 | + f"""Format the extracted text into a string (with bullet points). 793 | Resume:\n\n ```{relevant_documents}```""" 794 | ) 795 | 796 | response = llm.invoke(prompt) 797 | 798 | response_content = response.content 799 | project_i["project__description"] = response_content 800 | 801 | except Exception as exception: 802 | project_i["project__description"] = "unknown" 803 | print(exception) 804 | 805 | return PROFESSIONAL_EXPERIENCE 806 | 807 | 808 | ############################################################################### 809 | # Improve Work Experience and Project texts 810 | ############################################################################### 811 | 812 | 813 | def improve_text_quality(PROMPT, text_to_improve, llm, language): 814 | """Invoke the LLM to improve the text quality.""" 815 | query = PROMPT.format(text=text_to_improve, language=language) 816 | response = llm.invoke(query) 817 | return response 818 | 819 | 820 | def improve_work_experience(WORK_EXPERIENCE: list, llm): 821 | """Improve each bullet point in the work experience responsibilities.""" 822 | 823 | message = f"**{get_current_time()}** \tImprove the quality of the work experience section..."
824 | st.info(message) 825 | print(message) 826 | 827 | # Call the LLM for each work experience to get better, stronger text. 828 | for i in range(len(WORK_EXPERIENCE)): 829 | try: 830 | WORK_EXPERIENCE_i = WORK_EXPERIENCE[i] 831 | 832 | # 1. Convert the responsibilities from dict to string 833 | 834 | text_duties = "" 835 | for duty in list(WORK_EXPERIENCE_i["work__duties"].values()): 836 | text_duties += "- " + duty + "\n"  # newline so each duty stays a separate bullet 837 | # 2. Call LLM 838 | 839 | response = improve_text_quality( 840 | PROMPT_IMPROVE_WORK_EXPERIENCE, 841 | text_duties, 842 | llm, 843 | st.session_state.assistant_language, 844 | ) 845 | response_content = response.content 846 | 847 | # 3. Convert response content to json dict with keys: 848 | # ('Score__WorkExperience','Comments__WorkExperience','Improvement__WorkExperience') 849 | 850 | response_content = response_content[ 851 | response_content.find("{") : response_content.rfind("}") + 1 852 | ] 853 | 854 | try: 855 | list_fields = [ 856 | "Score__WorkExperience", 857 | "Comments__WorkExperience", 858 | "Improvement__WorkExperience", 859 | ] 860 | list_rfind = [",\n", ",\n", "\n"] 861 | list_exclude_first_car = [False, True, True] 862 | response_content_dict = ResponseContent_Parser( 863 | response_content, list_fields, list_rfind, list_exclude_first_car 864 | ) 865 | try: 866 | response_content_dict["Score__WorkExperience"] = int( 867 | response_content_dict["Score__WorkExperience"] 868 | ) 869 | except: 870 | response_content_dict["Score__WorkExperience"] = -1 871 | 872 | except Exception as e: 873 | response_content_dict = { 874 | "Score__WorkExperience": -1, 875 | "Comments__WorkExperience": "", 876 | "Improvement__WorkExperience": "", 877 | } 878 | print(e) 879 | st.error(e) 880 | 881 | # 4. Update PROFESSIONAL_EXPERIENCE: add the new keys (Score, Comments, Improvement).
882 | 883 | WORK_EXPERIENCE_i["Score__WorkExperience"] = response_content_dict[ 884 | "Score__WorkExperience" 885 | ] 886 | WORK_EXPERIENCE_i["Comments__WorkExperience"] = response_content_dict[ 887 | "Comments__WorkExperience" 888 | ] 889 | WORK_EXPERIENCE_i["Improvement__WorkExperience"] = response_content_dict[ 890 | "Improvement__WorkExperience" 891 | ] 892 | 893 | except Exception as exception: 894 | st.error(exception) 895 | print(exception) 896 | WORK_EXPERIENCE_i["Score__WorkExperience"] = -1 897 | WORK_EXPERIENCE_i["Comments__WorkExperience"] = "" 898 | WORK_EXPERIENCE_i["Improvement__WorkExperience"] = "" 899 | 900 | return WORK_EXPERIENCE 901 | 902 | 903 | def improve_projects(PROJECTS: list, llm): 904 | """Improve project text with LLM.""" 905 | 906 | st.info(f"**{get_current_time()}** \tImprove the quality of the project section...") 907 | print(f"**{get_current_time()}** \tImprove the quality of the project section...") 908 | 909 | for i in range(len(PROJECTS)): 910 | try: 911 | PROJECT_i = PROJECTS[i] # the ith project. 912 | 913 | # 1. LLM call to improve the text quality of each duty 914 | response = improve_text_quality( 915 | PROMPT_IMPROVE_PROJECT, 916 | PROJECT_i["project__title"] + "\n" + PROJECT_i["project__description"], 917 | llm, 918 | st.session_state.assistant_language, 919 | ) 920 | response_content = response.content 921 | 922 | # 2. 
Convert response content to json dict with keys: 923 | # ('Score__project','Comments__project','Improvement__project') 924 | 925 | response_content = response_content[ 926 | response_content.find("{") : response_content.rfind("}") + 1 927 | ] 928 | 929 | try: 930 | list_fields = [ 931 | "Score__project", 932 | "Comments__project", 933 | "Improvement__project", 934 | ] 935 | list_rfind = [",\n", ",\n", "\n"] 936 | list_exclude_first_car = [False, True, True] 937 | 938 | response_content_dict = ResponseContent_Parser( 939 | response_content, list_fields, list_rfind, list_exclude_first_car 940 | ) 941 | try: 942 | response_content_dict["Score__project"] = int( 943 | response_content_dict["Score__project"] 944 | ) 945 | except: 946 | response_content_dict["Score__project"] = -1 947 | 948 | except Exception as e: 949 | response_content_dict = { 950 | "Score__project": -1, 951 | "Comments__project": "", 952 | "Improvement__project": "", 953 | } 954 | print(e) 955 | 956 | # 3. Update PROJECTS 957 | PROJECT_i["Score__project"] = response_content_dict["Score__project"] 958 | PROJECT_i["Comments__project"] = response_content_dict["Comments__project"] 959 | PROJECT_i["Improvement__project"] = response_content_dict[ 960 | "Improvement__project" 961 | ] 962 | 963 | except Exception as exception: 964 | print(exception) 965 | 966 | PROJECT_i["Score__project"] = -1 967 | PROJECT_i["Comments__project"] = "" 968 | PROJECT_i["Improvement__project"] = "" 969 | 970 | return PROJECTS 971 | 972 | 973 | ############################################################################### 974 | # Evaluate the Resume 975 | ############################################################################### 976 | 977 | 978 | def Evaluate_the_Resume(llm, documents): 979 | try: 980 | st.info( 981 | f"**{get_current_time()}** \tEvaluate, outline and analyse \ 982 | the resume's top 3 strengths and top 3 weaknesses..." 
983 | ) 984 | print( 985 | f"**{get_current_time()}** \tEvaluate, outline and analyse \ 986 | the resume's top 3 strengths and top 3 weaknesses..." 987 | ) 988 | 989 | prompt_template = PromptTemplate.from_template(PROMPT_EVALUATE_RESUME) 990 | prompt = prompt_template.format_prompt( 991 | text=documents, language=st.session_state.assistant_language 992 | ).text 993 | 994 | # Invoke LLM 995 | response = llm.invoke(prompt) 996 | response_content = response.content[ 997 | response.content.find("{") : response.content.rfind("}") + 1 998 | ] 999 | try: 1000 | RESUME_EVALUATION = json.loads(response_content) 1001 | except Exception as e: 1002 | print("[ERROR] json.loads returns error:", e) 1003 | print("\n[INFO] Parse response content...\n") 1004 | 1005 | list_fields = ["resume_cv_overview", "top_3_strengths", "top_3_weaknesses"] 1006 | list_rfind = [",\n", ",\n", "\n"] 1007 | list_exclude_first_car = [True, True, True] 1008 | RESUME_EVALUATION = ResponseContent_Parser( 1009 | response_content, list_fields, list_rfind, list_exclude_first_car 1010 | ) 1011 | 1012 | except Exception as error: 1013 | RESUME_EVALUATION = { 1014 | "resume_cv_overview": "unknown", 1015 | "top_3_strengths": "unknown", 1016 | "top_3_weaknesses": "unknown", 1017 | } 1018 | print(f"An error occurred: {error}") 1019 | 1020 | return RESUME_EVALUATION 1021 | 1022 | 1023 | def get_section_scores(SCANNED_RESUME): 1024 | """Return a dictionary with the scores of all resume sections (summary, skills, ...).""" 1025 | dict_scores = {} 1026 | 1027 | # Summary, Skills, EDUCATION 1028 | dict_scores["ContactInfo"] = max( 1029 | -1, SCANNED_RESUME["Contact__information"]["score__ContactInfo"] 1030 | ) 1031 | dict_scores["summary"] = max( 1032 | -1, SCANNED_RESUME["Summary__evaluation"]["score__summary"] 1033 | ) 1034 | dict_scores["skills"] = max( 1035 | -1, SCANNED_RESUME["Skills__evaluation"]["score__skills"] 1036 | ) 1037 | dict_scores["education"] = max( 1038 | -1,
SCANNED_RESUME["Education__evaluation"]["score__edu"] 1039 | ) 1040 | dict_scores["language"] = max( 1041 | -1, SCANNED_RESUME["Languages__evaluation"]["score__language"] 1042 | ) 1043 | 1044 | dict_scores["certfication"] = max( 1045 | -1, SCANNED_RESUME["Certif__evaluation"]["score__certif"] 1046 | ) 1047 | 1048 | # Work__experience: The score is the average of the scores of all the work experiences. 1049 | scores = [] 1050 | for work_experience in SCANNED_RESUME["Work__experience"]: 1051 | score = work_experience["Score__WorkExperience"] 1052 | if score > -1: 1053 | scores.append(score) 1054 | try: 1055 | dict_scores["work_experience"] = int(sum(scores) / len(scores)) 1056 | except: 1057 | dict_scores["work_experience"] = 0 1058 | 1059 | # Projects: The score is the average of the scores of all projects. 1060 | scores = [] 1061 | for project in SCANNED_RESUME["CV__Projects"]: 1062 | score = project["Score__project"] 1063 | if score > -1: 1064 | scores.append(score) 1065 | try: 1066 | dict_scores["projects"] = int(sum(scores) / len(scores)) 1067 | except: 1068 | dict_scores["projects"] = 0 1069 | 1070 | return dict_scores 1071 | 1072 | 1073 | ############################################################################### 1074 | # Put it all together 1075 | ############################################################################### 1076 | 1077 | 1078 | def resume_analyzer_main(llm, llm_creative, documents): 1079 | """Put it all together: Extract, evaluate and improve all resume sections. 1080 | Save the final results in a dictionary. 1081 | """ 1082 | # 1. Extract Contact information: Name, Title, Location, Email,... 1083 | CONTACT_INFORMATION = Extract_contact_information(llm, documents) 1084 | 1085 | # 2. Extract, evaluate and improve the Summary 1086 | Summary_SECTION = Extract_Evaluate_Summary(llm, documents) 1087 | 1088 | # 3. Extract and evaluate education and language sections. 
1089 | Education_Language_sections = Extract_Education_Language(llm, documents) 1090 | 1091 | # 4. Extract and evaluate the SKILLS. 1092 | SKILLS_and_CERTIF = Extract_Skills_and_Certifications(llm, documents) 1093 | 1094 | # 5. Extract Work Experience and Projects. 1095 | PROFESSIONAL_EXPERIENCE = Extract_PROFESSIONAL_EXPERIENCE(llm, documents) 1096 | 1097 | # 6. EXTRACT WORK EXPERIENCE RESPONSIBILITIES. 1098 | PROFESSIONAL_EXPERIENCE = Extract_Job_Responsibilities( 1099 | llm, documents, PROFESSIONAL_EXPERIENCE 1100 | ) 1101 | 1102 | # 7. EXTRACT PROJECT DETAILS. 1103 | PROFESSIONAL_EXPERIENCE = Extract_Project_Details( 1104 | llm, documents, PROFESSIONAL_EXPERIENCE 1105 | ) 1106 | 1107 | # 8. Improve the quality of the work experience section. 1108 | PROFESSIONAL_EXPERIENCE["Work__experience"] = improve_work_experience( 1109 | WORK_EXPERIENCE=PROFESSIONAL_EXPERIENCE["Work__experience"], llm=llm_creative 1110 | ) 1111 | 1112 | # 9. Improve the quality of the project section. 1113 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = improve_projects( 1114 | PROJECTS=PROFESSIONAL_EXPERIENCE["CV__Projects"], llm=llm_creative 1115 | ) 1116 | 1117 | # 10. Evaluate the Resume 1118 | RESUME_EVALUATION = Evaluate_the_Resume(llm_creative, documents) 1119 | 1120 | # 11. Put it all together: create the SCANNED_RESUME dictionary 1121 | SCANNED_RESUME = {} 1122 | for dictionary in [ 1123 | CONTACT_INFORMATION, 1124 | Summary_SECTION, 1125 | Education_Language_sections, 1126 | SKILLS_and_CERTIF, 1127 | PROFESSIONAL_EXPERIENCE, 1128 | RESUME_EVALUATION, 1129 | ]: 1130 | SCANNED_RESUME.update(dictionary) 1131 | 1132 | # 12. 
Save the Scanned resume 1133 | try: 1134 | now = (datetime.datetime.now()).strftime("%Y%m%d_%H%M%S") 1135 | file_name = "results_" + now 1136 | with open(f"./data/{file_name}.json", "w") as fp: 1137 | json.dump(SCANNED_RESUME, fp) 1138 | except Exception: 1139 | pass  # saving the results to disk is best-effort 1140 | 1141 | return SCANNED_RESUME 1142 | -------------------------------------------------------------------------------- /Streamlit_App/retrieval.py: -------------------------------------------------------------------------------- 1 | # Streamlit 2 | import streamlit as st 3 | 4 | # document loader 5 | from langchain_community.document_loaders import PDFMinerLoader 6 | 7 | # text_splitter 8 | from langchain.text_splitter import RecursiveCharacterTextSplitter 9 | 10 | # Cohere reranker 11 | from langchain.retrievers import ContextualCompressionRetriever 12 | from langchain.retrievers.document_compressors import CohereRerank 13 | from langchain_community.llms import Cohere 14 | 15 | # Embeddings 16 | from langchain_openai import OpenAIEmbeddings 17 | from langchain_google_genai import GoogleGenerativeAIEmbeddings 18 | 19 | # FAISS vector database 20 | from langchain_community.vectorstores import FAISS 21 | 22 | # Other libraries 23 | import os, glob, datetime 24 | from pathlib import Path 25 | import tiktoken 26 | import warnings 27 | 28 | warnings.filterwarnings("ignore", category=FutureWarning) 29 | 30 | 31 | # Data Directories: where temp files and vectorstores will be saved 32 | from app_constants import TMP_DIR 33 | 34 | 35 | def langchain_document_loader(file_path): 36 | """Load and split a PDF file in Langchain. 37 | Parameters: 38 | - file_path (str): path of the file. 39 | Output: 40 | - documents: list of Langchain Documents.""" 41 | 42 | if file_path.endswith(".pdf"): 43 | loader = PDFMinerLoader(file_path=file_path) 44 | else: 45 | st.error("You can only upload .pdf files!") 46 | st.stop()  # halt here: `loader` would be undefined below 47 | # 1. Load and split documents 48 | documents = loader.load_and_split() 49 | 50 | # 2.
Update the metadata: add document number to metadata 51 | for i in range(len(documents)): 52 | documents[i].metadata = { 53 | "source": documents[i].metadata["source"], 54 | "doc_number": i, 55 | } 56 | 57 | return documents 58 | 59 | 60 | def delte_temp_files(): 61 | """Delete temp files from TMP_DIR.""" 62 | files = glob.glob(TMP_DIR.as_posix() + "/*") 63 | for f in files: 64 | try: 65 | os.remove(f) 66 | except: 67 | pass 68 | 69 | 70 | def save_uploaded_file(uploaded_file): 71 | """Save the uploaded file (output of the Streamlit File Uploader widget) to TMP_DIR.""" 72 | 73 | temp_file_path = "" 74 | try: 75 | temp_file_path = os.path.join(TMP_DIR.as_posix(), uploaded_file.name) 76 | with open(temp_file_path, "wb") as temp_file: 77 | temp_file.write(uploaded_file.read()) 78 | return temp_file_path 79 | except Exception as error: 80 | st.error(f"An error occurred: {error}") 81 | 82 | return temp_file_path 83 | 84 | 85 | def tiktoken_tokens(documents, model="gpt-3.5-turbo-0125"): 86 | """Use tiktoken (tokeniser for OpenAI models) to return a list of token length per document.""" 87 | 88 | # Get the encoding used by the model.
89 | encoding = tiktoken.encoding_for_model(model) 90 | 91 | # Calculate the token length of documents 92 | tokens_length = [len(encoding.encode(doc)) for doc in documents] 93 | 94 | return tokens_length 95 | 96 | 97 | def select_embeddings_model(LLM_service="OpenAI"): 98 | """Select the Embeddings model: OpenAIEmbeddings or GoogleGenerativeAIEmbeddings.""" 99 | 100 | if LLM_service == "OpenAI": 101 | embeddings = OpenAIEmbeddings(api_key=st.session_state.openai_api_key) 102 | 103 | elif LLM_service == "Google": 104 | embeddings = GoogleGenerativeAIEmbeddings( 105 | model="models/embedding-001", google_api_key=st.session_state.google_api_key 106 | ) 107 | 108 | return embeddings 109 | 110 | 111 | def create_vectorstore(embeddings, documents): 112 | """Create a Faiss vector database.""" 113 | vector_store = FAISS.from_documents(documents=documents, embedding=embeddings) 114 | 115 | return vector_store 116 | 117 | 118 | def Vectorstore_backed_retriever( 119 | vectorstore, search_type="similarity", k=4, score_threshold=None 120 | ): 121 | """Create a vectorstore-backed retriever. 122 | Parameters: 123 | search_type: Defines the type of search that the Retriever should perform. 124 | Can be "similarity" (default), "mmr", or "similarity_score_threshold" 125 | k: number of documents to return (Default: 4) 126 | score_threshold: Minimum relevance threshold for similarity_score_threshold (default=None) 127 | """ 128 | search_kwargs = {} 129 | if k is not None: 130 | search_kwargs["k"] = k 131 | if score_threshold is not None: 132 | search_kwargs["score_threshold"] = score_threshold 133 | 134 | retriever = vectorstore.as_retriever( 135 | search_type=search_type, search_kwargs=search_kwargs 136 | ) 137 | return retriever 138 | 139 | 140 | def CohereRerank_retriever( 141 | base_retriever, cohere_api_key, cohere_model="rerank-multilingual-v2.0", top_n=4 142 | ): 143 | """Build a ContextualCompressionRetriever using the Cohere Rerank endpoint to reorder the results based on relevance.
144 | Parameters: 145 | base_retriever: a Vectorstore-backed retriever 146 | cohere_api_key: the Cohere API key 147 | cohere_model: The Cohere model can be either 'rerank-english-v2.0' or 'rerank-multilingual-v2.0', with the latter being the default. 148 | top_n: top n results returned by Cohere rerank, default = 4. 149 | """ 150 | 151 | compressor = CohereRerank( 152 | cohere_api_key=cohere_api_key, model=cohere_model, top_n=top_n 153 | ) 154 | 155 | retriever_Cohere = ContextualCompressionRetriever( 156 | base_compressor=compressor, base_retriever=base_retriever 157 | ) 158 | return retriever_Cohere 159 | 160 | 161 | def retrieval_main(): 162 | """Create a Langchain retrieval, which includes document loaders to upload the resume, 163 | embeddings to create a numerical representation of the text, FAISS vector database to store the embeddings, 164 | and CohereRerank retriever to find the most relevant documents. 165 | """ 166 | 167 | # 1. Delete old temp files from TMP directory. 168 | delte_temp_files() 169 | 170 | if st.session_state.uploaded_file is not None: 171 | # 2. Save uploaded_file to TMP directory. 172 | saved_file_path = save_uploaded_file(st.session_state.uploaded_file) 173 | 174 | # 3. Load documents with Langchain loaders 175 | documents = langchain_document_loader(saved_file_path) 176 | st.session_state.documents = documents 177 | 178 | # 4. Embeddings 179 | embeddings = select_embeddings_model(st.session_state.LLM_provider) 180 | 181 | # 5. Create a Faiss vector database 182 | try: 183 | st.session_state.vector_store = create_vectorstore( 184 | embeddings=embeddings, documents=documents 185 | ) 186 | 187 | # 6. 
Create CohereRerank retriever 188 | base_retriever = Vectorstore_backed_retriever( 189 | st.session_state.vector_store, "similarity", k=min(4, len(documents)) 190 | ) 191 | st.session_state.retriever = CohereRerank_retriever( 192 | base_retriever=base_retriever, 193 | cohere_api_key=st.session_state.cohere_api_key, 194 | cohere_model="rerank-multilingual-v2.0", 195 | top_n=min(2, len(documents)), 196 | ) 197 | except Exception as error: 198 | st.error(f"An error occurred:\n {error}") 199 | 200 | else: 201 | st.error("Please upload a resume!") 202 | st.stop() 203 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.9.3 2 | aiosignal==1.3.1 3 | altair==5.2.0 4 | annotated-types==0.6.0 5 | anyio==4.3.0 6 | async-timeout==4.0.3 7 | attrs==23.2.0 8 | backoff==2.2.1 9 | blinker==1.7.0 10 | cachetools==5.3.3 11 | certifi==2024.2.2 12 | cffi==1.16.0 13 | charset-normalizer==3.3.2 14 | click==8.1.7 15 | cohere==4.56 16 | colorama==0.4.6 17 | cryptography==42.0.5 18 | dataclasses-json==0.6.4 19 | distro==1.9.0 20 | exceptiongroup==1.2.0 21 | faiss-cpu==1.8.0 22 | fastavro==1.9.4 23 | frozenlist==1.4.1 24 | gitdb==4.0.11 25 | GitPython==3.1.42 26 | google-ai-generativelanguage==0.4.0 27 | google-api-core==2.17.1 28 | google-auth==2.28.2 29 | google-generativeai==0.3.2 30 | googleapis-common-protos==1.63.0 31 | greenlet==3.0.3 32 | grpcio==1.62.1 33 | grpcio-status==1.62.1 34 | h11==0.14.0 35 | httpcore==1.0.4 36 | httpx==0.27.0 37 | idna==3.6 38 | importlib-metadata==6.11.0 39 | Jinja2==3.1.3 40 | jsonpatch==1.33 41 | jsonpointer==2.4 42 | jsonschema==4.21.1 43 | jsonschema-specifications==2023.12.1 44 | langchain==0.1.12 45 | langchain-community==0.0.28 46 | langchain-core==0.1.32 47 | langchain-google-genai==0.0.6 48 | langchain-openai==0.0.2.post1 49 | langchain-text-splitters==0.0.1 50 | langsmith==0.1.28 51 | Markdown==3.6 52 |
markdown-it-py==3.0.0 53 | MarkupSafe==2.1.5 54 | marshmallow==3.21.1 55 | mdurl==0.1.2 56 | multidict==6.0.5 57 | mypy-extensions==1.0.0 58 | numpy==1.26.4 59 | openai==1.14.1 60 | orjson==3.9.15 61 | packaging==23.2 62 | pandas==2.2.1 63 | pdfminer.six==20231228 64 | pillow==10.2.0 65 | proto-plus==1.23.0 66 | protobuf==4.25.3 67 | pyarrow==15.0.2 68 | pyasn1==0.5.1 69 | pyasn1-modules==0.3.0 70 | pycparser==2.21 71 | pydantic==2.6.4 72 | pydantic_core==2.16.3 73 | pydeck==0.8.1b0 74 | Pygments==2.17.2 75 | python-dateutil==2.9.0.post0 76 | python-dotenv==1.0.1 77 | pytz==2024.1 78 | PyYAML==6.0.1 79 | referencing==0.34.0 80 | regex==2023.12.25 81 | requests==2.31.0 82 | rich==13.7.1 83 | rpds-py==0.18.0 84 | rsa==4.9 85 | six==1.16.0 86 | smmap==5.0.1 87 | sniffio==1.3.1 88 | SQLAlchemy==2.0.28 89 | streamlit==1.28.0 90 | tenacity==8.2.3 91 | tiktoken==0.5.2 92 | toml==0.10.2 93 | toolz==0.12.1 94 | tornado==6.4 95 | tqdm==4.66.2 96 | typing-inspect==0.9.0 97 | typing_extensions==4.10.0 98 | tzdata==2024.1 99 | tzlocal==5.2 100 | urllib3==2.2.1 101 | validators==0.22.0 102 | watchdog==4.0.0 103 | yarl==1.9.4 104 | zipp==3.18.1 105 | --------------------------------------------------------------------------------
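For a quick sanity check outside Streamlit, the relevance-threshold logic from `get_relevant_documents` in `Streamlit_App/resume_analyzer.py` can be reproduced with plain dictionaries — a minimal sketch, assuming hypothetical `doc_number`/`relevance_score` values and no Cohere, FAISS, or Langchain calls:

```python
# Standalone sketch of the filtering step in get_relevant_documents:
# keep documents scoring within 0.1 of the best rerank score, then add
# the document following the top hit (info may be split across chunks).
# The sample documents and scores below are hypothetical.

def filter_by_relevance(retrieved, total_docs, margin=0.1):
    """Return sorted, de-duplicated doc ids passing the relevance threshold."""
    scores = [doc["relevance_score"] for doc in retrieved]
    threshold = max(scores) - margin
    doc_ids = [
        doc["doc_number"] for doc in retrieved
        if doc["relevance_score"] >= threshold
    ]
    # Also keep the document right after the most relevant one,
    # clamped to the last valid index.
    doc_ids.append(min(doc_ids[0] + 1, total_docs - 1))
    return sorted(set(doc_ids))

retrieved = [
    {"doc_number": 2, "relevance_score": 0.95},
    {"doc_number": 0, "relevance_score": 0.90},
    {"doc_number": 4, "relevance_score": 0.40},
]
print(filter_by_relevance(retrieved, total_docs=6))  # [0, 2, 3]
```

With these sample scores, documents 0 and 2 pass the 0.85 threshold and document 3 is pulled in as the neighbour of the top hit.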