├── Notebooks ├── data │ └── resume │ │ └── ChatGPT_dataScientist.pdf ├── keys.env └── resume_scanner.ipynb ├── README.md ├── Streamlit_App ├── app.py ├── app_constants.py ├── app_display_results.py ├── app_sidebar.py ├── data │ └── Images │ │ ├── Education.png │ │ ├── Language.png │ │ ├── Leonardo_AI.jpg │ │ ├── app.png │ │ ├── contact_information.png │ │ ├── scores.png │ │ ├── top_3_strengths.png │ │ └── work_experience.png ├── keys.env ├── llm_functions.py ├── resume_analyzer.py └── retrieval.py └── requirements.txt /Notebooks/data/resume/ChatGPT_dataScientist.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Notebooks/data/resume/ChatGPT_dataScientist.pdf -------------------------------------------------------------------------------- /Notebooks/keys.env: -------------------------------------------------------------------------------- 1 | api_key_openai = "Your_API_key" 2 | api_key_google = "Your_API_key" 3 | api_key_cohere = "Your_API_key" -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 🔎 Resume scanner: 🚀 Leverage the power of LLM to improve your resume 2 | 3 | ### 🚀 Build a Streamlit application powered by Langchain, OpenAI and Google Generative AI 4 | 5 |
6 | 7 |
Image generated by Leonardo.ai
8 |
9 | 10 | ### Table of Contents 11 | 12 | 1. [Project Overview](#project-overview) 13 | 2. [Installation](#installation) 14 | 3. [File Descriptions](#file-descriptions) 15 | 4. [Instructions](#instructions) 16 | 5. [Screenshots](#screenshots) 17 | 18 | ## Project Overview 19 | 20 | The aim of this project is to build a web application in [Streamlit](https://streamlit.io/) that scans and improves a resume using instruction-tuned Large Language Models (LLMs). 21 | 22 | We leveraged the power of LLMs, specifically ChatGPT from [OpenAI](https://platform.openai.com/overview) and Gemini-pro from [Google](https://ai.google.dev/?hl=en), to extract, assess, and enhance resumes. 23 | 24 | We used [Langchain](https://python.langchain.com/docs/get_started/introduction), prompt engineering, and retrieval-augmented generation (RAG) techniques to complete these steps. 25 | 26 | ## Installation 27 | 28 | This project requires Python 3 and the following Python libraries: 29 | 30 | `streamlit`, `langchain`, `langchain-openai`, `langchain-google-genai`, `faiss-cpu`, `tiktoken`, `python-dotenv`, `pdfminer`, `markdown` 31 | 32 | The full list of requirements can be found in `requirements.txt`. 33 | 34 | ## File Descriptions 35 | 36 | - **Streamlit_App** folder: contains the Streamlit application. 37 | 38 | - `requirements.txt`: contains the required packages for installation. 39 | - `keys.env`: your OpenAI, Gemini, and Cohere API keys are stored here. 40 | - `llm_functions.py`: reads the LLM API keys from `keys.env` and instantiates the LLMs in Langchain. 41 | - `retrieval.py`: the script used to create the Langchain retrieval chain, including document loaders, embeddings, vector stores, and retrievers. 42 | - `app_constants.py`: contains templates for creating LLM prompts. 43 | - `app_sidebar.py`: the sidebar, where you can choose the LLM model and its parameters, such as temperature and top_p values, and enter your API keys.
44 | - `resume_analyzer.py`: this file contains the functions used to extract, assess, and improve each section of the resume using LLMs. It is the **core** of the application. 45 | - `app_display_results.py`: the script used to display resume sections, assessments, scores, and improved texts. 46 | - `app.py`: the main script of the app. It ties the other scripts together and is used to run the Streamlit application. 47 | 48 | - **Notebooks** folder: contains the project's notebook. 49 | 50 | ## Instructions 51 | 52 | To run the app locally: 53 | 54 | 1. Create a virtual environment: `python -m venv virtualenv` 55 | 2. Activate the virtual environment: 56 | 57 | **Windows:** `.\virtualenv\Scripts\activate` 58 | 59 | **Linux:** `source virtualenv/bin/activate` 60 | 61 | 3. Install the required dependencies: `pip install -r requirements.txt` 62 | 4. Add your OpenAI, Gemini, and Cohere API keys to the `keys.env` file. You can get your API keys from their respective websites. 63 | 64 | > - **OpenAI** API key: [Get an API key](https://platform.openai.com/account/api-keys) 65 | > - **Google** API key: [Get an API key](https://makersuite.google.com/app/apikey) 66 | > - **Cohere** API key: [Get an API key](https://dashboard.cohere.com/api-keys) 67 | 68 | 5. Start the app: `streamlit run ./Streamlit_App/app.py` 69 | 6. Select the LLM provider (either OpenAI or Google Generative AI) from the sidebar. Then, choose a model (GPT-3.5, GPT-4, or Gemini-pro) and adjust its parameters. 70 | 7. Use the file uploader widget to upload your resume in PDF format. 71 | 8. 🚀 To analyze and improve your resume, simply click the 'Analyze resume' button located in the main panel. 72 | 73 | ## Screenshots 74 | 75 | Here is a screenshot of the application. 76 | 77 | <div align="center">
78 | 79 |
80 |
81 | The results of the resume analysis and improvement are shown below. 82 | 83 | First, the resume's overview, top 3 strengths, and top 3 weaknesses are displayed. 84 | 85 |
86 | 87 |
88 |
89 | The scores are then displayed to give a general indication of the resume's quality. 90 | The resume is evaluated based on eight sections, each scored out of 100: contact information, summary, work experience, skills, education, language, projects, and certifications. 91 | 92 |
93 | 94 |
95 |
96 | Finally, the analysis of each section is presented in a st.expander. For instance, here is how the work experience is displayed. 97 | 98 |
99 | 100 |
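The score badges shown above are color-coded by a simple threshold rule. As a minimal, standalone sketch of that rule (the function name and hex values mirror `set_background_color` in `Streamlit_App/app_display_results.py`; everything else here is illustrative):

```python
def set_background_color(score: int) -> str:
    """Map a section score (0-100) to the badge background color."""
    if score >= 80:
        return "#D4F1F4"  # light blue: strong section
    elif score >= 60:
        return "#ededed"  # light grey: acceptable section
    else:
        return "#fbcccd"  # light red: section needs work


# A score of 85 gets the "strong section" background:
print(set_background_color(85))  # #D4F1F4
```

The same three-way split is applied to every one of the eight section scores, so a quick glance at the badge colors tells you which sections need the most work.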
101 | -------------------------------------------------------------------------------- /Streamlit_App/app.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from app_sidebar import sidebar 3 | from llm_functions import instantiate_LLM_main, get_api_keys_from_local_env 4 | from retrieval import retrieval_main 5 | from resume_analyzer import resume_analyzer_main 6 | from app_display_results import display_resume_analysis 7 | 8 | 9 | def main(): 10 | """Analyze the uploaded resume.""" 11 | 12 | if st.button("Analyze resume"): 13 | with st.spinner("Please wait..."): 14 | try: 15 | # 1. Create the Langchain retrieval 16 | retrieval_main() 17 | 18 | # 2. Instantiate a deterministic LLM with a temperature of 0.0. 19 | st.session_state.llm = instantiate_LLM_main(temperature=0.0, top_p=0.95) 20 | 21 | # 3. Instantiate LLM with temperature >0.1 for creativity. 22 | st.session_state.llm_creative = instantiate_LLM_main( 23 | temperature=st.session_state.temperature, 24 | top_p=st.session_state.top_p, 25 | ) 26 | 27 | # 4. Analyze the resume 28 | st.session_state.SCANNED_RESUME = resume_analyzer_main( 29 | llm=st.session_state.llm, 30 | llm_creative=st.session_state.llm_creative, 31 | documents=st.session_state.documents, 32 | ) 33 | 34 | # 5. Display results 35 | display_resume_analysis(st.session_state.SCANNED_RESUME) 36 | 37 | except Exception as e: 38 | st.error(f"An error occurred: {e}") 39 | 40 | 41 | if __name__ == "__main__": 42 | # 1. Set app configuration 43 | st.set_page_config(page_title="Resume Scanner", page_icon="🚀") 44 | st.title("🔎 Resume Scanner") 45 | 46 | # 2. Get API keys from local "keys.env" file 47 | openai_api_key, google_api_key, cohere_api_key = get_api_keys_from_local_env() 48 | 49 | # 3. Create the sidebar 50 | sidebar(openai_api_key, google_api_key, cohere_api_key) 51 | 52 | # 4.
File uploader widget 53 | st.session_state.uploaded_file = st.file_uploader( 54 | label="**Upload Resume**", 55 | accept_multiple_files=False, 56 | type=(["pdf"]), 57 | ) 58 | 59 | # 5. Analyze the uploaded resume 60 | main() 61 | -------------------------------------------------------------------------------- /Streamlit_App/app_constants.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | import os 3 | 4 | # 1. Constants 5 | 6 | list_LLM_providers = [":rainbow[**OpenAI**]", "**Google Generative AI**"] 7 | 8 | list_Assistant_Languages = [ 9 | "english", 10 | "french", 11 | "spanish", 12 | "german", 13 | "russian", 14 | "chinese", 15 | "arabic", 16 | "portuguese", 17 | "italian", 18 | "japanese", 19 | ] 20 | 21 | TMP_DIR = Path(__file__).resolve().parent.joinpath("data", "tmp") 22 | 23 | 24 | # 2. PROMPT TEMPLATES 25 | 26 | templates = {} 27 | 28 | # 2.1 Contact information Section 29 | templates[ 30 | "Contact__information" 31 | ] = """Extract and evaluate the contact information. \ 32 | Output a dictionary with the following keys: 33 | - candidate__name 34 | - candidate__title 35 | - candidate__location 36 | - candidate__email 37 | - candidate__phone 38 | - candidate__social_media: Extract a list of all social media profiles, blogs or websites. 39 | - evaluation__ContactInfo: Evaluate in {language} the contact information. 40 | - score__ContactInfo: Rate the contact information by giving a score (integer) from 0 to 100. 41 | """ 42 | 43 | # 2.2. Summary Section 44 | templates[ 45 | "CV__summary" 46 | ] = """Extract the summary and/or objective section. This is a separate section of the resume. \ 47 | If the resume does not contain a summary and/or objective section, then simply write "unknown".""" 48 | 49 | # 2.3. WORK Experience Section 50 | 51 | templates[ 52 | "Work__experience" 53 | ] = """Extract all work experiences. For each work experience: 54 | 1. Extract the job title. 55 | 2.
Extract the company. 56 | 3. Extract the start date and output it in the following format: \ 57 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 58 | 4. Extract the end date and output it in the following format: \ 59 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 60 | 5. Create a dictionary with the following keys: job__title, job__company, job__start_date, job__end_date. 61 | 62 | Format your response as a list of dictionaries. 63 | """ 64 | 65 | # 2.4. Projects Section 66 | templates[ 67 | "CV__Projects" 68 | ] = """Include any side projects outside the work experience. 69 | For each project: 70 | 1. Extract the title of the project. 71 | 2. Extract the start date and output it in the following format: \ 72 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 73 | 3. Extract the end date and output it in the following format: \ 74 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 75 | 4. Create a dictionary with the following keys: project__title, project__start_date, project__end_date. 76 | 77 | Format your response as a list of dictionaries. 78 | """ 79 | 80 | # 2.5. Education Section 81 | templates[ 82 | "CV__Education" 83 | ] = """Extract all educational background and academic achievements. 84 | For each education achievement: 85 | 1. Extract the name of the college or the high school. 86 | 2. Extract the earned degree. Honors and achievements are included. 87 | 3. Extract the start date and output it in the following format: \ 88 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 89 | 4. Extract the end date and output it in the following format: \ 90 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 91 | 5. Create a dictionary with the following keys: edu__college, edu__degree, edu__start_date, edu__end_date. 
92 | 93 | Format your response as a list of dictionaries. 94 | """ 95 | 96 | templates[ 97 | "Education__evaluation" 98 | ] = """Your task is to perform the following actions: 99 | 1. Rate the quality of the Education section by giving an integer score from 0 to 100. 100 | 2. Evaluate (in three sentences and in {language}) the quality of the Education section. 101 | 3. Format your response as a dictionary with the following keys: score__edu, evaluation__edu. 102 | """ 103 | 104 | # 2.6. Skills 105 | templates[ 106 | "candidate__skills" 107 | ] = """Extract the list of soft and hard skills from the skill section. Output a list. 108 | The skill section is a separate section. 109 | """ 110 | 111 | templates[ 112 | "Skills__evaluation" 113 | ] = """Your task is to perform the following actions: 114 | 1. Rate the quality of the Skills section by giving an integer score from 0 to 100. 115 | 2. Evaluate (in three sentences and in {language}) the quality of the Skills section. 116 | 3. Format your response as a dictionary with the following keys: score__skills, evaluation__skills. 117 | """ 118 | 119 | # 2.7. Languages 120 | templates[ 121 | "CV__Languages" 122 | ] = """Extract all the languages that the candidate can speak. For each language: 123 | 1. Extract the language. 124 | 2. Extract the fluency. If the fluency is not available, then simply write "unknown". 125 | 3. Create a dictionary with the following keys: spoken__language, language__fluency. 126 | 127 | Format your response as a list of dictionaries. 128 | """ 129 | 130 | templates[ 131 | "Languages__evaluation" 132 | ] = """ Your task is to perform the following actions: 133 | 1. Rate the quality of the language section by giving an integer score from 0 to 100. 134 | 2. Evaluate (in three sentences and in {language}) the quality of the language section. 135 | 3. Format your response as a dictionary with the following keys: score__language,evaluation__language. 136 | """ 137 | 138 | # 2.8. 
Certifications 139 | templates[ 140 | "CV__Certifications" 141 | ] = """Extract all certificates other than educational background and academic achievements. \ 142 | For each certificate: 143 | 1. Extract the title of the certification. 144 | 2. Extract the name of the organization or institution that issues the certification. 145 | 3. Extract the date of certification and output it in the following format: \ 146 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 147 | 4. Extract the certification expiry date and output it in the following format: \ 148 | YYYY/MM/DD or YYYY/MM or YYYY (depending on the availability of the day and month). 149 | 5. Extract any other information listed about the certification. If not found, then simply write "unknown". 150 | 6. Create a dictionary with the following keys: certif__title, certif__organization, certif__date, certif__expiry_date, certif__details. 151 | 152 | Format your response as a list of dictionaries. 153 | """ 154 | 155 | templates[ 156 | "Certif__evaluation" 157 | ] = """Your task is to perform the following actions: 158 | 1. Rate the certifications by giving an integer score from 0 to 100. 159 | 2. Evaluate (in three sentences and in {language}) the certifications and the quality of the text. 160 | 3. Format your response as a dictionary with the following keys: score__certif, evaluation__certif. 161 | """ 162 | 163 | 164 | # 3. PROMPTS 165 | 166 | PROMPT_IMPROVE_SUMMARY = """You are given a resume (delimited by <resume></resume>) \ 167 | and a summary (delimited by <summary></summary>). 168 | 1. In {language}, evaluate the summary (format and content). 169 | 2. Rate the summary by giving an integer score from 0 to 100. \ 170 | If the summary is "unknown", the score is 0. 171 | 3. In {language}, strengthen the summary. The summary should not exceed 5 sentences. \ 172 | If the summary is "unknown", generate a strong summary in {language} with no more than 5 sentences.
\ 173 | Please include: years of experience, top skills and experiences, some of the biggest achievements, and finally an attractive objective. 174 | 4. Format your response as a dictionary with the following keys: evaluation__summary, score__summary, CV__summary_enhanced. 175 | 176 | <summary> 177 | {summary} 178 | </summary> 179 | ------ 180 | <resume> 181 | {resume} 182 | </resume> 183 | """ 184 | 185 | PROMPT_IMPROVE_WORK_EXPERIENCE = """You are given a work experience text delimited by triple backticks. 186 | 1. Rate the quality of the work experience text by giving an integer score from 0 to 100. 187 | 2. Suggest in {language} how to make the work experience text better and stronger. 188 | 3. Strengthen the work experience text to make it more appealing to a recruiter in {language}. \ 189 | Provide additional details on responsibilities and quantify results for each bullet point. \ 190 | Format your text as a string in {language}. 191 | 4. Format your response as a dictionary with the following keys: "Score__WorkExperience", "Comments__WorkExperience" and "Improvement__WorkExperience". 192 | 193 | Work experience text: ```{text}``` 194 | """ 195 | 196 | PROMPT_IMPROVE_PROJECT = """You are given a project text delimited by triple backticks. 197 | 1. Rate the quality of the project text by giving an integer score from 0 to 100. 198 | 2. Suggest in {language} how to make the project text better and stronger. 199 | 3. Strengthen the project text to make it more appealing to a recruiter in {language}, \ 200 | including the problem, the approach taken, the tools used and quantifiable results. \ 201 | Format your text as a string in {language}. 202 | 4. Format your response as a dictionary with the following keys: Score__project, Comments__project, Improvement__project. 203 | 204 | Project text: ```{text}``` 205 | """ 206 | 207 | PROMPT_EVALUATE_RESUME = """You are given a resume delimited by triple backticks. 208 | 1. Provide an overview of the resume in {language}. 209 | 2.
Provide a comprehensive analysis of the three main strengths of the resume in {language}. \ 210 | Format the top 3 strengths as a string containing three bullet points. 211 | 3. Provide a comprehensive analysis of the three main weaknesses of the resume in {language}. \ 212 | Format the top 3 weaknesses as a string containing three bullet points. 213 | 4. Format your response as a dictionary with the following keys: resume_cv_overview, top_3_strengths, top_3_weaknesses. 214 | 215 | The strengths and weaknesses lie in the format, style and content of the resume. 216 | 217 | Resume: ```{text}``` 218 | """ 219 | -------------------------------------------------------------------------------- /Streamlit_App/app_display_results.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import markdown 3 | from resume_analyzer import get_section_scores 4 | 5 | 6 | def custom_markdown( 7 | text, 8 | html_tag="p", 9 | bg_color="white", 10 | color="black", 11 | font_size=None, 12 | text_align="left", 13 | ): 14 | """Customise markdown by specifying a custom background colour, text colour, font size, and text alignment.""" 15 | 16 | style = f'style="background-color:{bg_color};color:{color};font-size:{font_size}px; \ 17 | text-align: {text_align};padding: 25px 25px 25px 25px;border-radius:2%;"' 18 | 19 | body = f"<{html_tag} {style}> {text} </{html_tag}>" 20 | 21 | st.markdown(body, unsafe_allow_html=True) 22 | st.write("") 23 | 24 | 25 | def set_background_color(score): 26 | """Set background color based on score.""" 27 | if score >= 80: 28 | bg_color = "#D4F1F4" 29 | elif score >= 60: 30 | bg_color = "#ededed" 31 | else: 32 | bg_color = "#fbcccd" 33 | return bg_color 34 | 35 | 36 | def format_object_to_string(object, separator="\n- "): 37 | """Convert object (e.g.
list) to string.""" 38 | if not isinstance(object, str): 39 | return separator + separator.join(object) 40 | else: 41 | return object 42 | 43 | 44 | def markdown_to_html(md_text): 45 | """Convert Markdown to html.""" 46 | html_txt = ( 47 | markdown.markdown(md_text.replace("\\n", "\n").replace("- ", "\n- ")) 48 | .replace("\n", "") 49 | .replace('\\"', '"') 50 | ) 51 | return html_txt 52 | 53 | 54 | def display_scores_in_columns(section_names: list, scores: list, column_width: list): 55 | """Display the scores of the sections in side-by-side columns. 56 | The column_width variable sets the width of the columns.""" 57 | columns = st.columns(column_width) 58 | for i, column in enumerate(columns): 59 | with column: 60 | custom_markdown( 61 | text=f"{section_names[i]}

<br> <br> {scores[i]} <br>
", 62 | bg_color=set_background_color(scores[i]), 63 | text_align="center", 64 | ) 65 | 66 | 67 | def display_section_results( 68 | expander_label: str, 69 | expander_header_fields: list, 70 | expander_header_links: list, 71 | score: int, 72 | section_original_text_header: str, 73 | section_original_text: list, 74 | original_text_bullet_points: bool, 75 | section_assessment, 76 | section_improved_text, 77 | ): 78 | if score > -1: 79 | expander_label += f"- 🎯 **{score}**/100" 80 | with st.expander(expander_label): 81 | st.write("") 82 | 83 | # 1. Display the header fields (for example, the company and dates of the work experience) 84 | if expander_header_fields is not None: 85 | for field in expander_header_fields: 86 | if not isinstance(field, list): 87 | st.markdown(field) 88 | else: 89 | # display fields in side-by-side columns. 90 | columns = st.columns(len(field)) 91 | for i, column in enumerate(columns): 92 | with column: 93 | st.markdown(field[i]) 94 | 95 | # 2. View the links (examle social media blogs and web sites) 96 | if expander_header_links is not None: 97 | if not isinstance(expander_header_links, list): 98 | link = expander_header_links.strip().replace('"', "") 99 | if not link.startswith("http"): 100 | link = "https://" + link 101 | st.markdown( 102 | f"""🌐 {link}""", 103 | unsafe_allow_html=True, 104 | ) 105 | else: 106 | for link in expander_header_links: 107 | if not link.startswith("http"): 108 | link = "https://" + link 109 | st.markdown( 110 | f"""🌐 {link}""", 111 | unsafe_allow_html=True, 112 | ) 113 | 114 | # 3. View the original text 115 | if section_original_text_header is not None: 116 | st.write("") 117 | st.markdown(section_original_text_header) 118 | if section_original_text is not None: 119 | for text in section_original_text: 120 | if original_text_bullet_points: 121 | st.markdown(f"- {text}") 122 | else: 123 | st.markdown(text) 124 | 125 | # 4. 
Display the section score 126 | st.divider() 127 | custom_markdown( 128 | html_tag="h4", 129 | text=f"🎯 Score: {score}/100", 130 | ) 131 | 132 | # 5. Display the assessment 133 | bg_color = set_background_color(score) 134 | assessment = markdown_to_html(format_object_to_string(section_assessment)) 135 | custom_markdown( 136 | text=f"🔎 Assessment: <br> <br>

{assessment}", 137 | html_tag="div", 138 | bg_color=bg_color, 139 | ) 140 | 141 | # 6. View the improved text 142 | if section_improved_text is not None: 143 | improved_text = markdown_to_html( 144 | format_object_to_string(section_improved_text) 145 | ) 146 | custom_markdown( 147 | text=f"🚀 Improvement: <br> <br>

{improved_text}", 148 | html_tag="div", 149 | bg_color="#ededed", 150 | ) 151 | st.write("") 152 | 153 | 154 | def display_assessment(score, section_assessment): 155 | """Display the section score and the assessment.""" 156 | # 1. View section score 157 | custom_markdown( 158 | html_tag="h4", 159 | text=f"🎯 Score: {score}/100", 160 | ) 161 | # 2. Display the assessment 162 | bg_color = set_background_color(score) 163 | assessment = markdown_to_html(format_object_to_string(section_assessment)) 164 | custom_markdown( 165 | text=f"🔎 Assessment: <br> <br>

{assessment}", 166 | html_tag="div", 167 | bg_color=bg_color, 168 | ) 169 | st.write("") 170 | 171 | 172 | def display_resume_analysis(SCANNED_RESUME): 173 | """Display the resume analysis.""" 174 | try: 175 | ############################################################### 176 | # Overview, Top 3 strengths and Top 3 weaknesses 177 | ############################################################### 178 | st.divider() 179 | st.header("🎯 Overview and scores") 180 | 181 | list_task = ["Overview", "Top 3 strengths", "Top 3 weaknesses"] 182 | list_content = [ 183 | SCANNED_RESUME["resume_cv_overview"], 184 | SCANNED_RESUME["top_3_strengths"], 185 | SCANNED_RESUME["top_3_weaknesses"], 186 | ] 187 | list_colors = ["#ededed", "#D4F1F4", "#fbcccd"] 188 | 189 | for i in range(3): 190 | st.write("") 191 | st.subheader(list_task[i]) 192 | custom_markdown( 193 | html_tag="div", 194 | text=markdown_to_html(format_object_to_string(list_content[i])), 195 | bg_color=list_colors[i], 196 | ) 197 | 198 | ############################################################### 199 | # Display scores 200 | ############################################################### 201 | st.write("") 202 | st.subheader("Scores over 100") 203 | st.write("") 204 | 205 | dict_scores = get_section_scores(SCANNED_RESUME) 206 | 207 | display_scores_in_columns( 208 | section_names=[ 209 | "👤 Contact", 210 | "📋 Summary", 211 | "📋 Work Experience", 212 | "💪 Skills", 213 | ], 214 | scores=[ 215 | dict_scores.get(key) 216 | for key in ["ContactInfo", "summary", "work_experience", "skills"] 217 | ], 218 | column_width=[2.25, 2.25, 2.75, 2.25], 219 | ) 220 | 221 | display_scores_in_columns( 222 | section_names=[ 223 | "🎓 Education", 224 | "🗣 Language", 225 | "📋 Projects", 226 | "🏅 Certifications", 227 | ], 228 | scores=[ 229 | dict_scores.get(key) 230 | for key in ["education", "language", "projects", "certfication"] 231 | ], 232 | column_width=[2.5, 2.5, 2.5, 2.75], 233 | ) 234 | 235 | 
################################################################################## 236 | # Detailed analysis 237 | ################################################################################## 238 | st.divider() 239 | st.header("🔎 Detailed Analysis") 240 | 241 | # 1. Contact Information 242 | 243 | st.write("") 244 | st.subheader(f"Contact Information - 🎯 **{dict_scores['ContactInfo']}**/100") 245 | display_section_results( 246 | expander_label="🛈 Contact Information", 247 | expander_header_fields=[ 248 | f"**👤 {SCANNED_RESUME['Contact__information']['candidate__name']}**", 249 | f"{SCANNED_RESUME['Contact__information']['candidate__title']}", 250 | "", 251 | [ 252 | f"**📌 Location:** {SCANNED_RESUME['Contact__information']['candidate__location']}", 253 | f"**:telephone_receiver::** {SCANNED_RESUME['Contact__information']['candidate__phone']}", 254 | ], 255 | "", 256 | "**Email and Social media:**", 257 | f"**:e-mail:** {SCANNED_RESUME['Contact__information']['candidate__email']}", 258 | ], 259 | expander_header_links=SCANNED_RESUME["Contact__information"][ 260 | "candidate__social_media" 261 | ], 262 | score=dict_scores["ContactInfo"], 263 | section_original_text_header=None, 264 | section_original_text=None, 265 | original_text_bullet_points=False, 266 | section_assessment=SCANNED_RESUME["Contact__information"][ 267 | "evaluation__ContactInfo" 268 | ], 269 | section_improved_text=None, 270 | ) 271 | 272 | # 2. 
Summary 273 | 274 | st.write("") 275 | st.write("") 276 | st.subheader(f"Summary - 🎯 **{dict_scores['summary']}**/100") 277 | display_section_results( 278 | expander_label="Summary", 279 | expander_header_fields=[], 280 | expander_header_links=None, 281 | score=dict_scores["summary"], 282 | section_original_text_header="**📋 Summary:**", 283 | section_original_text=[SCANNED_RESUME["CV__summary"]], 284 | original_text_bullet_points=False, 285 | section_assessment=SCANNED_RESUME["Summary__evaluation"][ 286 | "evaluation__summary" 287 | ], 288 | section_improved_text=SCANNED_RESUME["Summary__evaluation"][ 289 | "CV__summary_enhanced" 290 | ], 291 | ) 292 | 293 | # 3. Work Experience 294 | 295 | st.write("") 296 | st.write("") 297 | st.subheader(f"Work experience - 🎯 **{dict_scores['work_experience']}**/100") 298 | 299 | if len(SCANNED_RESUME["Work__experience"]) == 0: 300 | st.info("No work experience results.") 301 | else: 302 | for work_experience in SCANNED_RESUME["Work__experience"]: 303 | display_section_results( 304 | expander_label=f"{work_experience['job__title']}", 305 | expander_header_fields=[ 306 | [ 307 | f"**Company:**\n {work_experience['job__company']}", 308 | f"**📅**\n {work_experience['job__start_date']} - {work_experience['job__end_date']}", 309 | ] 310 | ], 311 | expander_header_links=None, 312 | score=work_experience["Score__WorkExperience"], 313 | section_original_text_header="**📋 Responsibilities:**", 314 | section_original_text=list( 315 | work_experience["work__duties"].values() 316 | ), 317 | original_text_bullet_points=True, 318 | section_assessment=work_experience["Comments__WorkExperience"], 319 | section_improved_text=work_experience[ 320 | "Improvement__WorkExperience" 321 | ], 322 | ) 323 | 324 | # 4.
Skills 325 | 326 | st.write("") 327 | st.write("") 328 | st.subheader(f"Skills - 🎯 **{dict_scores['skills']}**/100") 329 | display_section_results( 330 | expander_label="💪 Skills", 331 | expander_header_fields=None, 332 | expander_header_links=None, 333 | score=dict_scores["skills"], 334 | section_original_text_header=None, 335 | section_original_text=[SCANNED_RESUME["candidate__skills"]], 336 | original_text_bullet_points=True, 337 | section_assessment=SCANNED_RESUME["Skills__evaluation"][ 338 | "evaluation__skills" 339 | ], 340 | section_improved_text=None, 341 | ) 342 | 343 | # 5. Education 344 | 345 | st.write("") 346 | st.write("") 347 | st.subheader(f"Education - 🎯 **{dict_scores['education']}**/100") 348 | with st.expander(f"🎓 Educational background and academic achievements."): 349 | st.write("") 350 | list_education = SCANNED_RESUME["CV__Education"] 351 | if not isinstance(list_education, list): 352 | st.markdown(f"- {list_education}") 353 | else: 354 | for edu in list_education: 355 | col1, col2 = st.columns([6, 4]) 356 | with col1: 357 | st.markdown(f"**🎓 Degree:** {edu['edu__degree']}") 358 | with col2: 359 | st.markdown( 360 | f"**📅** {edu['edu__start_date']} - {edu['edu__end_date']}" 361 | ) 362 | st.markdown(f"**🏛️** {edu['edu__college']}") 363 | st.divider() 364 | 365 | display_assessment( 366 | score=dict_scores["education"], 367 | section_assessment=SCANNED_RESUME["Education__evaluation"][ 368 | "evaluation__edu" 369 | ], 370 | ) 371 | 372 | # 6. 
Language (Optional section) 373 | 374 | st.divider() 375 | st.subheader(f"Language - 🎯 **{dict_scores['language']}**/100") 376 | languages = [] 377 | for language in SCANNED_RESUME["CV__Languages"]: 378 | languages.append( 379 | f"**🗣 {language['spoken__language']}** : {language['language__fluency']}" 380 | ) 381 | display_section_results( 382 | expander_label="🗣 Language", 383 | expander_header_fields=None, 384 | expander_header_links=None, 385 | score=dict_scores["language"], 386 | section_original_text_header=None, 387 | section_original_text=languages, 388 | original_text_bullet_points=False, 389 | section_assessment=SCANNED_RESUME["Languages__evaluation"][ 390 | "evaluation__language" 391 | ], 392 | section_improved_text=None, 393 | ) 394 | 395 | # 7. CERTIFICATIONS (optional section) 396 | 397 | st.write("") 398 | st.write("") 399 | st.subheader(f"Certifications - 🎯 **{dict_scores['certfication']}**/100") 400 | with st.expander("🏅 Certifications"): 401 | st.write("") 402 | list_certifs = SCANNED_RESUME["CV__Certifications"] 403 | if not isinstance(list_certifs, list): 404 | st.markdown(f"- {list_certifs}") 405 | else: 406 | for certif in list_certifs: 407 | col1, col2 = st.columns([6, 4]) 408 | with col1: 409 | st.markdown(f"**🏅 Title:** {certif['certif__title']}") 410 | with col2: 411 | st.markdown(f"**📅** {certif['certif__date']} ") 412 | st.markdown(f"**🏛️** {certif['certif__organization']}") 413 | 414 | if certif["certif__expiry_date"].lower() != "unknown": 415 | st.markdown( 416 | f"**📅 Expiry date:** {certif['certif__expiry_date']}" 417 | ) 418 | if certif["certif__details"].lower() != "unknown": 419 | st.write("") 420 | st.markdown(f"{certif['certif__details']}") 421 | st.divider() 422 | 423 | display_assessment( 424 | score=dict_scores["certfication"], 425 | section_assessment=SCANNED_RESUME["Certif__evaluation"][ 426 | "evaluation__certif" 427 | ], 428 | ) 429 | 430 | # 8. 
Projects (Optional section) 431 | 432 | st.write("") 433 | st.write("") 434 | st.subheader(f"Projects - 🎯 **{dict_scores['projects']}**/100") 435 | if len(SCANNED_RESUME["CV__Projects"]) == 0: 436 | st.info("No projects found.") 437 | else: 438 | for project in SCANNED_RESUME["CV__Projects"]: 439 | display_section_results( 440 | expander_label=f"{project['project__title']}", 441 | expander_header_fields=[ 442 | f"**📅**\n {project['project__start_date']} - {project['project__end_date']}" 443 | ], 444 | expander_header_links=None, 445 | score=project["Score__project"], 446 | section_original_text_header="**📋 Project details:**", 447 | section_original_text=[project["project__description"]], 448 | original_text_bullet_points=True, 449 | section_assessment=project["Comments__project"], 450 | section_improved_text=project["Improvement__project"], 451 | ) 452 | 453 | except Exception as exception: 454 | print(exception) 455 | -------------------------------------------------------------------------------- /Streamlit_App/app_sidebar.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | 3 | from app_constants import list_Assistant_Languages, list_LLM_providers 4 | 5 | 6 | def expander_model_parameters( 7 | LLM_provider="OpenAI", 8 | text_input_API_key="OpenAI API Key - [Get an API key](https://platform.openai.com/account/api-keys)", 9 | list_models=["gpt-3.5-turbo-0125", "gpt-3.5-turbo", "gpt-4-turbo-preview"], 10 | openai_api_key="", 11 | google_api_key="", 12 | ): 13 | """Add a text_input (for the API key) and a streamlit expander with models and parameters.""" 14 | 15 | st.session_state.LLM_provider = LLM_provider 16 | 17 | if LLM_provider == "OpenAI": 18 | st.session_state.openai_api_key = st.text_input( 19 | text_input_API_key, 20 | value=openai_api_key, 21 | type="password", 22 | placeholder="insert your API key", 23 | ) 24 | 25 | if LLM_provider == "Google": 26 | st.session_state.google_api_key = 
st.text_input( 27 | text_input_API_key, 28 | type="password", 29 | value=google_api_key, 30 | placeholder="insert your API key", 31 | ) 32 | 33 | with st.expander("**Models and parameters**"): 34 | st.session_state.selected_model = st.selectbox( 35 | f"Choose {LLM_provider} model", list_models 36 | ) 37 | # model parameters 38 | st.session_state.temperature = st.slider( 39 | "temperature", 40 | min_value=0.1, 41 | max_value=1.0, 42 | value=0.7, 43 | step=0.1, 44 | ) 45 | st.session_state.top_p = st.slider( 46 | "top_p", 47 | min_value=0.1, 48 | max_value=1.0, 49 | value=0.95, 50 | step=0.05, 51 | ) 52 | 53 | 54 | def sidebar(openai_api_key, google_api_key, cohere_api_key): 55 | """Create the sidebar.""" 56 | 57 | with st.sidebar: 58 | st.caption( 59 | "🚀 A resume scanner powered by 🔗 Langchain, OpenAI and Google Generative AI" 60 | ) 61 | st.write("") 62 | 63 | llm_chooser = st.radio( 64 | "Select provider", 65 | list_LLM_providers, 66 | captions=[ 67 | "[OpenAI pricing page](https://openai.com/pricing)", 68 | "Rate limit: 60 requests per minute.", 69 | ], 70 | ) 71 | 72 | st.divider() 73 | if llm_chooser == list_LLM_providers[0]: 74 | expander_model_parameters( 75 | LLM_provider="OpenAI", 76 | text_input_API_key="OpenAI API Key - [Get an API key](https://platform.openai.com/account/api-keys)", 77 | list_models=[ 78 | "gpt-3.5-turbo-0125", 79 | "gpt-3.5-turbo", 80 | "gpt-4-turbo-preview", 81 | ], 82 | openai_api_key=openai_api_key, 83 | google_api_key=google_api_key, 84 | ) 85 | 86 | if llm_chooser == list_LLM_providers[1]: 87 | expander_model_parameters( 88 | LLM_provider="Google", 89 | text_input_API_key="Google API Key - [Get an API key](https://makersuite.google.com/app/apikey)", 90 | list_models=["gemini-pro"], 91 | openai_api_key=openai_api_key, 92 | google_api_key=google_api_key, 93 | ) 94 | 95 | # Cohere API Key 96 | st.write("") 97 | st.session_state.cohere_api_key = st.text_input( 98 | "Cohere API Key - [Get an API 
key](https://dashboard.cohere.com/api-keys)", 99 | type="password", 100 | value=cohere_api_key, 101 | placeholder="insert your API key", 102 | ) 103 | 104 | # Assistant language 105 | st.divider() 106 | st.session_state.assistant_language = st.selectbox( 107 | f"Assistant language", list_Assistant_Languages 108 | ) 109 | -------------------------------------------------------------------------------- /Streamlit_App/data/Images/Education.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Education.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/Language.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Language.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/Leonardo_AI.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/Leonardo_AI.jpg -------------------------------------------------------------------------------- /Streamlit_App/data/Images/app.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/app.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/contact_information.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/contact_information.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/scores.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/scores.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/top_3_strengths.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/top_3_strengths.png -------------------------------------------------------------------------------- /Streamlit_App/data/Images/work_experience.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaGrine/CV_Improver_with_LLMs/c7dd46f270fbcc109606e16b4e98b3bb7bb1f7e3/Streamlit_App/data/Images/work_experience.png -------------------------------------------------------------------------------- /Streamlit_App/keys.env: -------------------------------------------------------------------------------- 1 | api_key_openai = "Your_API_key" 2 | api_key_google = "Your_API_key" 3 | api_key_cohere = "Your_API_key" -------------------------------------------------------------------------------- /Streamlit_App/llm_functions.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | 3 | # LLM: openai 4 | from langchain_openai import ChatOpenAI 5 | 6 | # LLM: google_genai 7 | from langchain_google_genai import ChatGoogleGenerativeAI 8 | 9 | # dotenv and os 10 | from dotenv import load_dotenv, find_dotenv 11 | import os 12 | 13 | 14 | 
def get_api_keys_from_local_env(): 15 | """Get OpenAI, Gemini and Cohere API keys from the local .env file""" 16 | try: 17 | found_dotenv = find_dotenv("keys.env", usecwd=True) 18 | load_dotenv(found_dotenv) 19 | try: 20 | openai_api_key = os.getenv("api_key_openai") 21 | except: 22 | openai_api_key = "" 23 | try: 24 | google_api_key = os.getenv("api_key_google") 25 | except: 26 | google_api_key = "" 27 | try: 28 | cohere_api_key = os.getenv("api_key_cohere") 29 | except: 30 | cohere_api_key = "" 31 | except Exception as e: 32 | print(e) 33 | openai_api_key = google_api_key = cohere_api_key = "" # fall back to empty keys so the return below never raises NameError 34 | return openai_api_key, google_api_key, cohere_api_key 35 | 36 | 37 | def instantiate_LLM( 38 | LLM_provider, api_key, temperature=0.5, top_p=0.95, model_name=None 39 | ): 40 | """Instantiate LLM in Langchain. 41 | Parameters: 42 | LLM_provider (str): the LLM provider; in ["OpenAI","Google"] 43 | model_name (str): in ["gpt-3.5-turbo", "gpt-3.5-turbo-0125", "gpt-4-turbo-preview","gemini-pro"]. 44 | api_key (str): google_api_key or openai_api_key 45 | temperature (float): Range: 0.0 - 1.0; default = 0.5 46 | top_p (float): Range: 0.0 - 1.0; default = 0.95. 
47 | """ 48 | if LLM_provider == "OpenAI": 49 | llm = ChatOpenAI( 50 | api_key=api_key, 51 | model=model_name, 52 | temperature=temperature, 53 | model_kwargs={"top_p": top_p}, 54 | ) 55 | if LLM_provider == "Google": 56 | llm = ChatGoogleGenerativeAI( 57 | google_api_key=api_key, 58 | # model="gemini-pro", 59 | model=model_name, 60 | temperature=temperature, 61 | top_p=top_p, 62 | convert_system_message_to_human=True, 63 | ) 64 | 65 | return llm 66 | 67 | 68 | def instantiate_LLM_main(temperature, top_p): 69 | """Instantiate the selected LLM model.""" 70 | try: 71 | if st.session_state.LLM_provider == "OpenAI": 72 | llm = instantiate_LLM( 73 | "OpenAI", 74 | api_key=st.session_state.openai_api_key, 75 | temperature=temperature, 76 | top_p=top_p, 77 | model_name=st.session_state.selected_model, 78 | ) 79 | else: 80 | llm = instantiate_LLM( 81 | "Google", 82 | api_key=st.session_state.google_api_key, 83 | temperature=temperature, 84 | top_p=top_p, 85 | model_name=st.session_state.selected_model, 86 | ) 87 | except Exception as e: 88 | st.error(f"An error occurred: {e}") 89 | llm = None 90 | return llm 91 | -------------------------------------------------------------------------------- /Streamlit_App/resume_analyzer.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import json, warnings 3 | 4 | warnings.filterwarnings("ignore", category=FutureWarning) 5 | 6 | import datetime 7 | 8 | from langchain.prompts import PromptTemplate 9 | 10 | from app_constants import ( 11 | templates, 12 | PROMPT_IMPROVE_WORK_EXPERIENCE, 13 | PROMPT_IMPROVE_PROJECT, 14 | PROMPT_EVALUATE_RESUME, 15 | PROMPT_IMPROVE_SUMMARY, 16 | ) 17 | import retrieval 18 | 19 | 20 | def create_prompt_template(resume_sections, language="english"): 21 | """Create the PromptTemplate. 22 | Parameters: 23 | resume_sections (list): List of CV sections from which information will be extracted. 
24 | language (str): the language of the assistant, default="english". 25 | """ 26 | 27 | # Create the Template 28 | template = f"""For the following resume, output in {language} the following information:\n\n""" 29 | 30 | for key in resume_sections: 31 | template += key + ": " + templates[key] + "\n\n" 32 | 33 | template += "For any requested information, if it is not found, output 'unknown' or ['unknown'] accordingly.\n\n" 34 | template += ( 35 | """Format the final output as a json dictionary with the following keys: (""" 36 | ) 37 | 38 | for key in resume_sections: 39 | template += "" + key + ", " 40 | template = template[:-2] + ")" # remove the last ", " 41 | 42 | template += """\n\nResume: {text}""" 43 | 44 | # Create the PromptTemplate 45 | prompt_template = PromptTemplate.from_template(template) 46 | 47 | return prompt_template 48 | 49 | 50 | def extract_from_text(text, start_tag, end_tag=None): 51 | """Use start and end tags to extract a substring from text. 52 | This helper function is used to parse the response content of the LLM in case 'json.loads' fails. 53 | """ 54 | start_index = text.find(start_tag) 55 | if end_tag is None: 56 | extracted_txt = text[start_index + len(start_tag) :] 57 | else: 58 | end_index = text.find(end_tag) 59 | extracted_txt = text[start_index + len(start_tag) : end_index] 60 | 61 | return extracted_txt 62 | 63 | 64 | def convert_text_to_list_of_dicts(text, dict_keys): 65 | """Convert text to a python list of dicts. 66 | Parameters: 67 | - text: string containing a list of dicts 68 | - dict_keys (list): the keys of the dictionary which will be returned. 69 | Output: 70 | - list_of_dicts (list): the list of dicts to return. 
71 | """ 72 | list_of_dicts = [] 73 | 74 | if text != "": 75 | text_splitted = text.split("},\n") 76 | dict_keys.append(None) 77 | 78 | for i in range(len(text_splitted)): 79 | dict_i = {} 80 | 81 | for j in range(len(dict_keys) - 1): 82 | key_value = extract_from_text( 83 | text_splitted[i], f'"{dict_keys[j]}": ', f'"{dict_keys[j+1]}": ' 84 | ) 85 | key_value = key_value[: key_value.rfind(",\n")].strip()[1:-1] 86 | dict_i[dict_keys[j]] = key_value 87 | 88 | list_of_dicts.append(dict_i) # add the dict to the list. 89 | 90 | return list_of_dicts 91 | 92 | 93 | def get_current_time(): 94 | current_time = (datetime.datetime.now()).strftime("%H:%M:%S") 95 | return current_time 96 | 97 | 98 | def invoke_LLM( 99 | llm, 100 | documents, 101 | resume_sections: list, 102 | info_message="", 103 | language="english", 104 | ): 105 | """Invoke LLM and get a response. 106 | Parameters: 107 | - llm: the LLM to call 108 | - documents: our Langchain Documents. Will be used to format the prompt_template. 109 | - resume_sections (list): List of resume sections to be parsed. 110 | - info_message (str): display an informational message. 111 | - language (str): Assistant language. Will be used to format the prompt_template. 112 | 113 | Output: 114 | - response_content (str): the content of the LLM response. 115 | - response_tokens_count (int): count of response tokens. 116 | """ 117 | 118 | # 1. display the info message 119 | st.info(f"**{get_current_time()}** \t{info_message}") 120 | print(f"**{get_current_time()}** \t{info_message}") 121 | 122 | # 2. Create the promptTemplate. 123 | prompt_template = create_prompt_template( 124 | resume_sections, 125 | language=st.session_state.assistant_language, 126 | ) 127 | 128 | # 3. Format promptTemplate with the full documents 129 | if language is not None: 130 | prompt = prompt_template.format_prompt(text=documents, language=language).text 131 | else: 132 | prompt = prompt_template.format_prompt(text=documents).text 133 | 134 | # 4. 
Invoke LLM 135 | response = llm.invoke(prompt) 136 | 137 | response_content = response.content[ 138 | response.content.find("{") : response.content.rfind("}") + 1 139 | ] 140 | response_tokens_count = sum(retrieval.tiktoken_tokens([response_content])) 141 | 142 | return response_content, response_tokens_count 143 | 144 | 145 | def ResponseContent_Parser( 146 | response_content, list_fields, list_rfind, list_exclude_first_car 147 | ): 148 | """Parse any response_content. 149 | Parameters: 150 | - response_content (str): the content of the LLM response we are going to parse. 151 | - list_fields (list): List of dictionary fields returned by this function. 152 | A field can be a dictionary. The key of the dict will not be parsed. 153 | Example: [{'Contact__information':['candidate__location','candidate__email','candidate__phone','candidate__social_media']}, 154 | 'CV__summary'] 155 | We will not parse the content for 'Contact__information'. 156 | - list_rfind (list): To parse the content of a field, first we will extract the text between this field and the next field. 157 | Then, extract text using Python's `rfind` method, which returns the highest index in the text where the substring is found. 158 | - list_exclude_first_car (list): Whether to exclude the first and last characters. 159 | 160 | Output: 161 | - INFORMATION_dict: dictionary, where fields are the keys and parsed texts are the values. 162 | 163 | """ 164 | 165 | list_fields_detailed = ( 166 | [] 167 | ) # list of tuples. tuple = (field, extract info (boolean), parent field) 168 | 169 | for field in list_fields: 170 | if type(field) is dict: 171 | list_fields_detailed.append( 172 | (list(field.keys())[0], False, None) 173 | ) # We will not extract any value for the text between this tag and the next. 
174 | for val in list(field.values())[0]: 175 | list_fields_detailed.append((val, True, list(field.keys())[0])) 176 | else: 177 | list_fields_detailed.append((field, True, None)) 178 | 179 | list_fields_detailed.append((None, False, None)) 180 | 181 | # Parse the response_content 182 | INFORMATION_dict = {} 183 | 184 | for i in range(len(list_fields_detailed) - 1): 185 | if list_fields_detailed[i][1] is False: # Extract info = False 186 | INFORMATION_dict[list_fields_detailed[i][0]] = {} # Initialize the dict 187 | if list_fields_detailed[i][1]: 188 | extracted_value = extract_from_text( 189 | response_content, 190 | f'"{list_fields_detailed[i][0]}": ', 191 | f'"{list_fields_detailed[i+1][0]}":', 192 | ) 193 | extracted_value = extracted_value[ 194 | : extracted_value.rfind(list_rfind[i]) 195 | ].strip() 196 | if list_exclude_first_car[i]: 197 | extracted_value = extracted_value[1:-1].strip() 198 | if list_fields_detailed[i][2] is None: 199 | INFORMATION_dict[list_fields_detailed[i][0]] = extracted_value 200 | else: 201 | INFORMATION_dict[list_fields_detailed[i][2]][ 202 | list_fields_detailed[i][0] 203 | ] = extracted_value 204 | 205 | return INFORMATION_dict 206 | 207 | 208 | def Extract_contact_information(llm, documents): 209 | """Extract Contact Information: Name, Title, Location, Email, Phone number and Social media profiles.""" 210 | 211 | try: 212 | response_content, response_tokens_count = invoke_LLM( 213 | llm, 214 | documents, 215 | resume_sections=["Contact__information"], 216 | info_message="Extract and evaluate contact information...", 217 | language=st.session_state.assistant_language, 218 | ) 219 | 220 | try: 221 | # Load response_content to json dictionary 222 | CONTACT_INFORMATION = json.loads(response_content, strict=False) 223 | except Exception as e: 224 | print("[ERROR] json.loads returns error:", e) 225 | print("\n[INFO] Parse response content...\n") 226 | 227 | list_fields = [ 228 | { 229 | "Contact__information": [ 230 | "candidate__name", 
231 | "candidate__title", 232 | "candidate__location", 233 | "candidate__email", 234 | "candidate__phone", 235 | "candidate__social_media", 236 | "evaluation__ContactInfo", 237 | "score__ContactInfo", 238 | ] 239 | } 240 | ] 241 | list_rfind = [",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", "}\n"] 242 | list_exclude_first_car = [ 243 | True, 244 | True, 245 | True, 246 | True, 247 | True, 248 | True, 249 | False, 250 | True, 251 | False, 252 | ] 253 | CONTACT_INFORMATION = ResponseContent_Parser( 254 | response_content, list_fields, list_rfind, list_exclude_first_car 255 | ) 256 | # convert score to int 257 | try: 258 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] = int( 259 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] 260 | ) 261 | except: 262 | CONTACT_INFORMATION["Contact__information"]["score__ContactInfo"] = -1 263 | 264 | except Exception as exception: 265 | print(f"[Error] {exception}") 266 | CONTACT_INFORMATION = { 267 | "Contact__information": { 268 | "candidate__name": "unknown", 269 | "candidate__title": "unknown", 270 | "candidate__location": "unknown", 271 | "candidate__email": "unknown", 272 | "candidate__phone": "unknown", 273 | "candidate__social_media": "unknown", 274 | "evaluation__ContactInfo": "unknown", 275 | "score__ContactInfo": -1, 276 | } 277 | } 278 | 279 | return CONTACT_INFORMATION 280 | 281 | 282 | def Extract_Evaluate_Summary(llm, documents): 283 | """Extract, evaluate and strengthen the summary.""" 284 | 285 | ###################################### 286 | # 1. 
Extract the summary 287 | ###################################### 288 | try: 289 | response_content, response_tokens_count = invoke_LLM( 290 | llm, 291 | documents, 292 | resume_sections=["CV__summary"], 293 | info_message="Extract and evaluate the Summary....", 294 | language=st.session_state.assistant_language, 295 | ) 296 | try: 297 | # Load response_content to json dictionary 298 | SUMMARY_SECTION = json.loads(response_content, strict=False) 299 | except Exception as e: 300 | print("[ERROR] json.loads returns error:", e) 301 | print("\n[INFO] Parse response content...\n") 302 | 303 | list_fields = ["CV__summary"] 304 | list_rfind = ["}\n"] 305 | list_exclude_first_car = [True] 306 | 307 | SUMMARY_SECTION = ResponseContent_Parser( 308 | response_content, list_fields, list_rfind, list_exclude_first_car 309 | ) 310 | 311 | except Exception as exception: 312 | print(f"[Error] {exception}") 313 | SUMMARY_SECTION = {"CV__summary": "unknown"} 314 | 315 | ###################################### 316 | # 2. 
Evaluate the summary 317 | ###################################### 318 | 319 | try: 320 | prompt_template = PromptTemplate.from_template(PROMPT_IMPROVE_SUMMARY) 321 | 322 | prompt = prompt_template.format_prompt( 323 | resume=documents, 324 | language=st.session_state.assistant_language, 325 | summary=SUMMARY_SECTION["CV__summary"], 326 | ).text 327 | 328 | # Invoke LLM 329 | response = llm.invoke(prompt) 330 | response_content = response.content[ 331 | response.content.find("{") : response.content.rfind("}") + 1 332 | ] 333 | 334 | try: 335 | SUMMARY_EVAL = {} 336 | SUMMARY_EVAL["Summary__evaluation"] = json.loads( 337 | response_content, strict=False 338 | ) 339 | except Exception as e: 340 | print("[ERROR] json.loads returns error:", e) 341 | print("\n[INFO] Parse response content...\n") 342 | 343 | list_fields = [ 344 | "evaluation__summary", 345 | "score__summary", 346 | "CV__summary_enhanced", 347 | ] 348 | list_rfind = [",\n", ",\n", "}\n"] 349 | list_exclude_first_car = [True, False, True] 350 | SUMMARY_EVAL["Summary__evaluation"] = ResponseContent_Parser( 351 | response_content, list_fields, list_rfind, list_exclude_first_car 352 | ) 353 | # convert score to int 354 | try: 355 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] = int( 356 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] 357 | ) 358 | except: 359 | SUMMARY_EVAL["Summary__evaluation"]["score__summary"] = -1 360 | 361 | except Exception as e: 362 | print(e) 363 | SUMMARY_EVAL = { 364 | "Summary__evaluation": { 365 | "evaluation__summary": "unknown", 366 | "score__summary": -1, 367 | "CV__summary_enhanced": "unknown", 368 | } 369 | } 370 | 371 | SUMMARY_EVAL["CV__summary"] = SUMMARY_SECTION["CV__summary"] 372 | 373 | return SUMMARY_EVAL 374 | 375 | 376 | def Extract_Education_Language(llm, documents): 377 | """Extract and evaluate education and language sections.""" 378 | 379 | try: 380 | response_content, response_tokens_count = invoke_LLM( 381 | llm, 382 | documents, 383 | 
resume_sections=[ 384 | "CV__Education", 385 | "Education__evaluation", 386 | "CV__Languages", 387 | "Languages__evaluation", 388 | ], 389 | info_message="Extract and evaluate education and language sections...", 390 | language=st.session_state.assistant_language, 391 | ) 392 | 393 | try: 394 | # Load response_content to json dictionary 395 | Education_Language_sections = json.loads(response_content, strict=False) 396 | except Exception as e: 397 | print("[ERROR] json.loads returns error:", e) 398 | print("\n[INFO] Parse response content...\n") 399 | 400 | list_fields = [ 401 | "CV__Education", 402 | {"Education__evaluation": ["score__edu", "evaluation__edu"]}, 403 | "CV__Languages", 404 | {"Languages__evaluation": ["score__language", "evaluation__language"]}, 405 | ] 406 | 407 | list_rfind = [",\n", ",\n", ",\n", ",\n", ",\n", ",\n", ",\n", "\n"] 408 | list_exclude_first_car = [True, True, False, True, True, True, False, True] 409 | 410 | Education_Language_sections = ResponseContent_Parser( 411 | response_content, list_fields, list_rfind, list_exclude_first_car 412 | ) 413 | 414 | # Convert scores to int 415 | try: 416 | Education_Language_sections["Education__evaluation"]["score__edu"] = ( 417 | int( 418 | Education_Language_sections["Education__evaluation"][ 419 | "score__edu" 420 | ] 421 | ) 422 | ) 423 | except: 424 | Education_Language_sections["Education__evaluation"]["score__edu"] = -1 425 | 426 | try: 427 | Education_Language_sections["Languages__evaluation"][ 428 | "score__language" 429 | ] = int( 430 | Education_Language_sections["Languages__evaluation"][ 431 | "score__language" 432 | ] 433 | ) 434 | except: 435 | Education_Language_sections["Languages__evaluation"][ 436 | "score__language" 437 | ] = -1 438 | 439 | # Split languages and educational texts into a Python list of dict 440 | languages = Education_Language_sections["CV__Languages"] 441 | Education_Language_sections["CV__Languages"] = ( 442 | convert_text_to_list_of_dicts( 443 | 
text=languages[ 444 | languages.find("[") + 1 : languages.rfind("]") 445 | ].strip(), 446 | dict_keys=["spoken__language", "language__fluency"], 447 | ) 448 | ) 449 | education = Education_Language_sections["CV__Education"] 450 | Education_Language_sections["CV__Education"] = ( 451 | convert_text_to_list_of_dicts( 452 | text=education[ 453 | education.find("[") + 1 : education.rfind("]") 454 | ].strip(), 455 | dict_keys=[ 456 | "edu__college", 457 | "edu__degree", 458 | "edu__start_date", 459 | "edu__end_date", 460 | ], 461 | ) 462 | ) 463 | except Exception as exception: 464 | print(exception) 465 | Education_Language_sections = { 466 | "CV__Education": [], 467 | "Education__evaluation": {"score__edu": -1, "evaluation__edu": "unknown"}, 468 | "CV__Languages": [], 469 | "Languages__evaluation": { 470 | "score__language": -1, 471 | "evaluation__language": "unknown", 472 | }, 473 | } 474 | 475 | return Education_Language_sections 476 | 477 | 478 | def Extract_Skills_and_Certifications(llm, documents): 479 | """Extract skills and certifications and evaluate these sections.""" 480 | 481 | try: 482 | response_content, response_tokens_count = invoke_LLM( 483 | llm, 484 | documents, 485 | resume_sections=[ 486 | "candidate__skills", 487 | "Skills__evaluation", 488 | "CV__Certifications", 489 | "Certif__evaluation", 490 | ], 491 | info_message="Extract and evaluate the skills and certifications...", 492 | language=st.session_state.assistant_language, 493 | ) 494 | 495 | try: 496 | # Load response_content to json dictionary 497 | SKILLS_and_CERTIF = json.loads(response_content, strict=False) 498 | except Exception as e: 499 | print("[ERROR] json.loads returns error:", e) 500 | print("\n[INFO] Parse response content...\n") 501 | 502 | skills = extract_from_text( 503 | response_content, '"candidate__skills": ', '"Skills__evaluation":' 504 | ) 505 | skills = skills.replace("\n ", "\n").replace("],\n", "").replace("[\n", "") 506 | score_skills = extract_from_text( 507 | 
response_content, '"score__skills": ', '"evaluation__skills":' 508 | ) 509 | evaluation_skills = extract_from_text( 510 | response_content, '"evaluation__skills": ', '"CV__Certifications":' 511 | ) 512 | 513 | certif_text = extract_from_text( 514 | response_content, '"CV__Certifications": ', '"Certif__evaluation":' 515 | ) 516 | certif_score = extract_from_text( 517 | response_content, '"score__certif": ', '"evaluation__certif":' 518 | ) 519 | certif_eval = extract_from_text( 520 | response_content, '"evaluation__certif": ', None 521 | ) 522 | 523 | # Create the dictionary 524 | SKILLS_and_CERTIF = {} 525 | SKILLS_and_CERTIF["candidate__skills"] = [ 526 | skill.strip()[1:-1] for skill in skills.split(",\n") 527 | ] 528 | try: 529 | score_skills_int = int(score_skills[0 : score_skills.rfind(",\n")]) 530 | except: 531 | score_skills_int = -1 532 | SKILLS_and_CERTIF["Skills__evaluation"] = { 533 | "score__skills": score_skills_int, 534 | "evaluation__skills": evaluation_skills[ 535 | : evaluation_skills.rfind("}\n") 536 | ].strip()[1:-1], 537 | } 538 | 539 | # Convert certificate text to list of dictionaries 540 | list_certifs = convert_text_to_list_of_dicts( 541 | text=certif_text[ 542 | certif_text.find("[") + 1 : certif_text.rfind("]") 543 | ].strip(), # .strip()[1:-1] 544 | dict_keys=[ 545 | "certif__title", 546 | "certif__organization", 547 | "certif__date", 548 | "certif__expiry_date", 549 | "certif__details", 550 | ], 551 | ) 552 | SKILLS_and_CERTIF["CV__Certifications"] = list_certifs 553 | try: 554 | certif_score_int = int(certif_score[0 : certif_score.rfind(",\n")]) 555 | except: 556 | certif_score_int = -1 557 | SKILLS_and_CERTIF["Certif__evaluation"] = { 558 | "score__certif": certif_score_int, 559 | "evaluation__certif": certif_eval[: certif_eval.rfind("}\n")].strip()[ 560 | 1:-1 561 | ], 562 | } 563 | 564 | except Exception as exception: 565 | SKILLS_and_CERTIF = { 566 | "candidate__skills": [], 567 | "Skills__evaluation": { 568 | "score__skills": -1, 
569 | "evaluation__skills": "unknown", 570 | }, 571 | "CV__Certifications": [], 572 | "Certif__evaluation": { 573 | "score__certif": -1, 574 | "evaluation__certif": "unknown", 575 | }, 576 | } 577 | print(exception) 578 | 579 | return SKILLS_and_CERTIF 580 | 581 | 582 | def Extract_PROFESSIONAL_EXPERIENCE(llm, documents): 583 | """Extract list of work experience and projects.""" 584 | 585 | try: 586 | response_content, response_tokens_count = invoke_LLM( 587 | llm, 588 | documents, 589 | resume_sections=["Work__experience", "CV__Projects"], 590 | info_message="Extract list of work experience and projects...", 591 | language=st.session_state.assistant_language, 592 | ) 593 | 594 | try: 595 | # Load response_content to json dictionary 596 | PROFESSIONAL_EXPERIENCE = json.loads(response_content, strict=False) 597 | except Exception as e: 598 | print("[ERROR] json.loads returns error:", e) 599 | print("\n[INFO] Parse response content...\n") 600 | 601 | work_experiences = extract_from_text( 602 | response_content, '"Work__experience": ', '"CV__Projects":' 603 | ) 604 | projects = extract_from_text(response_content, '"CV__Projects": ', None) 605 | 606 | # Create the dictionary 607 | PROFESSIONAL_EXPERIENCE = {} 608 | PROFESSIONAL_EXPERIENCE["Work__experience"] = convert_text_to_list_of_dicts( 609 | text=work_experiences[ 610 | work_experiences.find("[") + 1 : work_experiences.rfind("]") 611 | ].strip()[1:-1], 612 | dict_keys=[ 613 | "job__title", 614 | "job__company", 615 | "job__start_date", 616 | "job__end_date", 617 | ], 618 | ) 619 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = convert_text_to_list_of_dicts( 620 | text=projects[projects.find("[") + 1 : projects.rfind("]")].strip()[ 621 | 1:-1 622 | ], 623 | dict_keys=[ 624 | "project__title", 625 | "project__start_date", 626 | "project__end_date", 627 | ], 628 | ) 629 | # Exclude 'unknown' projects and work experiences (rebuild the lists: calling .remove() while iterating skips elements) 630 | try: 631 | PROFESSIONAL_EXPERIENCE["Work__experience"] = [ 632 | w for w in PROFESSIONAL_EXPERIENCE["Work__experience"] if w["job__title"] != "unknown" 633 | ] 634 | except Exception as e: 635 | print(e) 636 | try: 637 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = [ 638 | p for p in PROFESSIONAL_EXPERIENCE["CV__Projects"] if p["project__title"] != "unknown" 639 | ] 640 | except Exception as e: 641 | print(e) 642 | 643 | except Exception as exception: 644 | PROFESSIONAL_EXPERIENCE = {"Work__experience": [], "CV__Projects": []} 645 | print(exception) 646 | 647 | return PROFESSIONAL_EXPERIENCE 648 | 649 | 650 | def get_relevant_documents(query, documents): 651 | """Retrieve the most relevant documents from the Langchain documents using the CohereRerank retriever.""" 652 | 653 | # 1.1. Retrieve documents using the CohereRerank retriever 654 | 655 | retrieved_docs = st.session_state.retriever.get_relevant_documents(query) 656 | 657 | # 1.2. Keep only relevant documents where relevance_score >= (max(relevance_scores) - 0.1) 658 | 659 | relevance_scores = [ 660 | retrieved_docs[j].metadata["relevance_score"] 661 | for j in range(len(retrieved_docs)) 662 | ] 663 | max_relevance_score = max(relevance_scores) 664 | threshold = max_relevance_score - 0.1 665 | 666 | relevant_doc_ids = [] 667 | 668 | for j in range(len(retrieved_docs)): 669 | 670 | # keep relevant documents with (relevance_score >= threshold) 671 | 672 | if retrieved_docs[j].metadata["relevance_score"] >= threshold: 673 | # Append the retrieved document 674 | relevant_doc_ids.append(retrieved_docs[j].metadata["doc_number"]) 675 | 676 | # Append the next document to the most relevant document, as relevant information may be split between two documents. 
677 | relevant_doc_ids.append(min(relevant_doc_ids[0] + 1, len(documents) - 1)) 678 | 679 | # Sort document ids 680 | relevant_doc_ids = sorted(set(relevant_doc_ids)) 681 | 682 | # Get the most relevant documents 683 | relevant_documents = [documents[k] for k in relevant_doc_ids] 684 | 685 | return relevant_documents 686 | 687 | 688 | def Extract_Job_Responsibilities(llm, documents, PROFESSIONAL_EXPERIENCE): 689 | """Extract job responsibilities for each job in PROFESSIONAL_EXPERIENCE.""" 690 | 691 | st.info(f"**{get_current_time()}** \tExtract work experience responsibilities...") 692 | print(f"**{get_current_time()}** \tExtract work experience responsibilities...") 693 | 694 | for i in range(len(PROFESSIONAL_EXPERIENCE["Work__experience"])): 695 | try: 696 | Work_experience_i = PROFESSIONAL_EXPERIENCE["Work__experience"][i] 697 | 698 | # 1. Extract relevant documents 699 | query = f"""Extract from the resume delimited by triple backticks \ 700 | all the duties and responsibilities of the following work experience: \ 701 | (title = '{Work_experience_i['job__title']}'""" 702 | if str(Work_experience_i["job__company"]) != "unknown": 703 | query += f" and company = '{Work_experience_i['job__company']}'" 704 | if str(Work_experience_i["job__start_date"]) != "unknown": 705 | query += f" and start date = '{Work_experience_i['job__start_date']}'" 706 | if str(Work_experience_i["job__end_date"]) != "unknown": 707 | query += f" and end date = '{Work_experience_i['job__end_date']}'" 708 | query += ")\n" 709 | 710 | try: 711 | relevant_documents = get_relevant_documents(query, documents) 712 | except Exception as err: 713 | st.error(f"get_relevant_documents error: {err}") 714 | relevant_documents = documents 715 | 716 | # 2. Invoke LLM 717 | 718 | prompt = ( 719 | query 720 | + f"""Output the duties in a json dictionary with the following keys (__duty_id__,__duty__). \ 721 | Use this format: "1":"duty","2":"another duty". 
722 | Resume:\n\n ```{relevant_documents}```""" 723 | ) 724 | response = llm.invoke(prompt) 725 | 726 | # 3. Convert the response content to json dict and update work_experience 727 | response_content = response.content[ 728 | response.content.find("{") : response.content.rfind("}") + 1 729 | ] 730 | 731 | try: 732 | Work_experience_i["work__duties"] = json.loads( 733 | response_content, strict=False 734 | ) # Convert the response content to a json dict 735 | except Exception as e: 736 | print("\njson.loads returns error:", e, "\n\n") 737 | print("\n[INFO] Parse response content...\n") 738 | 739 | Work_experience_i["work__duties"] = {} 740 | list_duties = ( 741 | response_content[ 742 | response_content.find("{") + 1 : response_content.rfind("}") 743 | ] 744 | .strip() 745 | .split(",\n") 746 | ) 747 | 748 | for j in range(len(list_duties)): 749 | try: 750 | Work_experience_i["work__duties"][f"{j+1}"] = ( 751 | list_duties[j].split('":')[1].strip()[1:-1] 752 | ) 753 | except: 754 | Work_experience_i["work__duties"][f"{j+1}"] = "unknown" 755 | 756 | except Exception as exception: 757 | Work_experience_i["work__duties"] = {} 758 | print(exception) 759 | 760 | return PROFESSIONAL_EXPERIENCE 761 | 762 | 763 | def Extract_Project_Details(llm, documents, PROFESSIONAL_EXPERIENCE): 764 | """Extract project details for each project in PROFESSIONAL_EXPERIENCE.""" 765 | 766 | st.info(f"**{get_current_time()}** \tExtract project details...") 767 | print(f"**{get_current_time()}** \tExtract project details...") 768 | 769 | for i in range(len(PROFESSIONAL_EXPERIENCE["CV__Projects"])): 770 | try: 771 | project_i = PROFESSIONAL_EXPERIENCE["CV__Projects"][i] 772 | 773 | # 1. 
Extract relevant documents 774 | query = f"""Extract from the resume (delimited by triple backticks) what is listed about the following project: \ 775 | (project title = '{project_i['project__title']}'""" 776 | if str(project_i["project__start_date"]) != "unknown": 777 | query += f" and start date = '{project_i['project__start_date']}'" 778 | if str(project_i["project__end_date"]) != "unknown": 779 | query += f" and end date = '{project_i['project__end_date']}'" 780 | query += ")" 781 | 782 | try: 783 | relevant_documents = get_relevant_documents(query, documents) 784 | except Exception as err: 785 | st.error(f"get_relevant_documents error: {err}") 786 | relevant_documents = documents 787 | 788 | # 2. Invoke LLM 789 | 790 | prompt = ( 791 | query 792 | + f"""Format the extracted text into a string (with bullet points). 793 | Resume:\n\n ```{relevant_documents}```""" 794 | ) 795 | 796 | response = llm.invoke(prompt) 797 | 798 | response_content = response.content 799 | project_i["project__description"] = response_content 800 | 801 | except Exception as exception: 802 | project_i["project__description"] = "unknown" 803 | print(exception) 804 | 805 | return PROFESSIONAL_EXPERIENCE 806 | 807 | 808 | ############################################################################### 809 | # Improve Work Experience and Project texts 810 | ############################################################################### 811 | 812 | 813 | def improve_text_quality(PROMPT, text_to_improve, llm, language): 814 | """Invoke the LLM to improve the text quality.""" 815 | query = PROMPT.format(text=text_to_improve, language=language) 816 | response = llm.invoke(query) 817 | return response 818 | 819 | 820 | def improve_work_experience(WORK_EXPERIENCE: list, llm): 821 | """Improve each bullet point in the work experience responsibilities.""" 822 | 823 | message = f"**{get_current_time()}** \tImprove the quality of the work experience section..."
824 | st.info(message) 825 | print(message) 826 | 827 | # Call the LLM for each work experience to get better, stronger text. 828 | for i in range(len(WORK_EXPERIENCE)): 829 | try: 830 | WORK_EXPERIENCE_i = WORK_EXPERIENCE[i] 831 | 832 | # 1. Convert the responsibilities from dict to string 833 | 834 | text_duties = "" 835 | for duty in list(WORK_EXPERIENCE_i["work__duties"].values()): 836 | text_duties += "- " + duty + "\n"  # newline so each duty stays a separate bullet 837 | # 2. Call LLM 838 | 839 | response = improve_text_quality( 840 | PROMPT_IMPROVE_WORK_EXPERIENCE, 841 | text_duties, 842 | llm, 843 | st.session_state.assistant_language, 844 | ) 845 | response_content = response.content 846 | 847 | # 3. Convert response content to json dict with keys: 848 | # ('Score__WorkExperience','Comments__WorkExperience','Improvement__WorkExperience') 849 | 850 | response_content = response_content[ 851 | response_content.find("{") : response_content.rfind("}") + 1 852 | ] 853 | 854 | try: 855 | list_fields = [ 856 | "Score__WorkExperience", 857 | "Comments__WorkExperience", 858 | "Improvement__WorkExperience", 859 | ] 860 | list_rfind = [",\n", ",\n", "\n"] 861 | list_exclude_first_car = [False, True, True] 862 | response_content_dict = ResponseContent_Parser( 863 | response_content, list_fields, list_rfind, list_exclude_first_car 864 | ) 865 | try: 866 | response_content_dict["Score__WorkExperience"] = int( 867 | response_content_dict["Score__WorkExperience"] 868 | ) 869 | except: 870 | response_content_dict["Score__WorkExperience"] = -1 871 | 872 | except Exception as e: 873 | response_content_dict = { 874 | "Score__WorkExperience": -1, 875 | "Comments__WorkExperience": "", 876 | "Improvement__WorkExperience": "", 877 | } 878 | print(e) 879 | st.error(e) 880 | 881 | # 4. Update PROFESSIONAL_EXPERIENCE: add the new keys (Score, Comments, Improvement).
882 | 883 | WORK_EXPERIENCE_i["Score__WorkExperience"] = response_content_dict[ 884 | "Score__WorkExperience" 885 | ] 886 | WORK_EXPERIENCE_i["Comments__WorkExperience"] = response_content_dict[ 887 | "Comments__WorkExperience" 888 | ] 889 | WORK_EXPERIENCE_i["Improvement__WorkExperience"] = response_content_dict[ 890 | "Improvement__WorkExperience" 891 | ] 892 | 893 | except Exception as exception: 894 | st.error(exception) 895 | print(exception) 896 | WORK_EXPERIENCE_i["Score__WorkExperience"] = -1 897 | WORK_EXPERIENCE_i["Comments__WorkExperience"] = "" 898 | WORK_EXPERIENCE_i["Improvement__WorkExperience"] = "" 899 | 900 | return WORK_EXPERIENCE 901 | 902 | 903 | def improve_projects(PROJECTS: list, llm): 904 | """Improve project text with LLM.""" 905 | 906 | st.info(f"**{get_current_time()}** \tImprove the quality of the project section...") 907 | print(f"**{get_current_time()}** \tImprove the quality of the project section...") 908 | 909 | for i in range(len(PROJECTS)): 910 | try: 911 | PROJECT_i = PROJECTS[i] # the ith project. 912 | 913 | # 1. LLM call to improve the text quality of each duty 914 | response = improve_text_quality( 915 | PROMPT_IMPROVE_PROJECT, 916 | PROJECT_i["project__title"] + "\n" + PROJECT_i["project__description"], 917 | llm, 918 | st.session_state.assistant_language, 919 | ) 920 | response_content = response.content 921 | 922 | # 2. 
Convert response content to json dict with keys: 923 | # ('Score__project','Comments__project','Improvement__project') 924 | 925 | response_content = response_content[ 926 | response_content.find("{") : response_content.rfind("}") + 1 927 | ] 928 | 929 | try: 930 | list_fields = [ 931 | "Score__project", 932 | "Comments__project", 933 | "Improvement__project", 934 | ] 935 | list_rfind = [",\n", ",\n", "\n"] 936 | list_exclude_first_car = [False, True, True] 937 | 938 | response_content_dict = ResponseContent_Parser( 939 | response_content, list_fields, list_rfind, list_exclude_first_car 940 | ) 941 | try: 942 | response_content_dict["Score__project"] = int( 943 | response_content_dict["Score__project"] 944 | ) 945 | except: 946 | response_content_dict["Score__project"] = -1 947 | 948 | except Exception as e: 949 | response_content_dict = { 950 | "Score__project": -1, 951 | "Comments__project": "", 952 | "Improvement__project": "", 953 | } 954 | print(e) 955 | 956 | # 3. Update PROJECTS 957 | PROJECT_i["Score__project"] = response_content_dict["Score__project"] 958 | PROJECT_i["Comments__project"] = response_content_dict["Comments__project"] 959 | PROJECT_i["Improvement__project"] = response_content_dict[ 960 | "Improvement__project" 961 | ] 962 | 963 | except Exception as exception: 964 | print(exception) 965 | 966 | PROJECT_i["Score__project"] = -1 967 | PROJECT_i["Comments__project"] = "" 968 | PROJECT_i["Improvement__project"] = "" 969 | 970 | return PROJECTS 971 | 972 | 973 | ############################################################################### 974 | # Evaluate the Resume 975 | ############################################################################### 976 | 977 | 978 | def Evaluate_the_Resume(llm, documents): 979 | try: 980 | st.info( 981 | f"**{get_current_time()}** \tEvaluate, outline and analyse \ 982 | the resume's top 3 strengths and top 3 weaknesses..." 
983 | ) 984 | print( 985 | f"**{get_current_time()}** \tEvaluate, outline and analyse \ 986 | the resume's top 3 strengths and top 3 weaknesses..." 987 | ) 988 | 989 | prompt_template = PromptTemplate.from_template(PROMPT_EVALUATE_RESUME) 990 | prompt = prompt_template.format_prompt( 991 | text=documents, language=st.session_state.assistant_language 992 | ).text 993 | 994 | # Invoke LLM 995 | response = llm.invoke(prompt) 996 | response_content = response.content[ 997 | response.content.find("{") : response.content.rfind("}") + 1 998 | ] 999 | try: 1000 | RESUME_EVALUATION = json.loads(response_content) 1001 | except Exception as e: 1002 | print("[ERROR] json.loads returns error:", e) 1003 | print("\n[INFO] Parse response content...\n") 1004 | 1005 | list_fields = ["resume_cv_overview", "top_3_strengths", "top_3_weaknesses"] 1006 | list_rfind = [",\n", ",\n", "\n"] 1007 | list_exclude_first_car = [True, True, True] 1008 | RESUME_EVALUATION = ResponseContent_Parser( 1009 | response_content, list_fields, list_rfind, list_exclude_first_car 1010 | ) 1011 | 1012 | except Exception as error: 1013 | RESUME_EVALUATION = { 1014 | "resume_cv_overview": "unknown", 1015 | "top_3_strengths": "unknown", 1016 | "top_3_weaknesses": "unknown", 1017 | } 1018 | print(f"An error occurred: {error}") 1019 | 1020 | return RESUME_EVALUATION 1021 | 1022 | 1023 | def get_section_scores(SCANNED_RESUME): 1024 | """Return a dictionary with the scores of all resume sections (summary, skills, ...).""" 1025 | dict_scores = {} 1026 | 1027 | # Summary, Skills, EDUCATION 1028 | dict_scores["ContactInfo"] = max( 1029 | -1, SCANNED_RESUME["Contact__information"]["score__ContactInfo"] 1030 | ) 1031 | dict_scores["summary"] = max( 1032 | -1, SCANNED_RESUME["Summary__evaluation"]["score__summary"] 1033 | ) 1034 | dict_scores["skills"] = max( 1035 | -1, SCANNED_RESUME["Skills__evaluation"]["score__skills"] 1036 | ) 1037 | dict_scores["education"] = max( 1038 | -1,
SCANNED_RESUME["Education__evaluation"]["score__edu"] 1039 | ) 1040 | dict_scores["language"] = max( 1041 | -1, SCANNED_RESUME["Languages__evaluation"]["score__language"] 1042 | ) 1043 | 1044 | dict_scores["certfication"] = max( 1045 | -1, SCANNED_RESUME["Certif__evaluation"]["score__certif"] 1046 | ) 1047 | 1048 | # Work__experience: The score is the average of the scores of all the work experiences. 1049 | scores = [] 1050 | for work_experience in SCANNED_RESUME["Work__experience"]: 1051 | score = work_experience["Score__WorkExperience"] 1052 | if score > -1: 1053 | scores.append(score) 1054 | try: 1055 | dict_scores["work_experience"] = int(sum(scores) / len(scores)) 1056 | except: 1057 | dict_scores["work_experience"] = 0 1058 | 1059 | # Projects: The score is the average of the scores of all projects. 1060 | scores = [] 1061 | for project in SCANNED_RESUME["CV__Projects"]: 1062 | score = project["Score__project"] 1063 | if score > -1: 1064 | scores.append(score) 1065 | try: 1066 | dict_scores["projects"] = int(sum(scores) / len(scores)) 1067 | except: 1068 | dict_scores["projects"] = 0 1069 | 1070 | return dict_scores 1071 | 1072 | 1073 | ############################################################################### 1074 | # Put it all together 1075 | ############################################################################### 1076 | 1077 | 1078 | def resume_analyzer_main(llm, llm_creative, documents): 1079 | """Put it all together: Extract, evaluate and improve all resume sections. 1080 | Save the final results in a dictionary. 1081 | """ 1082 | # 1. Extract Contact information: Name, Title, Location, Email,... 1083 | CONTACT_INFORMATION = Extract_contact_information(llm, documents) 1084 | 1085 | # 2. Extract, evaluate and improve the Summary 1086 | Summary_SECTION = Extract_Evaluate_Summary(llm, documents) 1087 | 1088 | # 3. Extract and evaluate education and language sections. 
1089 | Education_Language_sections = Extract_Education_Language(llm, documents) 1090 | 1091 | # 4. Extract and evaluate the SKILLS. 1092 | SKILLS_and_CERTIF = Extract_Skills_and_Certifications(llm, documents) 1093 | 1094 | # 5. Extract Work Experience and Projects. 1095 | PROFESSIONAL_EXPERIENCE = Extract_PROFESSIONAL_EXPERIENCE(llm, documents) 1096 | 1097 | # 6. EXTRACT WORK EXPERIENCE RESPONSIBILITIES. 1098 | PROFESSIONAL_EXPERIENCE = Extract_Job_Responsibilities( 1099 | llm, documents, PROFESSIONAL_EXPERIENCE 1100 | ) 1101 | 1102 | # 7. EXTRACT PROJECT DETAILS. 1103 | PROFESSIONAL_EXPERIENCE = Extract_Project_Details( 1104 | llm, documents, PROFESSIONAL_EXPERIENCE 1105 | ) 1106 | 1107 | # 8. Improve the quality of the work experience section. 1108 | PROFESSIONAL_EXPERIENCE["Work__experience"] = improve_work_experience( 1109 | WORK_EXPERIENCE=PROFESSIONAL_EXPERIENCE["Work__experience"], llm=llm_creative 1110 | ) 1111 | 1112 | # 9. Improve the quality of the project section. 1113 | PROFESSIONAL_EXPERIENCE["CV__Projects"] = improve_projects( 1114 | PROJECTS=PROFESSIONAL_EXPERIENCE["CV__Projects"], llm=llm_creative 1115 | ) 1116 | 1117 | # 10. Evaluate the Resume 1118 | RESUME_EVALUATION = Evaluate_the_Resume(llm_creative, documents) 1119 | 1120 | # 11. Put it all together: create the SCANNED_RESUME dictionary 1121 | SCANNED_RESUME = {} 1122 | for dictionary in [ 1123 | CONTACT_INFORMATION, 1124 | Summary_SECTION, 1125 | Education_Language_sections, 1126 | SKILLS_and_CERTIF, 1127 | PROFESSIONAL_EXPERIENCE, 1128 | RESUME_EVALUATION, 1129 | ]: 1130 | SCANNED_RESUME.update(dictionary) 1131 | 1132 | # 12. 
Save the Scanned resume 1133 | try: 1134 | now = (datetime.datetime.now()).strftime("%Y%m%d_%H%M%S") 1135 | file_name = "results_" + now 1136 | with open(f"./data/{file_name}.json", "w") as fp: 1137 | json.dump(SCANNED_RESUME, fp) 1138 | except Exception: 1139 | pass  # saving the results to disk is best-effort 1140 | 1141 | return SCANNED_RESUME 1142 | -------------------------------------------------------------------------------- /Streamlit_App/retrieval.py: -------------------------------------------------------------------------------- 1 | # Streamlit 2 | import streamlit as st 3 | 4 | # document loader 5 | from langchain_community.document_loaders import PDFMinerLoader 6 | 7 | # text_splitter 8 | from langchain.text_splitter import RecursiveCharacterTextSplitter 9 | 10 | # Cohere reranker 11 | from langchain.retrievers import ContextualCompressionRetriever 12 | from langchain.retrievers.document_compressors import CohereRerank 13 | from langchain_community.llms import Cohere 14 | 15 | # Embeddings 16 | from langchain_openai import OpenAIEmbeddings 17 | from langchain_google_genai import GoogleGenerativeAIEmbeddings 18 | 19 | # FAISS vector database 20 | from langchain_community.vectorstores import FAISS 21 | 22 | # Other libraries 23 | import os, glob, datetime 24 | from pathlib import Path 25 | import tiktoken 26 | import warnings 27 | 28 | warnings.filterwarnings("ignore", category=FutureWarning) 29 | 30 | 31 | # Data Directories: where temp files and vectorstores will be saved 32 | from app_constants import TMP_DIR 33 | 34 | 35 | def langchain_document_loader(file_path): 36 | """Load and split a PDF file in Langchain. 37 | Parameters: 38 | - file_path (str): path of the file. 39 | Output: 40 | - documents: list of Langchain Documents.""" 41 | 42 | if file_path.endswith(".pdf"): 43 | loader = PDFMinerLoader(file_path=file_path) 44 | else: 45 | st.error("You can only upload .pdf files!") 46 | st.stop()  # halt here: `loader` would be undefined below 47 | # 1. Load and split documents 48 | documents = loader.load_and_split() 49 | 50 | # 2.
Update the metadata: add document number to metadata 51 | for i in range(len(documents)): 52 | documents[i].metadata = { 53 | "source": documents[i].metadata["source"], 54 | "doc_number": i, 55 | } 56 | 57 | return documents 58 | 59 | 60 | def delte_temp_files(): 61 | """Delete temp files from TMP_DIR.""" 62 | files = glob.glob(TMP_DIR.as_posix() + "/*") 63 | for f in files: 64 | try: 65 | os.remove(f) 66 | except: 67 | pass 68 | 69 | 70 | def save_uploaded_file(uploaded_file): 71 | """Save the uploaded file (output of the Streamlit File Uploader widget) to TMP_DIR.""" 72 | 73 | temp_file_path = "" 74 | try: 75 | temp_file_path = os.path.join(TMP_DIR.as_posix(), uploaded_file.name) 76 | with open(temp_file_path, "wb") as temp_file: 77 | temp_file.write(uploaded_file.read()) 78 | return temp_file_path 79 | except Exception as error: 80 | st.error(f"An error occurred: {error}") 81 | 82 | return temp_file_path 83 | 84 | 85 | def tiktoken_tokens(documents, model="gpt-3.5-turbo-0125"): 86 | """Use tiktoken (tokeniser for OpenAI models) to return a list of token length per document.""" 87 | 88 | # Get the encoding used by the model.
89 | encoding = tiktoken.encoding_for_model(model) 90 | 91 | # Calculate the token length of documents 92 | tokens_length = [len(encoding.encode(doc)) for doc in documents] 93 | 94 | return tokens_length 95 | 96 | 97 | def select_embeddings_model(LLM_service="OpenAI"): 98 | """Select the Embeddings model: OpenAIEmbeddings or GoogleGenerativeAIEmbeddings.""" 99 | 100 | if LLM_service == "OpenAI": 101 | embeddings = OpenAIEmbeddings(api_key=st.session_state.openai_api_key) 102 | 103 | elif LLM_service == "Google": 104 | embeddings = GoogleGenerativeAIEmbeddings( 105 | model="models/embedding-001", google_api_key=st.session_state.google_api_key 106 | ) 107 | 108 | return embeddings 109 | 110 | 111 | def create_vectorstore(embeddings, documents): 112 | """Create a Faiss vector database.""" 113 | vector_store = FAISS.from_documents(documents=documents, embedding=embeddings) 114 | 115 | return vector_store 116 | 117 | 118 | def Vectorstore_backed_retriever( 119 | vectorstore, search_type="similarity", k=4, score_threshold=None 120 | ): 121 | """Create a vectorstore-backed retriever. 122 | Parameters: 123 | search_type: Defines the type of search that the Retriever should perform. 124 | Can be "similarity" (default), "mmr", or "similarity_score_threshold" 125 | k: number of documents to return (Default: 4) 126 | score_threshold: Minimum relevance threshold for similarity_score_threshold (default=None) 127 | """ 128 | search_kwargs = {} 129 | if k is not None: 130 | search_kwargs["k"] = k 131 | if score_threshold is not None: 132 | search_kwargs["score_threshold"] = score_threshold 133 | 134 | retriever = vectorstore.as_retriever( 135 | search_type=search_type, search_kwargs=search_kwargs 136 | ) 137 | return retriever 138 | 139 | 140 | def CohereRerank_retriever( 141 | base_retriever, cohere_api_key, cohere_model="rerank-multilingual-v2.0", top_n=4 142 | ): 143 | """Build a ContextualCompressionRetriever using the Cohere Rerank endpoint to reorder the results based on relevance.
144 | Parameters: 145 | base_retriever: a Vectorstore-backed retriever 146 | cohere_api_key: the Cohere API key 147 | cohere_model: The Cohere model can be either 'rerank-english-v2.0' or 'rerank-multilingual-v2.0', with the latter being the default. 148 | top_n: top n results returned by Cohere rerank, default = 4. 149 | """ 150 | 151 | compressor = CohereRerank( 152 | cohere_api_key=cohere_api_key, model=cohere_model, top_n=top_n 153 | ) 154 | 155 | retriever_Cohere = ContextualCompressionRetriever( 156 | base_compressor=compressor, base_retriever=base_retriever 157 | ) 158 | return retriever_Cohere 159 | 160 | 161 | def retrieval_main(): 162 | """Create a Langchain retrieval, which includes document loaders to upload the resume, 163 | embeddings to create a numerical representation of the text, FAISS vector database to store the embeddings, 164 | and CohereRerank retriever to find the most relevant documents. 165 | """ 166 | 167 | # 1. Delete old temp files from TMP directory. 168 | delte_temp_files() 169 | 170 | if st.session_state.uploaded_file is not None: 171 | # 2. Save uploaded_file to TMP directory. 172 | saved_file_path = save_uploaded_file(st.session_state.uploaded_file) 173 | 174 | # 3. Load documents with Langchain loaders 175 | documents = langchain_document_loader(saved_file_path) 176 | st.session_state.documents = documents 177 | 178 | # 4. Embeddings 179 | embeddings = select_embeddings_model(st.session_state.LLM_provider) 180 | 181 | # 5. Create a Faiss vector database 182 | try: 183 | st.session_state.vector_store = create_vectorstore( 184 | embeddings=embeddings, documents=documents 185 | ) 186 | 187 | # 6. 
Create CohereRerank retriever 188 | base_retriever = Vectorstore_backed_retriever( 189 | st.session_state.vector_store, "similarity", k=min(4, len(documents)) 190 | ) 191 | st.session_state.retriever = CohereRerank_retriever( 192 | base_retriever=base_retriever, 193 | cohere_api_key=st.session_state.cohere_api_key, 194 | cohere_model="rerank-multilingual-v2.0", 195 | top_n=min(2, len(documents)), 196 | ) 197 | except Exception as error: 198 | st.error(f"An error occurred:\n {error}") 199 | 200 | else: 201 | st.error("Please upload a resume!") 202 | st.stop() 203 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.9.3 2 | aiosignal==1.3.1 3 | altair==5.2.0 4 | annotated-types==0.6.0 5 | anyio==4.3.0 6 | async-timeout==4.0.3 7 | attrs==23.2.0 8 | backoff==2.2.1 9 | blinker==1.7.0 10 | cachetools==5.3.3 11 | certifi==2024.2.2 12 | cffi==1.16.0 13 | charset-normalizer==3.3.2 14 | click==8.1.7 15 | cohere==4.56 16 | colorama==0.4.6 17 | cryptography==42.0.5 18 | dataclasses-json==0.6.4 19 | distro==1.9.0 20 | exceptiongroup==1.2.0 21 | faiss-cpu==1.8.0 22 | fastavro==1.9.4 23 | frozenlist==1.4.1 24 | gitdb==4.0.11 25 | GitPython==3.1.42 26 | google-ai-generativelanguage==0.4.0 27 | google-api-core==2.17.1 28 | google-auth==2.28.2 29 | google-generativeai==0.3.2 30 | googleapis-common-protos==1.63.0 31 | greenlet==3.0.3 32 | grpcio==1.62.1 33 | grpcio-status==1.62.1 34 | h11==0.14.0 35 | httpcore==1.0.4 36 | httpx==0.27.0 37 | idna==3.6 38 | importlib-metadata==6.11.0 39 | Jinja2==3.1.3 40 | jsonpatch==1.33 41 | jsonpointer==2.4 42 | jsonschema==4.21.1 43 | jsonschema-specifications==2023.12.1 44 | langchain==0.1.12 45 | langchain-community==0.0.28 46 | langchain-core==0.1.32 47 | langchain-google-genai==0.0.6 48 | langchain-openai==0.0.2.post1 49 | langchain-text-splitters==0.0.1 50 | langsmith==0.1.28 51 | Markdown==3.6 52 |
markdown-it-py==3.0.0 53 | MarkupSafe==2.1.5 54 | marshmallow==3.21.1 55 | mdurl==0.1.2 56 | multidict==6.0.5 57 | mypy-extensions==1.0.0 58 | numpy==1.26.4 59 | openai==1.14.1 60 | orjson==3.9.15 61 | packaging==23.2 62 | pandas==2.2.1 63 | pdfminer.six==20231228 64 | pillow==10.2.0 65 | proto-plus==1.23.0 66 | protobuf==4.25.3 67 | pyarrow==15.0.2 68 | pyasn1==0.5.1 69 | pyasn1-modules==0.3.0 70 | pycparser==2.21 71 | pydantic==2.6.4 72 | pydantic_core==2.16.3 73 | pydeck==0.8.1b0 74 | Pygments==2.17.2 75 | python-dateutil==2.9.0.post0 76 | python-dotenv==1.0.1 77 | pytz==2024.1 78 | PyYAML==6.0.1 79 | referencing==0.34.0 80 | regex==2023.12.25 81 | requests==2.31.0 82 | rich==13.7.1 83 | rpds-py==0.18.0 84 | rsa==4.9 85 | six==1.16.0 86 | smmap==5.0.1 87 | sniffio==1.3.1 88 | SQLAlchemy==2.0.28 89 | streamlit==1.28.0 90 | tenacity==8.2.3 91 | tiktoken==0.5.2 92 | toml==0.10.2 93 | toolz==0.12.1 94 | tornado==6.4 95 | tqdm==4.66.2 96 | typing-inspect==0.9.0 97 | typing_extensions==4.10.0 98 | tzdata==2024.1 99 | tzlocal==5.2 100 | urllib3==2.2.1 101 | validators==0.22.0 102 | watchdog==4.0.0 103 | yarl==1.9.4 104 | zipp==3.18.1 105 | --------------------------------------------------------------------------------
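For a quick sanity check outside Streamlit, the relevance-threshold logic from `get_relevant_documents` in `Streamlit_App/resume_analyzer.py` can be reproduced with plain dictionaries — a minimal sketch, assuming hypothetical `doc_number`/`relevance_score` values and no Cohere, FAISS, or Langchain calls:

```python
# Standalone sketch of the filtering step in get_relevant_documents:
# keep documents scoring within 0.1 of the best rerank score, then add
# the document following the top hit (info may be split across chunks).
# The sample documents and scores below are hypothetical.

def filter_by_relevance(retrieved, total_docs, margin=0.1):
    """Return sorted, de-duplicated doc ids passing the relevance threshold."""
    scores = [doc["relevance_score"] for doc in retrieved]
    threshold = max(scores) - margin
    doc_ids = [
        doc["doc_number"] for doc in retrieved
        if doc["relevance_score"] >= threshold
    ]
    # Also keep the document right after the most relevant one,
    # clamped to the last valid index.
    doc_ids.append(min(doc_ids[0] + 1, total_docs - 1))
    return sorted(set(doc_ids))

retrieved = [
    {"doc_number": 2, "relevance_score": 0.95},
    {"doc_number": 0, "relevance_score": 0.90},
    {"doc_number": 4, "relevance_score": 0.40},
]
print(filter_by_relevance(retrieved, total_docs=6))  # [0, 2, 3]
```

With these sample scores, documents 0 and 2 pass the 0.85 threshold and document 3 is pulled in as the neighbour of the top hit.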