├── .gitignore ├── README.md ├── advanced_sql ├── 1_manipulate_tables.sql ├── 2_dates.sql ├── 3_cases.sql ├── 4_subqueries_CTE.sql ├── 5_unions.sql ├── problem_6.sql ├── problem_7.sql └── problem_8.sql ├── assets ├── 1_top_paying_roles.png └── 2_top_paying_roles_skills.png ├── project_sql ├── 1_top_paying_jobs.sql ├── 2_top_paying_job_skills.sql ├── 3_top_demanded_skills.sql ├── 4_top_paying_skills.sql └── 5_optimal_skills.sql └── sql_load ├── 1_create_database.sql ├── 2_create_tables.sql └── 3_modify_tables.sql /.gitignore: -------------------------------------------------------------------------------- 1 | /.DS_Store 2 | /.vscode 3 | /csv_files -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 📊 Dive into the data job market! Focusing on data analyst roles, this project explores 💰 top-paying jobs, 🔥 in-demand skills, and 📈 where high demand meets high salary in data analytics. 3 | 4 | 🔍 SQL queries? Check them out here: [project_sql folder](/project_sql/) 5 | 6 | # Background 7 | Driven by a quest to navigate the data analyst job market more effectively, this project was born from a desire to pinpoint top-paid and in-demand skills, streamlining others' work to find optimal jobs. 8 | 9 | Data hails from my [SQL Course](https://lukebarousse.com/sql). It's packed with insights on job titles, salaries, locations, and essential skills. 10 | 11 | ### The questions I wanted to answer through my SQL queries were: 12 | 13 | 1. What are the top-paying data analyst jobs? 14 | 2. What skills are required for these top-paying jobs? 15 | 3. What skills are most in demand for data analysts? 16 | 4. Which skills are associated with higher salaries? 17 | 5. What are the most optimal skills to learn? 
18 | 19 | # Tools I Used 20 | For my deep dive into the data analyst job market, I harnessed the power of several key tools: 21 | 22 | - **SQL:** The backbone of my analysis, allowing me to query the database and unearth critical insights. 23 | - **PostgreSQL:** The chosen database management system, ideal for handling the job posting data. 24 | - **Visual Studio Code:** My go-to for database management and executing SQL queries. 25 | - **Git & GitHub:** Essential for version control and sharing my SQL scripts and analysis, ensuring collaboration and project tracking. 26 | 27 | # The Analysis 28 | Each query for this project aimed at investigating specific aspects of the data analyst job market. Here’s how I approached each question: 29 | 30 | ### 1. Top Paying Data Analyst Jobs 31 | To identify the highest-paying roles, I filtered data analyst positions by average yearly salary and location, focusing on remote jobs. This query highlights the high paying opportunities in the field. 32 | 33 | ```sql 34 | SELECT 35 | job_id, 36 | job_title, 37 | job_location, 38 | job_schedule_type, 39 | salary_year_avg, 40 | job_posted_date, 41 | name AS company_name 42 | FROM 43 | job_postings_fact 44 | LEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id 45 | WHERE 46 | job_title_short = 'Data Analyst' AND 47 | job_location = 'Anywhere' AND 48 | salary_year_avg IS NOT NULL 49 | ORDER BY 50 | salary_year_avg DESC 51 | LIMIT 10; 52 | ``` 53 | Here's the breakdown of the top data analyst jobs in 2023: 54 | - **Wide Salary Range:** Top 10 paying data analyst roles span from $184,000 to $650,000, indicating significant salary potential in the field. 55 | - **Diverse Employers:** Companies like SmartAsset, Meta, and AT&T are among those offering high salaries, showing a broad interest across different industries. 
56 | - **Job Title Variety:** There's a high diversity in job titles, from Data Analyst to Director of Analytics, reflecting varied roles and specializations within data analytics. 57 | 58 | ![Top Paying Roles](assets/1_top_paying_roles.png) 59 | *Bar graph visualizing the top 10 salaries for data analysts; ChatGPT generated this graph from my SQL query results* 60 | 61 | ### 2. Skills for Top Paying Jobs 62 | To understand what skills are required for the top-paying jobs, I joined the job postings with the skills data, providing insights into what employers value for high-compensation roles. 63 | ```sql 64 | WITH top_paying_jobs AS ( 65 | SELECT 66 | job_id, 67 | job_title, 68 | salary_year_avg, 69 | name AS company_name 70 | FROM 71 | job_postings_fact 72 | LEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id 73 | WHERE 74 | job_title_short = 'Data Analyst' AND 75 | job_location = 'Anywhere' AND 76 | salary_year_avg IS NOT NULL 77 | ORDER BY 78 | salary_year_avg DESC 79 | LIMIT 10 80 | ) 81 | 82 | SELECT 83 | top_paying_jobs.*, 84 | skills 85 | FROM top_paying_jobs 86 | INNER JOIN skills_job_dim ON top_paying_jobs.job_id = skills_job_dim.job_id 87 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 88 | ORDER BY 89 | salary_year_avg DESC; 90 | ``` 91 | Here's the breakdown of the most demanded skills for the top 10 highest-paying data analyst jobs in 2023: 92 | - **SQL** is leading with a count of 8. 93 | - **Python** follows closely with a count of 7. 94 | - **Tableau** is also highly sought after, with a count of 6. 95 | Other skills like **R**, **Snowflake**, **Pandas**, and **Excel** show varying degrees of demand. 96 | 97 | ![Top Paying Skills](assets/2_top_paying_roles_skills.png) 98 | *Bar graph visualizing the count of skills for the top 10 paying jobs for data analysts; ChatGPT generated this graph from my SQL query results* 99 | 100 | ### 3. 
In-Demand Skills for Data Analysts 101 | 102 | This query helped identify the skills most frequently requested in job postings, directing focus to areas with high demand. 103 | 104 | ```sql 105 | SELECT 106 | skills, 107 | COUNT(skills_job_dim.job_id) AS demand_count 108 | FROM job_postings_fact 109 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 110 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 111 | WHERE 112 | job_title_short = 'Data Analyst' 113 | AND job_work_from_home = True 114 | GROUP BY 115 | skills 116 | ORDER BY 117 | demand_count DESC 118 | LIMIT 5; 119 | ``` 120 | Here's the breakdown of the most demanded skills for data analysts in 2023 121 | - **SQL** and **Excel** remain fundamental, emphasizing the need for strong foundational skills in data processing and spreadsheet manipulation. 122 | - **Programming** and **Visualization Tools** like **Python**, **Tableau**, and **Power BI** are essential, pointing towards the increasing importance of technical skills in data storytelling and decision support. 123 | 124 | | Skills | Demand Count | 125 | |----------|--------------| 126 | | SQL | 7291 | 127 | | Excel | 4611 | 128 | | Python | 4330 | 129 | | Tableau | 3745 | 130 | | Power BI | 2609 | 131 | 132 | *Table of the demand for the top 5 skills in data analyst job postings* 133 | 134 | ### 4. Skills Based on Salary 135 | Exploring the average salaries associated with different skills revealed which skills are the highest paying. 
136 | ```sql 137 | SELECT 138 | skills, 139 | ROUND(AVG(salary_year_avg), 0) AS avg_salary 140 | FROM job_postings_fact 141 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 142 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 143 | WHERE 144 | job_title_short = 'Data Analyst' 145 | AND salary_year_avg IS NOT NULL 146 | AND job_work_from_home = True 147 | GROUP BY 148 | skills 149 | ORDER BY 150 | avg_salary DESC 151 | LIMIT 25; 152 | ``` 153 | Here's a breakdown of the results for top paying skills for Data Analysts: 154 | - **High Demand for Big Data & ML Skills:** Top salaries are commanded by analysts skilled in big data technologies (PySpark, Couchbase), machine learning tools (DataRobot, Jupyter), and Python libraries (Pandas, NumPy), reflecting the industry's high valuation of data processing and predictive modeling capabilities. 155 | - **Software Development & Deployment Proficiency:** Knowledge in development and deployment tools (GitLab, Kubernetes, Airflow) indicates a lucrative crossover between data analysis and engineering, with a premium on skills that facilitate automation and efficient data pipeline management. 156 | - **Cloud Computing Expertise:** Familiarity with cloud and data engineering tools (Elasticsearch, Databricks, GCP) underscores the growing importance of cloud-based analytics environments, suggesting that cloud proficiency significantly boosts earning potential in data analytics. 157 | 158 | | Skills | Average Salary ($) | 159 | |---------------|-------------------:| 160 | | pyspark | 208,172 | 161 | | bitbucket | 189,155 | 162 | | couchbase | 160,515 | 163 | | watson | 160,515 | 164 | | datarobot | 155,486 | 165 | | gitlab | 154,500 | 166 | | swift | 153,750 | 167 | | jupyter | 152,777 | 168 | | pandas | 151,821 | 169 | | elasticsearch | 145,000 | 170 | 171 | *Table of the average salary for the top 10 paying skills for data analysts* 172 | 173 | ### 5. 
Most Optimal Skills to Learn 174 | 175 | Combining insights from demand and salary data, this query aimed to pinpoint skills that are both in high demand and have high salaries, offering a strategic focus for skill development. 176 | 177 | ```sql 178 | SELECT 179 | skills_dim.skill_id, 180 | skills_dim.skills, 181 | COUNT(skills_job_dim.job_id) AS demand_count, 182 | ROUND(AVG(job_postings_fact.salary_year_avg), 0) AS avg_salary 183 | FROM job_postings_fact 184 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 185 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 186 | WHERE 187 | job_title_short = 'Data Analyst' 188 | AND salary_year_avg IS NOT NULL 189 | AND job_work_from_home = True 190 | GROUP BY 191 | skills_dim.skill_id 192 | HAVING 193 | COUNT(skills_job_dim.job_id) > 10 194 | ORDER BY 195 | avg_salary DESC, 196 | demand_count DESC 197 | LIMIT 25; 198 | ``` 199 | 200 | | Skill ID | Skills | Demand Count | Average Salary ($) | 201 | |----------|------------|--------------|-------------------:| 202 | | 8 | go | 27 | 115,320 | 203 | | 234 | confluence | 11 | 114,210 | 204 | | 97 | hadoop | 22 | 113,193 | 205 | | 80 | snowflake | 37 | 112,948 | 206 | | 74 | azure | 34 | 111,225 | 207 | | 77 | bigquery | 13 | 109,654 | 208 | | 76 | aws | 32 | 108,317 | 209 | | 4 | java | 17 | 106,906 | 210 | | 194 | ssis | 12 | 106,683 | 211 | | 233 | jira | 20 | 104,918 | 212 | 213 | *Table of the most optimal skills for data analyst sorted by salary* 214 | 215 | Here's a breakdown of the most optimal skills for Data Analysts in 2023: 216 | - **High-Demand Programming Languages:** Python and R stand out for their high demand, with demand counts of 236 and 148 respectively. Despite their high demand, their average salaries are around $101,397 for Python and $100,499 for R, indicating that proficiency in these languages is highly valued but also widely available. 
217 | - **Cloud Tools and Technologies:** Skills in specialized technologies such as Snowflake, Azure, AWS, and BigQuery show significant demand with relatively high average salaries, pointing towards the growing importance of cloud platforms and big data technologies in data analysis. 218 | - **Business Intelligence and Visualization Tools:** Tableau and Looker, with demand counts of 230 and 49 respectively, and average salaries around $99,288 and $103,795, highlight the critical role of data visualization and business intelligence in deriving actionable insights from data. 219 | - **Database Technologies:** The demand for skills in traditional and NoSQL databases (Oracle, SQL Server, NoSQL) with average salaries ranging from $97,786 to $104,534, reflects the enduring need for data storage, retrieval, and management expertise. 220 | 221 | # What I Learned 222 | 223 | Throughout this adventure, I've turbocharged my SQL toolkit with some serious firepower: 224 | 225 | - **🧩 Complex Query Crafting:** Mastered the art of advanced SQL, merging tables like a pro and wielding WITH clauses for ninja-level temp table maneuvers. 226 | - **📊 Data Aggregation:** Got cozy with GROUP BY and turned aggregate functions like COUNT() and AVG() into my data-summarizing sidekicks. 227 | - **💡 Analytical Wizardry:** Leveled up my real-world puzzle-solving skills, turning questions into actionable, insightful SQL queries. 228 | 229 | # Conclusions 230 | 231 | ### Insights 232 | From the analysis, several general insights emerged: 233 | 234 | 1. **Top-Paying Data Analyst Jobs**: The highest-paying jobs for data analysts that allow remote work offer a wide range of salaries, the highest at $650,000! 235 | 2. **Skills for Top-Paying Jobs**: High-paying data analyst jobs require advanced proficiency in SQL, suggesting it’s a critical skill for earning a top salary. 236 | 3. 
**Most In-Demand Skills**: SQL is also the most demanded skill in the data analyst job market, thus making it essential for job seekers. 237 | 4. **Skills with Higher Salaries**: Specialized skills, such as PySpark and Bitbucket, are associated with the highest average salaries, indicating a premium on niche expertise. 238 | 5. **Optimal Skills for Job Market Value**: SQL leads in demand and offers a high average salary, positioning it as one of the most optimal skills for data analysts to learn to maximize their market value. 239 | 240 | ### Closing Thoughts 241 | 242 | This project enhanced my SQL skills and provided valuable insights into the data analyst job market. The findings from the analysis serve as a guide to prioritizing skill development and job search efforts. Aspiring data analysts can better position themselves in a competitive job market by focusing on high-demand, high-salary skills. This exploration highlights the importance of continuous learning and adaptation to emerging trends in the field of data analytics. 
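
As a rough illustration of the "optimal skills" idea in insight 5, demand and average salary can be computed separately and then joined. This is a minimal sketch rather than the query used in the analysis: the CTE names `skills_demand` and `average_salary` are hypothetical, the threshold of 10 postings simply mirrors section 5, and the table and column names are assumed to match the queries shown earlier.

```sql
-- Sketch only: CTE names are illustrative; tables and columns follow the queries above.
WITH skills_demand AS (
    -- Count remote Data Analyst postings (with a listed salary) per skill
    SELECT
        skills_dim.skill_id,
        skills_dim.skills,
        COUNT(skills_job_dim.job_id) AS demand_count
    FROM job_postings_fact
    INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
    INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id
    WHERE
        job_title_short = 'Data Analyst'
        AND salary_year_avg IS NOT NULL
        AND job_work_from_home = True
    GROUP BY
        skills_dim.skill_id,
        skills_dim.skills
),
average_salary AS (
    -- Average yearly salary per skill over the same set of postings
    SELECT
        skills_job_dim.skill_id,
        ROUND(AVG(job_postings_fact.salary_year_avg), 0) AS avg_salary
    FROM job_postings_fact
    INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id
    WHERE
        job_title_short = 'Data Analyst'
        AND salary_year_avg IS NOT NULL
        AND job_work_from_home = True
    GROUP BY
        skills_job_dim.skill_id
)

SELECT
    skills_demand.skill_id,
    skills_demand.skills,
    demand_count,
    avg_salary
FROM skills_demand
INNER JOIN average_salary ON skills_demand.skill_id = average_salary.skill_id
WHERE
    demand_count > 10   -- assumed cutoff, mirroring the HAVING clause in section 5
ORDER BY
    avg_salary DESC,
    demand_count DESC
LIMIT 25;
```

Splitting the two measures into CTEs makes each one easy to inspect on its own, at the cost of scanning the postings table twice. The single-query form in section 5 should return the same rows.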
-------------------------------------------------------------------------------- /advanced_sql/1_manipulate_tables.sql: -------------------------------------------------------------------------------- 1 | CREATE TABLE job_applied 2 | ( 3 | job_id INT, 4 | application_sent_date DATE, 5 | custom_resume BOOLEAN, 6 | resume_file_name VARCHAR(255), 7 | cover_letter_sent BOOLEAN, 8 | cover_letter_file_name VARCHAR(255), 9 | status VARCHAR(50) 10 | ); 11 | 12 | INSERT INTO job_applied 13 | (job_id, 14 | application_sent_date, 15 | custom_resume, 16 | resume_file_name, 17 | cover_letter_sent, 18 | cover_letter_file_name, 19 | status) 20 | VALUES (1, 21 | '2024-02-01', 22 | true, 23 | 'resume_01.pdf', 24 | true, 25 | 'cover_letter_01.pdf', 26 | 'submitted'), 27 | (2, 28 | '2024-02-02', 29 | false, 30 | 'resume_02.pdf', 31 | false, 32 | NULL, 33 | 'interview scheduled'), 34 | (3, 35 | '2024-02-03', 36 | true, 37 | 'resume_03.pdf', 38 | true, 39 | 'cover_letter_03.pdf', 40 | 'ghosted'), 41 | (4, 42 | '2024-02-04', 43 | true, 44 | 'resume_04.pdf', 45 | false, 46 | NULL, 47 | 'submitted'), 48 | (5, 49 | '2024-02-05', 50 | false, 51 | 'resume_05.pdf', 52 | true, 53 | 'cover_letter_05.pdf', 54 | 'rejected'); 55 | 56 | ALTER TABLE job_applied 57 | ADD contact VARCHAR(50); 58 | 59 | UPDATE job_applied 60 | SET contact = 'Erlich Bachman' 61 | WHERE job_id = 1; 62 | 63 | UPDATE job_applied 64 | SET contact = 'Dinesh Chugtai' 65 | WHERE job_id = 2; 66 | 67 | UPDATE job_applied 68 | SET contact = 'Bertram Gilfoyle' 69 | WHERE job_id = 3; 70 | 71 | UPDATE job_applied 72 | SET contact = 'Jian Yang' 73 | WHERE job_id = 4; 74 | 75 | UPDATE job_applied 76 | SET contact = 'Big Head' 77 | WHERE job_id = 5; 78 | 79 | ALTER TABLE job_applied 80 | RENAME COLUMN contact TO contact_name; 81 | 82 | ALTER TABLE job_applied 83 | ALTER COLUMN contact_name TYPE TEXT; 84 | 85 | ALTER TABLE job_applied 86 | DROP COLUMN contact_name; 87 | 88 | DROP TABLE job_applied; 
-------------------------------------------------------------------------------- /advanced_sql/2_dates.sql: -------------------------------------------------------------------------------- 1 | SELECT 2 | job_title_short, 3 | job_location, 4 | job_posted_date AT TIME ZONE 'UTC' AT TIME ZONE 'EST' AS date_time 5 | FROM 6 | job_postings_fact; 7 | 8 | SELECT 9 | job_title_short, 10 | job_location, 11 | EXTRACT(MONTH FROM job_posted_date) AS job_posted_month, 12 | EXTRACT(YEAR FROM job_posted_date) AS job_posted_year 13 | FROM 14 | job_postings_fact; 15 | 16 | SELECT 17 | COUNT(job_id) AS job_posted_count, 18 | EXTRACT(MONTH FROM job_posted_date) AS job_posted_month 19 | FROM 20 | job_postings_fact 21 | WHERE 22 | job_title_short = 'Data Analyst' 23 | GROUP BY 24 | job_posted_month 25 | ORDER BY 26 | job_posted_count DESC; -------------------------------------------------------------------------------- /advanced_sql/3_cases.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Label new column as follows based on job_location: 3 | - 'Anywhere' jobs as 'Remote' 4 | - 'New York, NY' jobs as 'Local' 5 | - Otherwise 'Onsite' 6 | */ 7 | 8 | SELECT 9 | COUNT(job_id) AS number_of_jobs, 10 | CASE 11 | WHEN job_location = 'Anywhere' THEN 'Remote' 12 | WHEN job_location = 'New York, NY' THEN 'Local' 13 | ELSE 'Onsite' 14 | END AS location_category 15 | FROM 16 | job_postings_fact 17 | WHERE 18 | job_title_short = 'Data Analyst' 19 | GROUP BY 20 | location_category 21 | ORDER BY 22 | number_of_jobs DESC; 23 | 24 | -------------------------------------------------------------------------------- /advanced_sql/4_subqueries_CTE.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Look at companies that don’t require a degree 3 | - Degree requirements are in the job_postings_fact table 4 | - Use subquery to filter this in the company_dim table for company_names 5 | - Order by the company name 
alphabetically 6 | */ 7 | SELECT 8 | company_id, 9 | name AS company_name 10 | FROM 11 | company_dim 12 | WHERE company_id IN ( 13 | SELECT 14 | company_id 15 | FROM 16 | job_postings_fact 17 | WHERE 18 | job_no_degree_mention = true 19 | ORDER BY 20 | company_id 21 | ) 22 | ORDER BY 23 | name ASC; 24 | 25 | /* 26 | Find the companies that have the most job openings. 27 | - Get the total number of job postings per company id (job_postings_fact) 28 | - Return the total number of jobs with the company name (company_dim) 29 | */ 30 | 31 | WITH company_job_count AS ( 32 | SELECT 33 | company_id, 34 | COUNT(*) AS total_jobs 35 | FROM 36 | job_postings_fact 37 | GROUP BY 38 | company_id 39 | ) 40 | 41 | SELECT 42 | company_dim.name AS company_name, 43 | company_job_count.total_jobs 44 | FROM 45 | company_dim 46 | LEFT JOIN company_job_count ON company_job_count.company_id = company_dim.company_id 47 | ORDER BY 48 | total_jobs DESC; -------------------------------------------------------------------------------- /advanced_sql/5_unions.sql: -------------------------------------------------------------------------------- 1 | -- Get jobs and companies from January 2 | SELECT 3 | job_title_short, 4 | company_id, 5 | job_location 6 | FROM 7 | january_jobs 8 | 9 | UNION ALL 10 | 11 | -- Get jobs and companies from February 12 | SELECT 13 | job_title_short, 14 | company_id, 15 | job_location 16 | FROM 17 | february_jobs 18 | 19 | UNION ALL -- combine another table 20 | 21 | -- Get jobs and companies from March 22 | SELECT 23 | job_title_short, 24 | company_id, 25 | job_location 26 | FROM 27 | march_jobs; -------------------------------------------------------------------------------- /advanced_sql/problem_6.sql: -------------------------------------------------------------------------------- 1 | -- For January 2 | CREATE TABLE january_jobs AS 3 | SELECT * 4 | FROM job_postings_fact 5 | WHERE EXTRACT(MONTH FROM job_posted_date) = 1; 6 | 7 | -- For February 8 | CREATE TABLE 
february_jobs AS 9 | SELECT * 10 | FROM job_postings_fact 11 | WHERE EXTRACT(MONTH FROM job_posted_date) = 2; 12 | 13 | -- For March 14 | CREATE TABLE march_jobs AS 15 | SELECT * 16 | FROM job_postings_fact 17 | WHERE EXTRACT(MONTH FROM job_posted_date) = 3; 18 | -------------------------------------------------------------------------------- /advanced_sql/problem_7.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Find the count of the number of remote job postings per skill 3 | - Display the top 5 skills by their demand in remote jobs 4 | - Include skill ID, name, and count of postings requiring the skill 5 | */ 6 | 7 | WITH remote_job_skills AS ( 8 | SELECT 9 | skill_id, 10 | COUNT(*) AS skill_count 11 | FROM 12 | skills_job_dim AS skills_to_job 13 | INNER JOIN job_postings_fact AS job_postings ON job_postings.job_id = skills_to_job.job_id 14 | WHERE 15 | job_postings.job_work_from_home = True AND 16 | job_postings.job_title_short = 'Data Analyst' 17 | GROUP BY 18 | skill_id 19 | ) 20 | 21 | SELECT 22 | skills.skill_id, 23 | skills AS skill_name, 24 | skill_count 25 | FROM remote_job_skills 26 | INNER JOIN skills_dim AS skills ON skills.skill_id = remote_job_skills.skill_id 27 | ORDER BY 28 | skill_count DESC 29 | LIMIT 5; -------------------------------------------------------------------------------- /advanced_sql/problem_8.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Find job postings from the first quarter that have a salary greater than $70K 3 | - Combine job posting tables from the first quarter of 2023 (Jan-Mar) 4 | - Gets job postings with an average yearly salary > $70,000 5 | - Filter for Data Analyst Jobs and order by salary 6 | */ 7 | 8 | SELECT 9 | job_title_short, 10 | job_location, 11 | job_via, 12 | job_posted_date::DATE, 13 | salary_year_avg 14 | FROM ( 15 | SELECT * 16 | FROM january_jobs 17 | UNION ALL 18 | SELECT * 19 | FROM february_jobs 20 | 
UNION ALL 21 | SELECT * 22 | FROM march_jobs 23 | ) AS quarter1_job_postings 24 | WHERE 25 | salary_year_avg > 70000 AND 26 | job_title_short = 'Data Analyst' 27 | ORDER BY 28 | salary_year_avg DESC -------------------------------------------------------------------------------- /assets/1_top_paying_roles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lukebarousse/SQL_Project_Data_Job_Analysis/ff272240ac3d7952b9c512e308b213a8b2a263fe/assets/1_top_paying_roles.png -------------------------------------------------------------------------------- /assets/2_top_paying_roles_skills.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lukebarousse/SQL_Project_Data_Job_Analysis/ff272240ac3d7952b9c512e308b213a8b2a263fe/assets/2_top_paying_roles_skills.png -------------------------------------------------------------------------------- /project_sql/1_top_paying_jobs.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Question: What are the top-paying data analyst jobs? 3 | - Identify the top 10 highest-paying Data Analyst roles that are available remotely 4 | - Focuses on job postings with specified salaries (remove nulls) 5 | - BONUS: Include company names of top 10 roles 6 | - Why? Highlight the top-paying opportunities for Data Analysts, offering insights into employment options and location flexibility. 
7 | */ 8 | 9 | SELECT 10 | job_id, 11 | job_title, 12 | job_location, 13 | job_schedule_type, 14 | salary_year_avg, 15 | job_posted_date, 16 | name AS company_name 17 | FROM 18 | job_postings_fact 19 | LEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id 20 | WHERE 21 | job_title_short = 'Data Analyst' AND 22 | job_location = 'Anywhere' AND 23 | salary_year_avg IS NOT NULL 24 | ORDER BY 25 | salary_year_avg DESC 26 | LIMIT 10; 27 | 28 | /* 29 | Here's the breakdown of the top data analyst jobs in 2023: 30 | Wide Salary Range: Top 10 paying data analyst roles span from $184,000 to $650,000, indicating significant salary potential in the field. 31 | Diverse Employers: Companies like SmartAsset, Meta, and AT&T are among those offering high salaries, showing a broad interest across different industries. 32 | Job Title Variety: There's a high diversity in job titles, from Data Analyst to Director of Analytics, reflecting varied roles and specializations within data analytics. 
33 | 34 | RESULTS 35 | ======= 36 | [ 37 | { 38 | "job_id": 226942, 39 | "job_title": "Data Analyst", 40 | "job_location": "Anywhere", 41 | "job_schedule_type": "Full-time", 42 | "salary_year_avg": "650000.0", 43 | "job_posted_date": "2023-02-20 15:13:33", 44 | "company_name": "Mantys" 45 | }, 46 | { 47 | "job_id": 547382, 48 | "job_title": "Director of Analytics", 49 | "job_location": "Anywhere", 50 | "job_schedule_type": "Full-time", 51 | "salary_year_avg": "336500.0", 52 | "job_posted_date": "2023-08-23 12:04:42", 53 | "company_name": "Meta" 54 | }, 55 | { 56 | "job_id": 552322, 57 | "job_title": "Associate Director- Data Insights", 58 | "job_location": "Anywhere", 59 | "job_schedule_type": "Full-time", 60 | "salary_year_avg": "255829.5", 61 | "job_posted_date": "2023-06-18 16:03:12", 62 | "company_name": "AT&T" 63 | }, 64 | { 65 | "job_id": 99305, 66 | "job_title": "Data Analyst, Marketing", 67 | "job_location": "Anywhere", 68 | "job_schedule_type": "Full-time", 69 | "salary_year_avg": "232423.0", 70 | "job_posted_date": "2023-12-05 20:00:40", 71 | "company_name": "Pinterest Job Advertisements" 72 | }, 73 | { 74 | "job_id": 1021647, 75 | "job_title": "Data Analyst (Hybrid/Remote)", 76 | "job_location": "Anywhere", 77 | "job_schedule_type": "Full-time", 78 | "salary_year_avg": "217000.0", 79 | "job_posted_date": "2023-01-17 00:17:23", 80 | "company_name": "Uclahealthcareers" 81 | }, 82 | { 83 | "job_id": 168310, 84 | "job_title": "Principal Data Analyst (Remote)", 85 | "job_location": "Anywhere", 86 | "job_schedule_type": "Full-time", 87 | "salary_year_avg": "205000.0", 88 | "job_posted_date": "2023-08-09 11:00:01", 89 | "company_name": "SmartAsset" 90 | }, 91 | { 92 | "job_id": 731368, 93 | "job_title": "Director, Data Analyst - HYBRID", 94 | "job_location": "Anywhere", 95 | "job_schedule_type": "Full-time", 96 | "salary_year_avg": "189309.0", 97 | "job_posted_date": "2023-12-07 15:00:13", 98 | "company_name": "Inclusively" 99 | }, 100 | { 101 | "job_id": 
310660, 102 | "job_title": "Principal Data Analyst, AV Performance Analysis", 103 | "job_location": "Anywhere", 104 | "job_schedule_type": "Full-time", 105 | "salary_year_avg": "189000.0", 106 | "job_posted_date": "2023-01-05 00:00:25", 107 | "company_name": "Motional" 108 | }, 109 | { 110 | "job_id": 1749593, 111 | "job_title": "Principal Data Analyst", 112 | "job_location": "Anywhere", 113 | "job_schedule_type": "Full-time", 114 | "salary_year_avg": "186000.0", 115 | "job_posted_date": "2023-07-11 16:00:05", 116 | "company_name": "SmartAsset" 117 | }, 118 | { 119 | "job_id": 387860, 120 | "job_title": "ERM Data Analyst", 121 | "job_location": "Anywhere", 122 | "job_schedule_type": "Full-time", 123 | "salary_year_avg": "184000.0", 124 | "job_posted_date": "2023-06-09 08:01:04", 125 | "company_name": "Get It Recruit - Information Technology" 126 | } 127 | ] 128 | */ 129 | -------------------------------------------------------------------------------- /project_sql/2_top_paying_job_skills.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Question: What skills are required for the top-paying data analyst jobs? 3 | - Use the top 10 highest-paying Data Analyst jobs from first query 4 | - Add the specific skills required for these roles 5 | - Why? 
It provides a detailed look at which high-paying jobs demand certain skills, 6 | helping job seekers understand which skills to develop that align with top salaries 7 | */ 8 | 9 | WITH top_paying_jobs AS ( 10 | SELECT 11 | job_id, 12 | job_title, 13 | salary_year_avg, 14 | name AS company_name 15 | FROM 16 | job_postings_fact 17 | LEFT JOIN company_dim ON job_postings_fact.company_id = company_dim.company_id 18 | WHERE 19 | job_title_short = 'Data Analyst' AND 20 | job_location = 'Anywhere' AND 21 | salary_year_avg IS NOT NULL 22 | ORDER BY 23 | salary_year_avg DESC 24 | LIMIT 10 25 | ) 26 | 27 | SELECT 28 | top_paying_jobs.*, 29 | skills 30 | FROM top_paying_jobs 31 | INNER JOIN skills_job_dim ON top_paying_jobs.job_id = skills_job_dim.job_id 32 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 33 | ORDER BY 34 | salary_year_avg DESC; 35 | 36 | /* 37 | Here's the breakdown of the most demanded skills for data analysts in 2023, based on job postings: 38 | SQL is leading with a bold count of 8. 39 | Python follows closely with a bold count of 7. 40 | Tableau is also highly sought after, with a bold count of 6. 41 | Other skills like R, Snowflake, Pandas, and Excel show varying degrees of demand. 
42 | 43 | [ 44 | { 45 | "job_id": 552322, 46 | "job_title": "Associate Director- Data Insights", 47 | "salary_year_avg": "255829.5", 48 | "company_name": "AT&T", 49 | "skills": "sql" 50 | }, 51 | { 52 | "job_id": 552322, 53 | "job_title": "Associate Director- Data Insights", 54 | "salary_year_avg": "255829.5", 55 | "company_name": "AT&T", 56 | "skills": "python" 57 | }, 58 | { 59 | "job_id": 552322, 60 | "job_title": "Associate Director- Data Insights", 61 | "salary_year_avg": "255829.5", 62 | "company_name": "AT&T", 63 | "skills": "r" 64 | }, 65 | { 66 | "job_id": 552322, 67 | "job_title": "Associate Director- Data Insights", 68 | "salary_year_avg": "255829.5", 69 | "company_name": "AT&T", 70 | "skills": "azure" 71 | }, 72 | { 73 | "job_id": 552322, 74 | "job_title": "Associate Director- Data Insights", 75 | "salary_year_avg": "255829.5", 76 | "company_name": "AT&T", 77 | "skills": "databricks" 78 | }, 79 | { 80 | "job_id": 552322, 81 | "job_title": "Associate Director- Data Insights", 82 | "salary_year_avg": "255829.5", 83 | "company_name": "AT&T", 84 | "skills": "aws" 85 | }, 86 | { 87 | "job_id": 552322, 88 | "job_title": "Associate Director- Data Insights", 89 | "salary_year_avg": "255829.5", 90 | "company_name": "AT&T", 91 | "skills": "pandas" 92 | }, 93 | { 94 | "job_id": 552322, 95 | "job_title": "Associate Director- Data Insights", 96 | "salary_year_avg": "255829.5", 97 | "company_name": "AT&T", 98 | "skills": "pyspark" 99 | }, 100 | { 101 | "job_id": 552322, 102 | "job_title": "Associate Director- Data Insights", 103 | "salary_year_avg": "255829.5", 104 | "company_name": "AT&T", 105 | "skills": "jupyter" 106 | }, 107 | { 108 | "job_id": 552322, 109 | "job_title": "Associate Director- Data Insights", 110 | "salary_year_avg": "255829.5", 111 | "company_name": "AT&T", 112 | "skills": "excel" 113 | }, 114 | { 115 | "job_id": 552322, 116 | "job_title": "Associate Director- Data Insights", 117 | "salary_year_avg": "255829.5", 118 | "company_name": "AT&T", 119 | 
"skills": "tableau" 120 | }, 121 | { 122 | "job_id": 552322, 123 | "job_title": "Associate Director- Data Insights", 124 | "salary_year_avg": "255829.5", 125 | "company_name": "AT&T", 126 | "skills": "power bi" 127 | }, 128 | { 129 | "job_id": 552322, 130 | "job_title": "Associate Director- Data Insights", 131 | "salary_year_avg": "255829.5", 132 | "company_name": "AT&T", 133 | "skills": "powerpoint" 134 | }, 135 | { 136 | "job_id": 99305, 137 | "job_title": "Data Analyst, Marketing", 138 | "salary_year_avg": "232423.0", 139 | "company_name": "Pinterest Job Advertisements", 140 | "skills": "sql" 141 | }, 142 | { 143 | "job_id": 99305, 144 | "job_title": "Data Analyst, Marketing", 145 | "salary_year_avg": "232423.0", 146 | "company_name": "Pinterest Job Advertisements", 147 | "skills": "python" 148 | }, 149 | { 150 | "job_id": 99305, 151 | "job_title": "Data Analyst, Marketing", 152 | "salary_year_avg": "232423.0", 153 | "company_name": "Pinterest Job Advertisements", 154 | "skills": "r" 155 | }, 156 | { 157 | "job_id": 99305, 158 | "job_title": "Data Analyst, Marketing", 159 | "salary_year_avg": "232423.0", 160 | "company_name": "Pinterest Job Advertisements", 161 | "skills": "hadoop" 162 | }, 163 | { 164 | "job_id": 99305, 165 | "job_title": "Data Analyst, Marketing", 166 | "salary_year_avg": "232423.0", 167 | "company_name": "Pinterest Job Advertisements", 168 | "skills": "tableau" 169 | }, 170 | { 171 | "job_id": 1021647, 172 | "job_title": "Data Analyst (Hybrid/Remote)", 173 | "salary_year_avg": "217000.0", 174 | "company_name": "Uclahealthcareers", 175 | "skills": "sql" 176 | }, 177 | { 178 | "job_id": 1021647, 179 | "job_title": "Data Analyst (Hybrid/Remote)", 180 | "salary_year_avg": "217000.0", 181 | "company_name": "Uclahealthcareers", 182 | "skills": "crystal" 183 | }, 184 | { 185 | "job_id": 1021647, 186 | "job_title": "Data Analyst (Hybrid/Remote)", 187 | "salary_year_avg": "217000.0", 188 | "company_name": "Uclahealthcareers", 189 | "skills": "oracle" 
190 | }, 191 | { 192 | "job_id": 1021647, 193 | "job_title": "Data Analyst (Hybrid/Remote)", 194 | "salary_year_avg": "217000.0", 195 | "company_name": "Uclahealthcareers", 196 | "skills": "tableau" 197 | }, 198 | { 199 | "job_id": 1021647, 200 | "job_title": "Data Analyst (Hybrid/Remote)", 201 | "salary_year_avg": "217000.0", 202 | "company_name": "Uclahealthcareers", 203 | "skills": "flow" 204 | }, 205 | { 206 | "job_id": 168310, 207 | "job_title": "Principal Data Analyst (Remote)", 208 | "salary_year_avg": "205000.0", 209 | "company_name": "SmartAsset", 210 | "skills": "sql" 211 | }, 212 | { 213 | "job_id": 168310, 214 | "job_title": "Principal Data Analyst (Remote)", 215 | "salary_year_avg": "205000.0", 216 | "company_name": "SmartAsset", 217 | "skills": "python" 218 | }, 219 | { 220 | "job_id": 168310, 221 | "job_title": "Principal Data Analyst (Remote)", 222 | "salary_year_avg": "205000.0", 223 | "company_name": "SmartAsset", 224 | "skills": "go" 225 | }, 226 | { 227 | "job_id": 168310, 228 | "job_title": "Principal Data Analyst (Remote)", 229 | "salary_year_avg": "205000.0", 230 | "company_name": "SmartAsset", 231 | "skills": "snowflake" 232 | }, 233 | { 234 | "job_id": 168310, 235 | "job_title": "Principal Data Analyst (Remote)", 236 | "salary_year_avg": "205000.0", 237 | "company_name": "SmartAsset", 238 | "skills": "pandas" 239 | }, 240 | { 241 | "job_id": 168310, 242 | "job_title": "Principal Data Analyst (Remote)", 243 | "salary_year_avg": "205000.0", 244 | "company_name": "SmartAsset", 245 | "skills": "numpy" 246 | }, 247 | { 248 | "job_id": 168310, 249 | "job_title": "Principal Data Analyst (Remote)", 250 | "salary_year_avg": "205000.0", 251 | "company_name": "SmartAsset", 252 | "skills": "excel" 253 | }, 254 | { 255 | "job_id": 168310, 256 | "job_title": "Principal Data Analyst (Remote)", 257 | "salary_year_avg": "205000.0", 258 | "company_name": "SmartAsset", 259 | "skills": "tableau" 260 | }, 261 | { 262 | "job_id": 168310, 263 | "job_title": 
"Principal Data Analyst (Remote)", 264 | "salary_year_avg": "205000.0", 265 | "company_name": "SmartAsset", 266 | "skills": "gitlab" 267 | }, 268 | { 269 | "job_id": 731368, 270 | "job_title": "Director, Data Analyst - HYBRID", 271 | "salary_year_avg": "189309.0", 272 | "company_name": "Inclusively", 273 | "skills": "sql" 274 | }, 275 | { 276 | "job_id": 731368, 277 | "job_title": "Director, Data Analyst - HYBRID", 278 | "salary_year_avg": "189309.0", 279 | "company_name": "Inclusively", 280 | "skills": "python" 281 | }, 282 | { 283 | "job_id": 731368, 284 | "job_title": "Director, Data Analyst - HYBRID", 285 | "salary_year_avg": "189309.0", 286 | "company_name": "Inclusively", 287 | "skills": "azure" 288 | }, 289 | { 290 | "job_id": 731368, 291 | "job_title": "Director, Data Analyst - HYBRID", 292 | "salary_year_avg": "189309.0", 293 | "company_name": "Inclusively", 294 | "skills": "aws" 295 | }, 296 | { 297 | "job_id": 731368, 298 | "job_title": "Director, Data Analyst - HYBRID", 299 | "salary_year_avg": "189309.0", 300 | "company_name": "Inclusively", 301 | "skills": "oracle" 302 | }, 303 | { 304 | "job_id": 731368, 305 | "job_title": "Director, Data Analyst - HYBRID", 306 | "salary_year_avg": "189309.0", 307 | "company_name": "Inclusively", 308 | "skills": "snowflake" 309 | }, 310 | { 311 | "job_id": 731368, 312 | "job_title": "Director, Data Analyst - HYBRID", 313 | "salary_year_avg": "189309.0", 314 | "company_name": "Inclusively", 315 | "skills": "tableau" 316 | }, 317 | { 318 | "job_id": 731368, 319 | "job_title": "Director, Data Analyst - HYBRID", 320 | "salary_year_avg": "189309.0", 321 | "company_name": "Inclusively", 322 | "skills": "power bi" 323 | }, 324 | { 325 | "job_id": 731368, 326 | "job_title": "Director, Data Analyst - HYBRID", 327 | "salary_year_avg": "189309.0", 328 | "company_name": "Inclusively", 329 | "skills": "sap" 330 | }, 331 | { 332 | "job_id": 731368, 333 | "job_title": "Director, Data Analyst - HYBRID", 334 | "salary_year_avg": 
"189309.0", 335 | "company_name": "Inclusively", 336 | "skills": "jenkins" 337 | }, 338 | { 339 | "job_id": 731368, 340 | "job_title": "Director, Data Analyst - HYBRID", 341 | "salary_year_avg": "189309.0", 342 | "company_name": "Inclusively", 343 | "skills": "bitbucket" 344 | }, 345 | { 346 | "job_id": 731368, 347 | "job_title": "Director, Data Analyst - HYBRID", 348 | "salary_year_avg": "189309.0", 349 | "company_name": "Inclusively", 350 | "skills": "atlassian" 351 | }, 352 | { 353 | "job_id": 731368, 354 | "job_title": "Director, Data Analyst - HYBRID", 355 | "salary_year_avg": "189309.0", 356 | "company_name": "Inclusively", 357 | "skills": "jira" 358 | }, 359 | { 360 | "job_id": 731368, 361 | "job_title": "Director, Data Analyst - HYBRID", 362 | "salary_year_avg": "189309.0", 363 | "company_name": "Inclusively", 364 | "skills": "confluence" 365 | }, 366 | { 367 | "job_id": 310660, 368 | "job_title": "Principal Data Analyst, AV Performance Analysis", 369 | "salary_year_avg": "189000.0", 370 | "company_name": "Motional", 371 | "skills": "sql" 372 | }, 373 | { 374 | "job_id": 310660, 375 | "job_title": "Principal Data Analyst, AV Performance Analysis", 376 | "salary_year_avg": "189000.0", 377 | "company_name": "Motional", 378 | "skills": "python" 379 | }, 380 | { 381 | "job_id": 310660, 382 | "job_title": "Principal Data Analyst, AV Performance Analysis", 383 | "salary_year_avg": "189000.0", 384 | "company_name": "Motional", 385 | "skills": "r" 386 | }, 387 | { 388 | "job_id": 310660, 389 | "job_title": "Principal Data Analyst, AV Performance Analysis", 390 | "salary_year_avg": "189000.0", 391 | "company_name": "Motional", 392 | "skills": "git" 393 | }, 394 | { 395 | "job_id": 310660, 396 | "job_title": "Principal Data Analyst, AV Performance Analysis", 397 | "salary_year_avg": "189000.0", 398 | "company_name": "Motional", 399 | "skills": "bitbucket" 400 | }, 401 | { 402 | "job_id": 310660, 403 | "job_title": "Principal Data Analyst, AV Performance Analysis", 
404 | "salary_year_avg": "189000.0", 405 | "company_name": "Motional", 406 | "skills": "atlassian" 407 | }, 408 | { 409 | "job_id": 310660, 410 | "job_title": "Principal Data Analyst, AV Performance Analysis", 411 | "salary_year_avg": "189000.0", 412 | "company_name": "Motional", 413 | "skills": "jira" 414 | }, 415 | { 416 | "job_id": 310660, 417 | "job_title": "Principal Data Analyst, AV Performance Analysis", 418 | "salary_year_avg": "189000.0", 419 | "company_name": "Motional", 420 | "skills": "confluence" 421 | }, 422 | { 423 | "job_id": 1749593, 424 | "job_title": "Principal Data Analyst", 425 | "salary_year_avg": "186000.0", 426 | "company_name": "SmartAsset", 427 | "skills": "sql" 428 | }, 429 | { 430 | "job_id": 1749593, 431 | "job_title": "Principal Data Analyst", 432 | "salary_year_avg": "186000.0", 433 | "company_name": "SmartAsset", 434 | "skills": "python" 435 | }, 436 | { 437 | "job_id": 1749593, 438 | "job_title": "Principal Data Analyst", 439 | "salary_year_avg": "186000.0", 440 | "company_name": "SmartAsset", 441 | "skills": "go" 442 | }, 443 | { 444 | "job_id": 1749593, 445 | "job_title": "Principal Data Analyst", 446 | "salary_year_avg": "186000.0", 447 | "company_name": "SmartAsset", 448 | "skills": "snowflake" 449 | }, 450 | { 451 | "job_id": 1749593, 452 | "job_title": "Principal Data Analyst", 453 | "salary_year_avg": "186000.0", 454 | "company_name": "SmartAsset", 455 | "skills": "pandas" 456 | }, 457 | { 458 | "job_id": 1749593, 459 | "job_title": "Principal Data Analyst", 460 | "salary_year_avg": "186000.0", 461 | "company_name": "SmartAsset", 462 | "skills": "numpy" 463 | }, 464 | { 465 | "job_id": 1749593, 466 | "job_title": "Principal Data Analyst", 467 | "salary_year_avg": "186000.0", 468 | "company_name": "SmartAsset", 469 | "skills": "excel" 470 | }, 471 | { 472 | "job_id": 1749593, 473 | "job_title": "Principal Data Analyst", 474 | "salary_year_avg": "186000.0", 475 | "company_name": "SmartAsset", 476 | "skills": "tableau" 477 | }, 
478 | { 479 | "job_id": 1749593, 480 | "job_title": "Principal Data Analyst", 481 | "salary_year_avg": "186000.0", 482 | "company_name": "SmartAsset", 483 | "skills": "gitlab" 484 | }, 485 | { 486 | "job_id": 387860, 487 | "job_title": "ERM Data Analyst", 488 | "salary_year_avg": "184000.0", 489 | "company_name": "Get It Recruit - Information Technology", 490 | "skills": "sql" 491 | }, 492 | { 493 | "job_id": 387860, 494 | "job_title": "ERM Data Analyst", 495 | "salary_year_avg": "184000.0", 496 | "company_name": "Get It Recruit - Information Technology", 497 | "skills": "python" 498 | }, 499 | { 500 | "job_id": 387860, 501 | "job_title": "ERM Data Analyst", 502 | "salary_year_avg": "184000.0", 503 | "company_name": "Get It Recruit - Information Technology", 504 | "skills": "r" 505 | } 506 | ] 507 | */ 508 | 509 | -------------------------------------------------------------------------------- /project_sql/3_top_demanded_skills.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Question: What are the most in-demand skills for data analysts? 3 | - Inner join job postings to the skills tables, as in query 2 4 | - Identify the top 5 in-demand skills for a data analyst. 5 | - Focus on remote job postings (the query filters on job_work_from_home = True). 6 | - Why? Retrieves the top 5 skills with the highest demand in the job market, 7 | providing insights into the most valuable skills for job seekers.
8 | */ 9 | 10 | SELECT 11 | skills, 12 | COUNT(skills_job_dim.job_id) AS demand_count 13 | FROM job_postings_fact 14 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 15 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 16 | WHERE 17 | job_title_short = 'Data Analyst' 18 | AND job_work_from_home = True 19 | GROUP BY 20 | skills 21 | ORDER BY 22 | demand_count DESC 23 | LIMIT 5; 24 | 25 | /* 26 | Here's the breakdown of the most demanded skills for data analysts in 2023: 27 | SQL and Excel remain fundamental, emphasizing the need for strong foundational skills in data processing and spreadsheet manipulation. 28 | Programming and Visualization Tools like Python, Tableau, and Power BI are essential, pointing towards the increasing importance of technical skills in data storytelling and decision support. 29 | 30 | [ 31 | { 32 | "skills": "sql", 33 | "demand_count": "7291" 34 | }, 35 | { 36 | "skills": "excel", 37 | "demand_count": "4611" 38 | }, 39 | { 40 | "skills": "python", 41 | "demand_count": "4330" 42 | }, 43 | { 44 | "skills": "tableau", 45 | "demand_count": "3745" 46 | }, 47 | { 48 | "skills": "power bi", 49 | "demand_count": "2609" 50 | } 51 | ] 52 | */ -------------------------------------------------------------------------------- /project_sql/4_top_paying_skills.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Question: What are the top skills based on salary? 3 | - Look at the average salary associated with each skill for Data Analyst positions 4 | - Focuses on remote roles with specified salaries (the query filters on job_work_from_home = True) 5 | - Why?
It reveals how different skills impact salary levels for Data Analysts and 6 | helps identify the most financially rewarding skills to acquire or improve 7 | */ 8 | 9 | SELECT 10 | skills, 11 | ROUND(AVG(salary_year_avg), 0) AS avg_salary 12 | FROM job_postings_fact 13 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 14 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 15 | WHERE 16 | job_title_short = 'Data Analyst' 17 | AND salary_year_avg IS NOT NULL 18 | AND job_work_from_home = True 19 | GROUP BY 20 | skills 21 | ORDER BY 22 | avg_salary DESC 23 | LIMIT 25; 24 | 25 | /* 26 | Here's a breakdown of the results for top paying skills for Data Analysts: 27 | - High Demand for Big Data & ML Skills: Top salaries are commanded by analysts skilled in big data technologies (PySpark, Couchbase), machine learning tools (DataRobot, Jupyter), and Python libraries (Pandas, NumPy), reflecting the industry's high valuation of data processing and predictive modeling capabilities. 28 | - Software Development & Deployment Proficiency: Knowledge in development and deployment tools (GitLab, Kubernetes, Airflow) indicates a lucrative crossover between data analysis and engineering, with a premium on skills that facilitate automation and efficient data pipeline management. 29 | - Cloud Computing Expertise: Familiarity with cloud and data engineering tools (Elasticsearch, Databricks, GCP) underscores the growing importance of cloud-based analytics environments, suggesting that cloud proficiency significantly boosts earning potential in data analytics. 
30 | 31 | [ 32 | { 33 | "skills": "pyspark", 34 | "avg_salary": "208172" 35 | }, 36 | { 37 | "skills": "bitbucket", 38 | "avg_salary": "189155" 39 | }, 40 | { 41 | "skills": "couchbase", 42 | "avg_salary": "160515" 43 | }, 44 | { 45 | "skills": "watson", 46 | "avg_salary": "160515" 47 | }, 48 | { 49 | "skills": "datarobot", 50 | "avg_salary": "155486" 51 | }, 52 | { 53 | "skills": "gitlab", 54 | "avg_salary": "154500" 55 | }, 56 | { 57 | "skills": "swift", 58 | "avg_salary": "153750" 59 | }, 60 | { 61 | "skills": "jupyter", 62 | "avg_salary": "152777" 63 | }, 64 | { 65 | "skills": "pandas", 66 | "avg_salary": "151821" 67 | }, 68 | { 69 | "skills": "elasticsearch", 70 | "avg_salary": "145000" 71 | }, 72 | { 73 | "skills": "golang", 74 | "avg_salary": "145000" 75 | }, 76 | { 77 | "skills": "numpy", 78 | "avg_salary": "143513" 79 | }, 80 | { 81 | "skills": "databricks", 82 | "avg_salary": "141907" 83 | }, 84 | { 85 | "skills": "linux", 86 | "avg_salary": "136508" 87 | }, 88 | { 89 | "skills": "kubernetes", 90 | "avg_salary": "132500" 91 | }, 92 | { 93 | "skills": "atlassian", 94 | "avg_salary": "131162" 95 | }, 96 | { 97 | "skills": "twilio", 98 | "avg_salary": "127000" 99 | }, 100 | { 101 | "skills": "airflow", 102 | "avg_salary": "126103" 103 | }, 104 | { 105 | "skills": "scikit-learn", 106 | "avg_salary": "125781" 107 | }, 108 | { 109 | "skills": "jenkins", 110 | "avg_salary": "125436" 111 | }, 112 | { 113 | "skills": "notion", 114 | "avg_salary": "125000" 115 | }, 116 | { 117 | "skills": "scala", 118 | "avg_salary": "124903" 119 | }, 120 | { 121 | "skills": "postgresql", 122 | "avg_salary": "123879" 123 | }, 124 | { 125 | "skills": "gcp", 126 | "avg_salary": "122500" 127 | }, 128 | { 129 | "skills": "microstrategy", 130 | "avg_salary": "121619" 131 | } 132 | ] 133 | */ -------------------------------------------------------------------------------- /project_sql/5_optimal_skills.sql: -------------------------------------------------------------------------------- 1 
| /* 2 | Question: What are the most optimal skills to learn (i.e., skills that are both in high demand and high-paying)? 3 | - Identify skills in high demand and associated with high average salaries for Data Analyst roles 4 | - Concentrates on remote positions with specified salaries 5 | - Why? Targets skills that offer job security (high demand) and financial benefits (high salaries), 6 | offering strategic insights for career development in data analysis 7 | */ 8 | 9 | -- Identifies skills in high demand for Data Analyst roles 10 | -- Use Query #3 11 | WITH skills_demand AS ( 12 | SELECT 13 | skills_dim.skill_id, 14 | skills_dim.skills, 15 | COUNT(skills_job_dim.job_id) AS demand_count 16 | FROM job_postings_fact 17 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 18 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 19 | WHERE 20 | job_title_short = 'Data Analyst' 21 | AND salary_year_avg IS NOT NULL 22 | AND job_work_from_home = True 23 | GROUP BY 24 | skills_dim.skill_id 25 | ), 26 | -- Skills with high average salaries for Data Analyst roles 27 | -- Use Query #4 28 | average_salary AS ( 29 | SELECT 30 | skills_job_dim.skill_id, 31 | ROUND(AVG(job_postings_fact.salary_year_avg), 0) AS avg_salary 32 | FROM job_postings_fact 33 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 34 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 35 | WHERE 36 | job_title_short = 'Data Analyst' 37 | AND salary_year_avg IS NOT NULL 38 | AND job_work_from_home = True 39 | GROUP BY 40 | skills_job_dim.skill_id 41 | ) 42 | -- Return the top 25 skills with more than 10 postings, ordered by average salary and then demand 43 | SELECT 44 | skills_demand.skill_id, 45 | skills_demand.skills, 46 | demand_count, 47 | avg_salary 48 | FROM 49 | skills_demand 50 | INNER JOIN average_salary ON skills_demand.skill_id = average_salary.skill_id 51 | WHERE 52 | demand_count > 10 53 | ORDER BY 54 | avg_salary DESC, 55 | demand_count DESC 56 | LIMIT
25; 57 | 58 | -- rewriting this same query more concisely 59 | SELECT 60 | skills_dim.skill_id, 61 | skills_dim.skills, 62 | COUNT(skills_job_dim.job_id) AS demand_count, 63 | ROUND(AVG(job_postings_fact.salary_year_avg), 0) AS avg_salary 64 | FROM job_postings_fact 65 | INNER JOIN skills_job_dim ON job_postings_fact.job_id = skills_job_dim.job_id 66 | INNER JOIN skills_dim ON skills_job_dim.skill_id = skills_dim.skill_id 67 | WHERE 68 | job_title_short = 'Data Analyst' 69 | AND salary_year_avg IS NOT NULL 70 | AND job_work_from_home = True 71 | GROUP BY 72 | skills_dim.skill_id 73 | HAVING 74 | COUNT(skills_job_dim.job_id) > 10 75 | ORDER BY 76 | avg_salary DESC, 77 | demand_count DESC 78 | LIMIT 25; 79 | 80 | /* 81 | Here's a breakdown of the most optimal skills for Data Analysts in 2023: 82 | High-Demand Programming Languages: Python and R stand out for their high demand, with demand counts of 236 and 148 respectively. Despite their high demand, their average salaries are around $101,397 for Python and $100,499 for R, indicating that proficiency in these languages is highly valued but also widely available. 83 | Cloud Tools and Technologies: Skills in specialized technologies such as Snowflake, Azure, AWS, and BigQuery show significant demand with relatively high average salaries, pointing towards the growing importance of cloud platforms and big data technologies in data analysis. 84 | Business Intelligence and Visualization Tools: Tableau and Looker, with demand counts of 230 and 49 respectively, and average salaries around $99,288 and $103,795, highlight the critical role of data visualization and business intelligence in deriving actionable insights from data. 85 | Database Technologies: The demand for skills in traditional and NoSQL databases (Oracle, SQL Server, NoSQL) with average salaries ranging from $97,786 to $104,534, reflects the enduring need for data storage, retrieval, and management expertise. 
86 | 87 | [ 88 | { 89 | "skill_id": 8, 90 | "skills": "go", 91 | "demand_count": "27", 92 | "avg_salary": "115320" 93 | }, 94 | { 95 | "skill_id": 234, 96 | "skills": "confluence", 97 | "demand_count": "11", 98 | "avg_salary": "114210" 99 | }, 100 | { 101 | "skill_id": 97, 102 | "skills": "hadoop", 103 | "demand_count": "22", 104 | "avg_salary": "113193" 105 | }, 106 | { 107 | "skill_id": 80, 108 | "skills": "snowflake", 109 | "demand_count": "37", 110 | "avg_salary": "112948" 111 | }, 112 | { 113 | "skill_id": 74, 114 | "skills": "azure", 115 | "demand_count": "34", 116 | "avg_salary": "111225" 117 | }, 118 | { 119 | "skill_id": 77, 120 | "skills": "bigquery", 121 | "demand_count": "13", 122 | "avg_salary": "109654" 123 | }, 124 | { 125 | "skill_id": 76, 126 | "skills": "aws", 127 | "demand_count": "32", 128 | "avg_salary": "108317" 129 | }, 130 | { 131 | "skill_id": 4, 132 | "skills": "java", 133 | "demand_count": "17", 134 | "avg_salary": "106906" 135 | }, 136 | { 137 | "skill_id": 194, 138 | "skills": "ssis", 139 | "demand_count": "12", 140 | "avg_salary": "106683" 141 | }, 142 | { 143 | "skill_id": 233, 144 | "skills": "jira", 145 | "demand_count": "20", 146 | "avg_salary": "104918" 147 | }, 148 | { 149 | "skill_id": 79, 150 | "skills": "oracle", 151 | "demand_count": "37", 152 | "avg_salary": "104534" 153 | }, 154 | { 155 | "skill_id": 185, 156 | "skills": "looker", 157 | "demand_count": "49", 158 | "avg_salary": "103795" 159 | }, 160 | { 161 | "skill_id": 2, 162 | "skills": "nosql", 163 | "demand_count": "13", 164 | "avg_salary": "101414" 165 | }, 166 | { 167 | "skill_id": 1, 168 | "skills": "python", 169 | "demand_count": "236", 170 | "avg_salary": "101397" 171 | }, 172 | { 173 | "skill_id": 5, 174 | "skills": "r", 175 | "demand_count": "148", 176 | "avg_salary": "100499" 177 | }, 178 | { 179 | "skill_id": 78, 180 | "skills": "redshift", 181 | "demand_count": "16", 182 | "avg_salary": "99936" 183 | }, 184 | { 185 | "skill_id": 187, 186 | "skills": "qlik", 
187 | "demand_count": "13", 188 | "avg_salary": "99631" 189 | }, 190 | { 191 | "skill_id": 182, 192 | "skills": "tableau", 193 | "demand_count": "230", 194 | "avg_salary": "99288" 195 | }, 196 | { 197 | "skill_id": 197, 198 | "skills": "ssrs", 199 | "demand_count": "14", 200 | "avg_salary": "99171" 201 | }, 202 | { 203 | "skill_id": 92, 204 | "skills": "spark", 205 | "demand_count": "13", 206 | "avg_salary": "99077" 207 | }, 208 | { 209 | "skill_id": 13, 210 | "skills": "c++", 211 | "demand_count": "11", 212 | "avg_salary": "98958" 213 | }, 214 | { 215 | "skill_id": 186, 216 | "skills": "sas", 217 | "demand_count": "63", 218 | "avg_salary": "98902" 219 | }, 220 | { 221 | "skill_id": 7, 222 | "skills": "sas", 223 | "demand_count": "63", 224 | "avg_salary": "98902" 225 | }, 226 | { 227 | "skill_id": 61, 228 | "skills": "sql server", 229 | "demand_count": "35", 230 | "avg_salary": "97786" 231 | }, 232 | { 233 | "skill_id": 9, 234 | "skills": "javascript", 235 | "demand_count": "20", 236 | "avg_salary": "97587" 237 | } 238 | ] 239 | */ -------------------------------------------------------------------------------- /sql_load/1_create_database.sql: -------------------------------------------------------------------------------- 1 | CREATE DATABASE sql_course; 2 | 3 | -- DROP DATABASE IF EXISTS sql_course; -------------------------------------------------------------------------------- /sql_load/2_create_tables.sql: -------------------------------------------------------------------------------- 1 | -- Create company_dim table with primary key 2 | CREATE TABLE public.company_dim 3 | ( 4 | company_id INT PRIMARY KEY, 5 | name TEXT, 6 | link TEXT, 7 | link_google TEXT, 8 | thumbnail TEXT 9 | ); 10 | 11 | -- Create skills_dim table with primary key 12 | CREATE TABLE public.skills_dim 13 | ( 14 | skill_id INT PRIMARY KEY, 15 | skills TEXT, 16 | type TEXT 17 | ); 18 | 19 | -- Create job_postings_fact table with primary key 20 | CREATE TABLE public.job_postings_fact 21 | ( 22 
| job_id INT PRIMARY KEY, 23 | company_id INT, 24 | job_title_short VARCHAR(255), 25 | job_title TEXT, 26 | job_location TEXT, 27 | job_via TEXT, 28 | job_schedule_type TEXT, 29 | job_work_from_home BOOLEAN, 30 | search_location TEXT, 31 | job_posted_date TIMESTAMP, 32 | job_no_degree_mention BOOLEAN, 33 | job_health_insurance BOOLEAN, 34 | job_country TEXT, 35 | salary_rate TEXT, 36 | salary_year_avg NUMERIC, 37 | salary_hour_avg NUMERIC, 38 | FOREIGN KEY (company_id) REFERENCES public.company_dim (company_id) 39 | ); 40 | 41 | -- Create skills_job_dim table with a composite primary key and foreign keys 42 | CREATE TABLE public.skills_job_dim 43 | ( 44 | job_id INT, 45 | skill_id INT, 46 | PRIMARY KEY (job_id, skill_id), 47 | FOREIGN KEY (job_id) REFERENCES public.job_postings_fact (job_id), 48 | FOREIGN KEY (skill_id) REFERENCES public.skills_dim (skill_id) 49 | ); 50 | 51 | -- Set ownership of the tables to the postgres user 52 | ALTER TABLE public.company_dim OWNER to postgres; 53 | ALTER TABLE public.skills_dim OWNER to postgres; 54 | ALTER TABLE public.job_postings_fact OWNER to postgres; 55 | ALTER TABLE public.skills_job_dim OWNER to postgres; 56 | 57 | -- Create indexes on foreign key columns for better performance 58 | CREATE INDEX idx_company_id ON public.job_postings_fact (company_id); 59 | CREATE INDEX idx_skill_id ON public.skills_job_dim (skill_id); 60 | CREATE INDEX idx_job_id ON public.skills_job_dim (job_id); -------------------------------------------------------------------------------- /sql_load/3_modify_tables.sql: -------------------------------------------------------------------------------- 1 | /* ⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️⚠️ 2 | Database Load Issues (follow if receiving permission denied when running SQL code below) 3 | 4 | Possible Errors: 5 | - ERROR >> duplicate key value violates unique constraint "company_dim_pkey" 6 
| - ERROR >> could not open file "C:\Users\...\company_dim.csv" for reading: Permission denied 7 | 8 | 1. Drop the Database 9 | DROP DATABASE IF EXISTS sql_course; 10 | 2. Repeat steps to create database and load table schemas 11 | - 1_create_database.sql 12 | - 2_create_tables.sql 13 | 3. Open pgAdmin 14 | 4. In Object Explorer (left-hand pane), navigate to `sql_course` database 15 | 5. Right-click `sql_course` and select `PSQL Tool` 16 | - This opens a terminal window to write the following code 17 | 6. Get the absolute file path of your csv files 18 | 1. Find path by right-clicking a CSV file in VS Code and selecting “Copy Path” 19 | 7. Paste the following into `PSQL Tool`, (with the CORRECT file path) 20 | 21 | \copy company_dim FROM '[Insert File Path]/company_dim.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 22 | 23 | \copy skills_dim FROM '[Insert File Path]/skills_dim.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 24 | 25 | \copy job_postings_fact FROM '[Insert File Path]/job_postings_fact.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 26 | 27 | \copy skills_job_dim FROM '[Insert File Path]/skills_job_dim.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 28 | 29 | */ 30 | 31 | -- NOTE: This has been updated from the video to fix issues with encoding 32 | COPY company_dim 33 | FROM 'C:\Program Files\PostgreSQL\16\data\Datasets\sql_course\company_dim.csv' 34 | WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 35 | 36 | COPY skills_dim 37 | FROM 'C:\Program Files\PostgreSQL\16\data\Datasets\sql_course\skills_dim.csv' 38 | WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 39 | 40 | COPY job_postings_fact 41 | FROM 'C:\Program Files\PostgreSQL\16\data\Datasets\sql_course\job_postings_fact.csv' 42 | WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 43 | 44 | COPY skills_job_dim 45 | FROM 'C:\Program 
Files\PostgreSQL\16\data\Datasets\sql_course\skills_job_dim.csv' 46 | WITH (FORMAT csv, HEADER true, DELIMITER ',', ENCODING 'UTF8'); 47 | --------------------------------------------------------------------------------