├── .DS_Store ├── data └── tech_layoffs.xlsx ├── README.md ├── Outlines ├── final_section_outline.ipynb ├── layoffs_fyi_cleaner.ipynb ├── Kirthin_final_section_outline.ipynb ├── final_outline_anna.ipynb ├── final_section_seb.ipynb └── template.ipynb ├── .gitignore ├── ProjectProposal_Group019_WI24.ipynb ├── Individual Uploads └── McKayla_final_section_outline.ipynb └── DataCheckpoint_Group019_WI24.ipynb /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/COGS108/Group019_WI24/master/.DS_Store -------------------------------------------------------------------------------- /data/tech_layoffs.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/COGS108/Group019_WI24/master/data/tech_layoffs.xlsx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | This is your group repo for your final project for COGS108. 2 | 3 | This repository is private, and is only visible to the course instructors and your group mates; it is not visible to anyone else. 4 | 5 | Template notebooks for each component are provided. Only work on the notebook prior to its due date. After each submission is due, move onto the next notebook (For example, after the proposal is due, start working in the Data Checkpoint notebook). 6 | 7 | This repository will be frozen on the final project due date. No further changes can be made after that time. 8 | 9 | Your project proposal and final project will be graded based solely on the corresponding project notebooks in this repository. 10 | 11 | Template Jupyter notebooks have been included, with your group number replacing the XXX in the following file names. 
Make sure you have a notebook present in this repository by each due date with the following name (where XXX is replaced by your group number): 12 | 13 | - `ProjectProposal_groupXXX.ipynb` 14 | - `DataCheckpoint_groupXXX.ipynb` 15 | - `EDACheckpoint_groupXXX.ipynb` 16 | - `FinalProject_groupXXX.ipynb` 17 | 18 | This is *your* repo. You are free to manage the repo as you see fit, edit this README, add data files, add scripts, etc. So long as there are the four files above on due dates with the required information, the rest is up to you all. 19 | 20 | Also, you are free and encouraged to share this project after the course and to add it to your portfolio. Just be sure to fork it to your GitHub at the end of the quarter! 21 | -------------------------------------------------------------------------------- /Outlines/final_section_outline.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Title" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Some text explaining:\n", 15 | " - Which variable specifically we are looking into\n", 16 | " - Its relationship to the hypothesis\n", 17 | " - What we expect to find" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "# TODO: Exploratory plot 1" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "Some text explaining what this plot shows and how it effectively demonstrates the variable we are looking into" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "# TODO: (OPTIONAL) Exploratory plot 2" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "Some text 
explaining what this plot shows and how it effectively demonstrates the variable we are looking into" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "Set up Null hypothesis and Alternative hypothesis\n", 57 | " - Null: \n", 58 | " - Alternative: " 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "$ H_0: \\mu_{\\text{purebred}} = \\mu_{\\text{mixed-breed}} $\n", 66 | "\n", 67 | "$ H_1: \\mu_{\\text{purebred}} < \\mu_{\\text{mixed-breed}} $\n", 68 | "\n", 69 | "note: smaller adoption speed means faster" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "# TODO: Perform 1 stats test (don't print entire output of a model, only the p-value)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "Use p value to demonstrate what this says about our expectations from the beginning of the section. 
1-2 sentences, relate it to the hypothesis" 86 | ] 87 | } 88 | ], 89 | "metadata": { 90 | "language_info": { 91 | "name": "python" 92 | } 93 | }, 94 | "nbformat": 4, 95 | "nbformat_minor": 2 96 | } 97 | -------------------------------------------------------------------------------- /Outlines/layoffs_fyi_cleaner.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import json\n", "import pandas as pd\n", 10 | "d = json.load(open(\"data/resp.json\"))\n", 11 | "jtab = d[\"data\"][\"table\"]\n", 12 | "j_entries = jtab[\"rows\"]\n", 13 | "# id createdTime cellValuesByColumnId...\n", 14 | "for i in j_entries:\n", 15 | " i.update(i[\"cellValuesByColumnId\"])\n", 16 | " i.pop(\"cellValuesByColumnId\")\n", 17 | " \n", 18 | "df2 = pd.DataFrame.from_records(j_entries)\n", 19 | "[i[\"name\"] for i in jtab[\"columns\"]]\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "## don't run this cell twice please.\n", 29 | "# map columns to their eng names\n", 30 | "df2cols = {}\n", 31 | "for i in jtab[\"columns\"]:\n", 32 | " # id: company\n", 33 | " df2cols[i[\"id\"]] = i[\"name\"]\n", 34 | "# df2cols\n", 35 | "\n", 36 | "df2c = list(df2.columns)\n", 37 | "\n", 38 | "for i in range(len(df2c)):\n", 39 | " if df2c[i] in df2cols:\n", 40 | " df2c[i] = df2cols[df2c[i]]\n", 41 | "\n", 42 | "df2.columns = df2c\n", 43 | "# df2\n", 44 | "\n", 45 | "lmap = jtab[\"columns\"][1][\"typeOptions\"][\"choices\"]\n", 46 | "df2[\"Location HQ\"] = df2[\"Location HQ\"].apply(lambda x: \" \".join([lmap[i][\"name\"] if i in lmap else i for i in x]) if (isinstance(x, list)) else print(\"weird value in loc hq\", x))\n", 47 | "lmap = jtab[\"columns\"][8][\"typeOptions\"][\"choices\"]\n", 48 | "df2[\"Stage\"] = df2[\"Stage\"].apply(lambda x: (lmap[x][\"name\"] if x in 
lmap else x) if (isinstance(x, str)) else print(\"weird value in stage\", x))\n", 49 | "lmap = jtab[\"columns\"][10][\"typeOptions\"][\"choices\"]\n", 50 | "df2[\"Country\"] = df2[\"Country\"].apply(lambda x: (lmap[x][\"name\"] if x in lmap else x) if (isinstance(x, str)) else print(\"weird value in country\", x))\n", 51 | "lmap = jtab[\"columns\"][5][\"typeOptions\"][\"choices\"]\n", 52 | "df2[\"Industry\"] = df2[\"Industry\"].apply(lambda x: (lmap[x][\"name\"] if x in lmap else x) if (isinstance(x, str)) else print(\"weird value in industry\", x))\n", 53 | "df2.head()" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "df2.to_csv(\"data/layoffs_fyi.csv\")" 63 | ] 64 | } 65 | ], 66 | "metadata": { 67 | "language_info": { 68 | "name": "python" 69 | } 70 | }, 71 | "nbformat": 4, 72 | "nbformat_minor": 2 73 | } 74 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # pdm 105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 
106 | #pdm.lock 107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 108 | # in version control. 109 | # https://pdm.fming.dev/#use-with-ide 110 | .pdm.toml 111 | 112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 113 | __pypackages__/ 114 | 115 | # Celery stuff 116 | celerybeat-schedule 117 | celerybeat.pid 118 | 119 | # SageMath parsed files 120 | *.sage.py 121 | 122 | # Environments 123 | .env 124 | .venv 125 | env/ 126 | venv/ 127 | ENV/ 128 | env.bak/ 129 | venv.bak/ 130 | 131 | # Spyder project settings 132 | .spyderproject 133 | .spyproject 134 | 135 | # Rope project settings 136 | .ropeproject 137 | 138 | # mkdocs documentation 139 | /site 140 | 141 | # mypy 142 | .mypy_cache/ 143 | .dmypy.json 144 | dmypy.json 145 | 146 | # Pyre type checker 147 | .pyre/ 148 | 149 | # pytype static type analyzer 150 | .pytype/ 151 | 152 | # Cython debug symbols 153 | cython_debug/ 154 | 155 | # PyCharm 156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 158 | # and can be added to the global gitignore or merged into this file. For a more nuclear 159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
160 | #.idea/ 161 | 162 | #testing code 163 | test.ipynb 164 | -------------------------------------------------------------------------------- /Outlines/Kirthin_final_section_outline.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Correlation Between Money Raised in Millions and Percentage Laid Off" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this section we are looking at the correlation between the money a company raises and the percentage of their staff, they laid off. We expect larger companies to layoff a higher percentage of employees than smaller companies because they will be able to raise more money and handle these layoffs, while the small companies would need their staff." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [ 22 | { 23 | "ename": "NameError", 24 | "evalue": "name 'plt' is not defined", 25 | "output_type": "error", 26 | "traceback": [ 27 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 28 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 29 | "Cell \u001b[0;32mIn[1], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m plt\u001b[38;5;241m.\u001b[39mscatter(df[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mFunding\u001b[39m\u001b[38;5;124m'\u001b[39m], df[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mPercentage\u001b[39m\u001b[38;5;124m'\u001b[39m])\n\u001b[1;32m 2\u001b[0m plt\u001b[38;5;241m.\u001b[39mxlabel(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mFunding\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 3\u001b[0m plt\u001b[38;5;241m.\u001b[39mylabel(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mPercentage\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", 30 | "\u001b[0;31mNameError\u001b[0m: name 'plt' is not defined" 31 | ] 32 | } 
33 | ], 34 | "source": [ 35 | "# Assuming df is your DataFrame containing the data\n", 36 | "x = df['Funding']\n", 37 | "y = df['Percentage']\n", 38 | "\n", 39 | "# Define the number of bins for each axis\n", 40 | "bins_x = np.linspace(min(x), 900, 20)\n", 41 | "bins_y = np.linspace(min(y), max(y), 20)\n", 42 | "\n", 43 | "# Create 2D histogram\n", 44 | "hist, x_edges, y_edges = np.histogram2d(x, y, bins=(bins_x, bins_y))\n", 45 | "\n", 46 | "# Create meshgrid of x and y values for plotting\n", 47 | "x_grid, y_grid = np.meshgrid(x_edges, y_edges)\n", 48 | "\n", 49 | "# Plot the 2D histogram using a pcolormesh\n", 50 | "plt.figure(figsize=(8, 6))\n", 51 | "plt.pcolormesh(x_grid, y_grid, hist.T, cmap='Blues')\n", 52 | "plt.colorbar(label='Frequency')\n", 53 | "\n", 54 | "# Add labels and title\n", 55 | "plt.xlabel('Funding')\n", 56 | "plt.ylabel('Percentage')\n", 57 | "plt.title('2D Histogram of Funding and Percentage')\n", 58 | "\n", 59 | "\n", 60 | "# Show the plot\n", 61 | "plt.show()" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "The 2D histogram we created depicts the correlation between the two variables. We chose to remove outliers because a majority of our data comes from smaller companies relative to others on the list. From our analysis, it appears there is a negative correlation between the two variables, though it is not a strong relationship. Furthermore, we observe that as a company receives more funding, the likelihood of layoffs decreases. The majority of the darker shaded blue squares are concentrated on the left side of the plot, where funding is lower." 
69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "no_outliers = df[df['Funding'] < 120000]\n", 78 | "outcome, predictors = patsy.dmatrices('Percentage ~ Funding', no_outliers)\n", 79 | "model = sm.OLS(outcome, predictors)\n", 80 | "results = model.fit()\n", 81 | "slope = results.params[1]\n", 82 | "print(\"P-value:\", results.pvalues[1])\n", 83 | "print(\"T-statistic:\", results.tvalues[1])\n", 84 | "print(\"Slope:\", slope)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "The t-test on the regression slope is statistically significant (p = 0.0124). Because this p-value is less than the conventional significance level of 0.05, we can reject the null hypothesis that the slope is zero and conclude that there is a significant linear relationship between 'Funding' and 'Percentage'.\n", 92 | "\n", 93 | "The negative t-statistic (-2.50) reflects a negative slope estimate: as 'Funding' increases, the predicted 'Percentage' decreases. Note that this test does not compare the means of two groups; it tests whether the slope of the regression line differs from zero.\n", 94 | "\n", 95 | "Furthermore, the negative slope of the line indicates a negative correlation between the two variables, as highlighted in the 2D Histogram. This is consistent with the pattern seen in the plot and provides additional evidence of the relationship between 'Percentage' and 'Funding'." 
96 | ] 97 | } 98 | ], 99 | "metadata": { 100 | "kernelspec": { 101 | "display_name": "Python 3 (ipykernel)", 102 | "language": "python", 103 | "name": "python3" 104 | }, 105 | "language_info": { 106 | "codemirror_mode": { 107 | "name": "ipython", 108 | "version": 3 109 | }, 110 | "file_extension": ".py", 111 | "mimetype": "text/x-python", 112 | "name": "python", 113 | "nbconvert_exporter": "python", 114 | "pygments_lexer": "ipython3", 115 | "version": "3.11.5" 116 | } 117 | }, 118 | "nbformat": 4, 119 | "nbformat_minor": 2 120 | } 121 | -------------------------------------------------------------------------------- /Outlines/final_outline_anna.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Relationship between Company Size Before Layoffs and Percentage of Employees Laid Off" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this section we are looking at the effect of company size before layoffs on the percentage of employees laid off, which directly relates to our hypothesis. We expect larger companies to lay off a higher percentage of employees than smaller companies." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "# Plotting scatterplot comparing Company Size Before Layoffs and Percentage Laid Off\n", 24 | "plt.figure(figsize=(12,6))\n", 25 | "sns.scatterplot(x='Company_Size_before_Layoffs', y='Percentage', data=df, s=100)\n", 26 | "\n", 27 | "# Plotting line of best fit\n", 28 | "sns.regplot(x='Company_Size_before_Layoffs', y='Percentage', data=df, scatter=False, color='blue')\n", 29 | "\n", 30 | "plt.title('Company Size Before Layoffs vs. 
Percentage of Employees Laid Off', fontsize=20)\n", 31 | "plt.xlabel('Company Size Before Layoffs', fontsize=14)\n", 32 | "plt.ylabel('Percentage of Layoffs', fontsize=14)\n", 33 | "plt.show()" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "According to this scatterplot, there is a relationship between company size before layoffs and the percentage of employees laid off. However, this relationship appears negative, such that, as the size of the company before layoffs increases, the percentage of layoffs decreases. This contradicts our hypothesis and expectation that larger companies will have a higher percentage of layoffs. Given this initial relationship, we would like to measure the strength of association between company size before layoffs and the percentage of employees laid off." 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "#Creating a heat map to determine the different correlation coefficients for Company Size Before Layoffs, Percentage of Layoffs, Company Size After Layoffs, Money Raised in Million, and Year\n", 50 | "heatmap_data = df[['Company_Size_before_Layoffs', 'Percentage', 'Company_Size_after_layoffs', 'Money_Raised_in_$_mil', 'Year']]\n", 51 | "\n", 52 | "correlation_matrix = heatmap_data.corr()\n", 53 | "\n", 54 | "plt.figure(figsize=(12,6))\n", 55 | "sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=\".2f\")\n", 56 | "\n", 57 | "plt.title('Heatmap: Correlation Matrix of Variables')\n", 58 | "plt.show()" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "According to this heat map, there is a weak negative correlation between company size before layoffs and the percentage of layoffs (r = -0.11). This suggests that the company size before layoffs has only a minor impact on the percentage of layoffs. 
It seems that the correlation coefficient is the highest between the year of the layoffs and the percentage of layoffs. Given this weak negative correlation, we are interested to see if there is a significant relationship between company size before layoffs and the percentage of employees laid off." 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "$H_0$: There is no relationship between company size before layoffs and the percentage of employees laid off ($\\beta = 0$)\n", 73 | "\n", 74 | "$H_a$: There is a relationship between company size before layoffs and the percentage of employees laid off ($\\beta \\ne 0$)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "# Finding the p-value and t-value using the OLS regression\n", 84 | "outcome, predictors = patsy.dmatrices('Percentage ~ Company_Size_before_Layoffs', df)\n", 85 | "model = sm.OLS(outcome, predictors)\n", 86 | "results = model.fit()\n", 87 | "print(\"P-value:\", results.pvalues[1])\n", 88 | "print(\"T-value:\", results.tvalues[1])" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "The p-value is 3.21e-05 (p < 0.05) and the t-value is -4.171, which demonstrates a statistically significant relationship between company size before layoffs and the percentage of employees laid off. Given this p-value, we reject the null hypothesis in favor of the alternative hypothesis, concluding that companies that were larger before layoffs tend to have a lower percentage of layoffs. \n", 96 | "\n", 97 | "Nonetheless, this finding is different from what we hypothesized, as we expected larger companies to have a higher percentage of employee layoffs. 
One possible reason for the inverse relationship is that larger companies have more employees to begin with, so even if larger companies lay off more people than smaller companies, the percentage may not be as high. It is also important to consider other factors, such as industry type, geographic location, and economic conditions, that could influence the relationship between company size before layoffs and the percentage of layoffs. " 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [] 104 | } 105 | ], 106 | "metadata": { 107 | "kernelspec": { 108 | "display_name": "Python 3", 109 | "language": "python", 110 | "name": "python3" 111 | }, 112 | "language_info": { 113 | "codemirror_mode": { 114 | "name": "ipython", 115 | "version": 3 116 | }, 117 | "file_extension": ".py", 118 | "mimetype": "text/x-python", 119 | "name": "python", 120 | "nbconvert_exporter": "python", 121 | "pygments_lexer": "ipython3", 122 | "version": "3.10.4" 123 | } 124 | }, 125 | "nbformat": 4, 126 | "nbformat_minor": 2 127 | } 128 | -------------------------------------------------------------------------------- /Outlines/final_section_seb.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Analyzing Correlation Between Funding Stage and Percentage of Employees Laid Off" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "The funding stage column of our dataset contains information on which round of funding the company is in. Since a lot of these companies are startups, we have information from many companies across many different rounds of funding. We'd like to investigate if there is a correlation between these funding stages and the percentage of employees laid off. 
This correlation will help us determine if the funding stage of a company can be used to indicate if a company is more likely to perform large-scale layoffs." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "fig = plt.figure(figsize=(12, 5))\n", 24 | "sns.boxplot(x='Stage', y='Percentage', data=df, hue='Stage')\n", 25 | "plt.xticks(rotation=45)\n", 26 | "plt.ylabel('Percentage Laid Off')\n", 27 | "plt.show()" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "Looking at this plot, it appears that as companies go through more rounds of funding, the distribution of their percentage of employees laid off tends to become narrower and more concentrated between 0% and 20% of employees laid off. This is an interesting trend, so let's see if we can fit a linear regression to help quantify it. Since the funding stage is a categorical column, we will first need to convert it into ordinal data. Ordinal data is a type of qualitative data where you can identify a clear order between the different categories. Since funding stages happen in a sequence, and are in essence just a count of the number of times a startup has received funding, we can convert this to a numerical scale. To ensure it was created correctly, we will regenerate the same plot but with our numerized version of the `Stage` column instead." 
35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "series = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']\n", 44 | "stage_numerizer = dict(zip([f'Series {s}' for s in series], np.arange(1, len(series)+1, 1)))\n", 45 | "stage_numerizer['Seed'] = 0\n", 46 | "numerizer = defaultdict(lambda : np.nan)\n", 47 | "numerizer.update(stage_numerizer)\n", 48 | "clear_funding = df[['Percentage', 'Funding', 'Stage']].copy()\n", 49 | "clear_funding.loc[:, 'Stage_numerized'] = clear_funding['Stage'].apply(lambda i: numerizer[i])\n", 50 | "\n", 51 | "fig = plt.figure(figsize=(12, 5))\n", 52 | "sns.boxplot(x='Stage_numerized', y='Percentage', data=clear_funding, hue='Stage')\n", 53 | "plt.xlabel('Stage Numerized')\n", 54 | "plt.ylabel('Percentage Laid Off')\n", 55 | "plt.legend().remove()\n", 56 | "plt.show()" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "Looks like we created it correctly. Let's see if we can perform a linear regression now and find a correlation between Stage Numerized and Percentage." 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "\n", 71 | "$ H_0: \\text{There is no statistically significant correlation between the Stage\\_numerized and Percentage columns} $\n", 72 | "\n", 73 | "$ H_1: \\text{There is a statistically significant correlation between the Stage\\_numerized and Percentage columns} $\n", 74 | "\n", 75 | "For this test, we will be using a significance level of 0.01. We use a stricter threshold because we converted the data numerically, and are omitting some of the other funding stages that did not fit into this ordinal categorization. A stricter threshold makes us more conservative about the bias these modifications may introduce." 
76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "outcome, predictors = patsy.dmatrices('Percentage ~ Stage_numerized', clear_funding)\n", 85 | "model = sm.OLS(outcome, predictors)\n", 86 | "results = model.fit()\n", 87 | "p_value = results.pvalues[1]\n", 88 | "t_value = results.tvalues[1]\n", 89 | "stages = np.sort(clear_funding['Stage_numerized'].unique())\n", 90 | "\n", 91 | "fig = plt.figure(figsize=(12, 5))\n", 92 | "sns.boxplot(x='Stage_numerized', y='Percentage', data=clear_funding, hue='Stage')\n", 93 | "\n", 94 | "predicted_percentage = results.params[0] + results.params[1] * stages\n", 95 | "sns.lineplot(x=stages, y=predicted_percentage, color='red', linewidth=2, label='Linear Regression')\n", 96 | "plt.xlabel('Stage Numerized')\n", 97 | "plt.ylabel('Percentage Laid Off')\n", 98 | "plt.legend(loc='upper right', handles=[\n", 99 | " mpatches.Patch(color='red', label='Linear Regression'),\n", 100 | " mpatches.Patch(color='none', label=f'P_value={np.round(p_value, 4)}'),\n", 101 | " mpatches.Patch(color='none', label=f'T_value={np.round(t_value, 4)}'),\n", 102 | "])\n", 103 | "plt.title('Linearly Regressed Layoff Percentage using Stage Numerized')\n", 104 | "plt.show()" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "We reject the null hypothesis with a p-value that rounds to $0.0$, which suggests that there is a statistically significant correlation between the numerized stage and the percentage of employees laid off. With a t-value of $-11.5861$, the relationship is negative: for every additional round of funding, a company that performs layoffs is expected to lay off around $10\\%$ fewer employees. This means that a company more likely to perform a large-scale layoff would be one that is in the earlier funding stages." 
112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [] 118 | } 119 | ], 120 | "metadata": { 121 | "language_info": { 122 | "name": "python" 123 | } 124 | }, 125 | "nbformat": 4, 126 | "nbformat_minor": 2 127 | } 128 | -------------------------------------------------------------------------------- /ProjectProposal_Group019_WI24.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# COGS 108 - Project Proposal" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Names\n", 15 | "\n", 16 | "- Sebastian Modafferi\n", 17 | "- McKayla David\n", 18 | "- Kirthin Rajkumar\n", 19 | "- Matthew Chan\n", 20 | "- Anna Potapenko" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Research Question" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "Is a larger company more likely to lay off employees than a smaller company? If not, what indicators can be viewed to analyze what influences company layoffs?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Background and Prior Work" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "Layoffs refer to economic and organizational changes, and are a significant indicator of the success and development of companies. While they affect employees, layoff rates have a broader implication on the health of the economy, industry trends, and the state of the workforce. Understanding the trends and impacts of layoffs is vital not only on a global economic scale, but within communities of stakeholders and individual communities.\n", 49 | "\n", 50 | "Layoffs are important to study due to their relevance to both companies and employees. 
For a company, understanding indicators that can predict imminent layoffs can help it course correct before reaching a point of no return. On the other hand, employees who understand layoff indicators can use them in choosing the right company for their next role, improving job security. \n", 51 | "\n", 52 | "Research published in the Journal of the European Economic Association [1^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) explored the economic influences that cause layoffs, and inquired into how financial health and market factors influence layoff decisions. A similar study published in the Journal of Labor Economics looked into the effects of layoffs on unemployment rates, and found that layoffs can have a lasting impact on the job market and employees' career trajectories.\n", 53 | "\n", 54 | "The Journal of Empirical Finance [3^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) also looks into different firms and what caused their layoffs, giving insight into company restructuring and different technologies that help to modify the workforce requirements. Additionally, an article available on JSTOR [2^](#https://www.jstor.org/stable/117002?casa_token=m7s1bFw7mY4AAAAA%3AhaYXwJWsj5E0Xo7vbnjns6omvUnSFYlenLVZ99nBhONKkQRCLyfLIdEk3ZJycob9If4HtLaMga7y7cQzrzAO6QfJYXTkccHfVciVYhTXREH7HSHuGN4) reviews past precedent and explains the recurrence of layoffs and how it correlates with economic cycles. This suggests that layoffs are an essential part of economic growth.\n", 55 | "\n", 56 | "Research on layoffs adopts an interdisciplinary approach, using economic theories, organizational behavior, and societal impacts. Overall, it is imperative to understand the factors that influence layoffs because knowledge about these factors can help researchers develop strategies to mitigate the negative effects of layoffs on employees and the economy at large. \n", 57 | "\n", 58 | "1. 
[^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) Gathmann, C., Helm, I., & Schönberg, U. (2018). Spillover effects of mass layoffs. Journal of the European Economic Association, 18(1), 427–468. https://doi.org/10.1093/jeea/jvy045\n", 59 | "2. [^](#cite_ref-2) Hallock, K. (1998). Layoffs, top executive pay, and firm performance. JSTOR. https://www.jstor.org/stable/117002\n", 60 | "3. [^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) Chen, P., Mehrotra, V., Sivakumar, R., & Yu, W. (2001). Layoffs, shareholders’ wealth, and corporate performance. Journal of Empirical Finance, 8(2), 171–199. https://doi.org/10.1016/s0927-5398(01)00024-x\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "# Hypothesis\n" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "We hypothesize that larger companies are more likely to lay off employees than smaller companies. We believe this because smaller companies already have fewer employees, so layoffs are more likely to harm the business than benefit it. Additionally, larger companies can withstand more financial pressure, which lets them carry out large layoffs despite the impact on company performance, provided they have enough capital to withstand the losses.\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "# Data" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "The ideal dataset would include revenue, number of employees, percent laid off, year, type of company, size of company, share price (public or private), growth, year before data, number of mass layoffs in the company's lifetime, etc. 
Some promising datasets we have found are\n", 89 | " - https://www.kaggle.com/datasets/ulrikeherold/tech-layoffs-2020-2024\n", 90 | " - https://layoffs.fyi/\n", 91 | " - https://www.kaggle.com/datasets/mysarahmadbhat/inc-5000-companies" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "# Ethics & Privacy" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "Potential ethical concerns: \n", 106 | " - layoffs.fyi dataset has some data about fired employees with names (we should ignore that column)\n", 107 | " - Layoffs.fyi only pulls data from news articles, so it is a biased sample (information that is only accessible to the public)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "How will we address these concerns?\n", 115 | " - Omit private employee information\n", 116 | " - Omit company name" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "# Team Expectations " 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | " - Be reliable in terms of completing one’s own work/contributions\n", 131 | " - Maintain open communication between team members\n", 132 | " - Be an active contributor to group discussions\n", 133 | " " 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "# Project Timeline Proposal" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "| Meeting Date | Meeting Time| Completed Before Meeting | Discuss at Meeting |\n", 148 | "|---|---|---|---|\n", 149 | "| 02/04 | 1 PM | Read previous COGS 108 Final Projects | Complete previous quarters’ COGS 108 Final Project Analysis, plan meeting times, begin discussing project topics. 
| \n", 150 | "| 02/05 | 1 PM | Brainstorm project topics, potential data sources, and viability of research questions | Discuss and decide on final project topic; discuss hypothesis; begin background research Discuss ideal dataset(s) and ethics; draft project proposal| \n", 151 | "| 02/11 | 1 PM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |\n", 152 | "| 02/18 | 1 PM | Delegate Tasks and start wrangling | Go over what everyone has done. Make edits or revise things before. Also go over revisions and feedback from the proposal. |\n", 153 | "| 02/25 | 1 PM | Finalize wrangling/EDA; Begin Analysis | Meet for Checkpoint #1 |\n", 154 | "| 03/13 | 12 PM | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |\n", 155 | "| 03/20 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys |" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [] 162 | } 163 | ], 164 | "metadata": { 165 | "kernelspec": { 166 | "display_name": "Python 3 (ipykernel)", 167 | "language": "python", 168 | "name": "python3" 169 | }, 170 | "language_info": { 171 | "codemirror_mode": { 172 | "name": "ipython", 173 | "version": 3 174 | }, 175 | "file_extension": ".py", 176 | "mimetype": "text/x-python", 177 | "name": "python", 178 | "nbconvert_exporter": "python", 179 | "pygments_lexer": "ipython3", 180 | "version": "3.9.7" 181 | } 182 | }, 183 | "nbformat": 4, 184 | "nbformat_minor": 2 185 | } 186 | -------------------------------------------------------------------------------- /Outlines/template.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np\n" 11 | ] 12 | }, 13 | { 14 | 
"cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "## Loading in data" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 4, 23 | "metadata": {}, 24 | "outputs": [ 25 | { 26 | "data": { 27 | "text/html": [ 28 | "
\n", 29 | "\n", 42 | "\n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | "
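| | # | Company | Location_HQ | Country | Continent | Laid_Off | Date_layoffs | Percentage | Company_Size_before_Layoffs | Company_Size_after_layoffs | Industry | Stage | Money_Raised_in_$_mil | Year | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ShareChat | Bengaluru | India | Asia | 200 | 2023-12-20 | 15.0 | 1333 | 1133 | Consumer | Series H | $1700 | 2023 | 12.97194 | 77.59369 |
| 1 | 4 | InSightec | Haifa | Israel | Asia | 100 | 2023-12-19 | 20.0 | 500 | 400 | Healthcare | Unknown | $733 | 2023 | 32.81841 | 34.98850 |
| 2 | 6 | Enphase Energy | San Francisco Bay Area | USA | North America | 350 | 2023-12-18 | 10.0 | 3500 | 3150 | Energy | Post-IPO | $116 | 2023 | 37.54827 | -121.98857 |
| 3 | 7 | Udaan | Bengaluru | India | Asia | 100 | 2023-12-18 | 10.0 | 1000 | 900 | Retail | Unknown | 1500 | 2023 | 12.97194 | 77.59369 |
| 4 | 14 | Cruise | San Francisco Bay Area | USA | North America | 900 | 2023-12-14 | 24.0 | 3750 | 2850 | Transportation | Acquired | $15000 | 2023 | 37.77493 | -122.41942 |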
\n", 162 | "
" 163 | ], 164 | "text/plain": [ 165 | " # Company Location_HQ Country Continent \\\n", 166 | "0 3 ShareChat Bengaluru India Asia \n", 167 | "1 4 InSightec Haifa Israel Asia \n", 168 | "2 6 Enphase Energy San Francisco Bay Area USA North America \n", 169 | "3 7 Udaan Bengaluru India Asia \n", 170 | "4 14 Cruise San Francisco Bay Area USA North America \n", 171 | "\n", 172 | " Laid_Off Date_layoffs Percentage Company_Size_before_Layoffs \\\n", 173 | "0 200 2023-12-20 15.0 1333 \n", 174 | "1 100 2023-12-19 20.0 500 \n", 175 | "2 350 2023-12-18 10.0 3500 \n", 176 | "3 100 2023-12-18 10.0 1000 \n", 177 | "4 900 2023-12-14 24.0 3750 \n", 178 | "\n", 179 | " Company_Size_after_layoffs Industry Stage Money_Raised_in_$_mil \\\n", 180 | "0 1133 Consumer Series H $1700 \n", 181 | "1 400 Healthcare Unknown $733 \n", 182 | "2 3150 Energy Post-IPO $116 \n", 183 | "3 900 Retail Unknown 1500 \n", 184 | "4 2850 Transportation Acquired $15000 \n", 185 | "\n", 186 | " Year lat lng \n", 187 | "0 2023 12.97194 77.59369 \n", 188 | "1 2023 32.81841 34.98850 \n", 189 | "2 2023 37.54827 -121.98857 \n", 190 | "3 2023 12.97194 77.59369 \n", 191 | "4 2023 37.77493 -122.41942 " 192 | ] 193 | }, 194 | "execution_count": 4, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "df = pd.read_excel('../data/tech_layoffs.xlsx')\n", 201 | "df.head()" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "## Correcting data types\n", 209 | "\n", 210 | "To prepare the data for exploratory analysis, we correct the data types so that plotting functions can use them directly." 
211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 10, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "# int64\n", 222 | "Company object\n", 223 | "Location_HQ object\n", 224 | "Country object\n", 225 | "Continent object\n", 226 | "Laid_Off int64\n", 227 | "Date_layoffs datetime64[ns]\n", 228 | "Percentage float64\n", 229 | "Company_Size_before_Layoffs int64\n", 230 | "Company_Size_after_layoffs int64\n", 231 | "Industry object\n", 232 | "Stage object\n", 233 | "Money_Raised_in_$_mil object\n", 234 | "Year int64\n", 235 | "lat float64\n", 236 | "lng float64\n", 237 | "Funding float64\n", 238 | "dtype: object" 239 | ] 240 | }, 241 | "execution_count": 10, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "df.dtypes" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 9, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/plain": [ 258 | "0 1700.0\n", 259 | "1 733.0\n", 260 | "2 116.0\n", 261 | "3 1500.0\n", 262 | "4 15000.0\n", 263 | "Name: Funding, dtype: float64" 264 | ] 265 | }, 266 | "execution_count": 9, 267 | "metadata": {}, 268 | "output_type": "execute_result" 269 | } 270 | ], 271 | "source": [ 272 | "df['Funding'] = df['Money_Raised_in_$_mil'].apply(lambda s: np.float64(str(s).lstrip('$')))\n", 273 | "df['Funding'].head()" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [] 282 | } 283 | ], 284 | "metadata": { 285 | "kernelspec": { 286 | "display_name": "dsc80", 287 | "language": "python", 288 | "name": "python3" 289 | }, 290 | "language_info": { 291 | "codemirror_mode": { 292 | "name": "ipython", 293 | "version": 3 294 | }, 295 | "file_extension": ".py", 296 | "mimetype": "text/x-python", 297 | "name": "python", 298 | "nbconvert_exporter": "python", 299 | "pygments_lexer": "ipython3", 300 | 
"version": 
"3.8.16" 301 | } 302 | }, 303 | "nbformat": 4, 304 | "nbformat_minor": 2 305 | } 306 | -------------------------------------------------------------------------------- /Individual Uploads/McKayla_final_section_outline.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np\n" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "## Loading in data" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 4, 23 | "metadata": {}, 24 | "outputs": [ 25 | { 26 | "data": { 27 | "text/html": [ 28 | "
\n", 29 | "\n", 42 | "\n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | "
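| | # | Company | Location_HQ | Country | Continent | Laid_Off | Date_layoffs | Percentage | Company_Size_before_Layoffs | Company_Size_after_layoffs | Industry | Stage | Money_Raised_in_$_mil | Year | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ShareChat | Bengaluru | India | Asia | 200 | 2023-12-20 | 15.0 | 1333 | 1133 | Consumer | Series H | $1700 | 2023 | 12.97194 | 77.59369 |
| 1 | 4 | InSightec | Haifa | Israel | Asia | 100 | 2023-12-19 | 20.0 | 500 | 400 | Healthcare | Unknown | $733 | 2023 | 32.81841 | 34.98850 |
| 2 | 6 | Enphase Energy | San Francisco Bay Area | USA | North America | 350 | 2023-12-18 | 10.0 | 3500 | 3150 | Energy | Post-IPO | $116 | 2023 | 37.54827 | -121.98857 |
| 3 | 7 | Udaan | Bengaluru | India | Asia | 100 | 2023-12-18 | 10.0 | 1000 | 900 | Retail | Unknown | 1500 | 2023 | 12.97194 | 77.59369 |
| 4 | 14 | Cruise | San Francisco Bay Area | USA | North America | 900 | 2023-12-14 | 24.0 | 3750 | 2850 | Transportation | Acquired | $15000 | 2023 | 37.77493 | -122.41942 |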
\n", 162 | "
" 163 | ], 164 | "text/plain": [ 165 | " # Company Location_HQ Country Continent \\\n", 166 | "0 3 ShareChat Bengaluru India Asia \n", 167 | "1 4 InSightec Haifa Israel Asia \n", 168 | "2 6 Enphase Energy San Francisco Bay Area USA North America \n", 169 | "3 7 Udaan Bengaluru India Asia \n", 170 | "4 14 Cruise San Francisco Bay Area USA North America \n", 171 | "\n", 172 | " Laid_Off Date_layoffs Percentage Company_Size_before_Layoffs \\\n", 173 | "0 200 2023-12-20 15.0 1333 \n", 174 | "1 100 2023-12-19 20.0 500 \n", 175 | "2 350 2023-12-18 10.0 3500 \n", 176 | "3 100 2023-12-18 10.0 1000 \n", 177 | "4 900 2023-12-14 24.0 3750 \n", 178 | "\n", 179 | " Company_Size_after_layoffs Industry Stage Money_Raised_in_$_mil \\\n", 180 | "0 1133 Consumer Series H $1700 \n", 181 | "1 400 Healthcare Unknown $733 \n", 182 | "2 3150 Energy Post-IPO $116 \n", 183 | "3 900 Retail Unknown 1500 \n", 184 | "4 2850 Transportation Acquired $15000 \n", 185 | "\n", 186 | " Year lat lng \n", 187 | "0 2023 12.97194 77.59369 \n", 188 | "1 2023 32.81841 34.98850 \n", 189 | "2 2023 37.54827 -121.98857 \n", 190 | "3 2023 12.97194 77.59369 \n", 191 | "4 2023 37.77493 -122.41942 " 192 | ] 193 | }, 194 | "execution_count": 4, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "df = pd.read_excel('../data/tech_layoffs.xlsx')\n", 201 | "df.head()" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "## Correcting data types\n", 209 | "\n", 210 | "To prepare the data for exploratory analysis, we correct the data types so that plotting functions can use them directly." 
211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 10, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "# int64\n", 222 | "Company object\n", 223 | "Location_HQ object\n", 224 | "Country object\n", 225 | "Continent object\n", 226 | "Laid_Off int64\n", 227 | "Date_layoffs datetime64[ns]\n", 228 | "Percentage float64\n", 229 | "Company_Size_before_Layoffs int64\n", 230 | "Company_Size_after_layoffs int64\n", 231 | "Industry object\n", 232 | "Stage object\n", 233 | "Money_Raised_in_$_mil object\n", 234 | "Year int64\n", 235 | "lat float64\n", 236 | "lng float64\n", 237 | "Funding float64\n", 238 | "dtype: object" 239 | ] 240 | }, 241 | "execution_count": 10, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "df.dtypes" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 9, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/plain": [ 258 | "0 1700.0\n", 259 | "1 733.0\n", 260 | "2 116.0\n", 261 | "3 1500.0\n", 262 | "4 15000.0\n", 263 | "Name: Funding, dtype: float64" 264 | ] 265 | }, 266 | "execution_count": 9, 267 | "metadata": {}, 268 | "output_type": "execute_result" 269 | } 270 | ], 271 | "source": [ 272 | "df['Funding'] = df['Money_Raised_in_$_mil'].apply(lambda s: np.float64(str(s).lstrip('$')))\n", 273 | "df['Funding'].head()" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [] 282 | } 283 | ], 284 | "metadata": { 285 | "kernelspec": { 286 | "display_name": "dsc80", 287 | "language": "python", 288 | "name": "python3" 289 | }, 290 | "language_info": { 291 | "codemirror_mode": { 292 | "name": "ipython", 293 | "version": 3 294 | }, 295 | "file_extension": ".py", 296 | "mimetype": "text/x-python", 297 | "name": "python", 298 | "nbconvert_exporter": "python", 299 | "pygments_lexer": "ipython3", 300 | 
"version": 
"3.8.16" 301 | } 302 | }, 303 | "nbformat": 4, 304 | "nbformat_minor": 2 305 | } 306 | -------------------------------------------------------------------------------- /DataCheckpoint_Group019_WI24.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback** \n", 8 | "\n", 9 | "Update/change the relevant sections where you lost those points, make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.\n", 10 | "\n", 11 | "Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed." 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# COGS 108 - Data Checkpoint" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "# Names\n", 26 | "\n", 27 | "- McKayla David\n", 28 | "- Sebastian Modafferi\n", 29 | "- Anna Potapenko\n", 30 | "- Matthew Chan\n", 31 | "- Kirthin Rajkumar\n" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "# Research Question" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "- Include a specific, clear data science question.\n", 46 | "- Make sure what you're measuring (variables) to answer the question is clear\n", 47 | "\n", 48 | "What is your research question? Include the specific question you're setting out to answer. This question should be specific, answerable with data, and clear. A general question with specific subquestions is permitted. 
(1-2 sentences)\n", 49 | "\n", 50 | "On a global economic scale, is a larger (on the basis of funding and quantity of employees) company more likely to lay off employees than a smaller company? If not, what indicators can be viewed to analyze what influences company layoffs?\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## Background and Prior Work" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "Layoffs refer to economic and organizational changes, and are a significant indicator of the success and development of companies. While they affect employees, layoff rates have a broader implication on the health of the economy, industry trends, and the state of the workforce. Understanding the trends and impacts of layoffs is vital not only on a global economic scale, but within communities of stakeholders and individual communities.\n", 65 | "\n", 66 | "Layoffs are important to study due to their relevance to both companies and employees. For a company, understanding indicators which can predict imminent layoffs can help them course correct before reaching a point of no return. On the other hand, employees understanding layoff indicators can help them in choosing the correct company for their next role, ensuring job safety. \n", 67 | "\n", 68 | "Research published in the Journal of the European Economic Association [1^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) explored the economic influences that cause layoffs, and inquired into how financial health and market factors influence layoff decisions. 
A similar study published in the Journal of Labor Economics looked into the effects of layoffs on unemployment rates and found that layoffs can have a lasting impact on the job market and on employees' career trajectories.\n", 69 | "\n", 70 | "A study in the Journal of Empirical Finance [3^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) also looks at different firms and what caused their layoffs, giving insight into company restructuring and the technologies that change workforce requirements. Additionally, an article archived on JSTOR [2^](#https://www.jstor.org/stable/117002?casa_token=m7s1bFw7mY4AAAAA%3AhaYXwJWsj5E0Xo7vbnjns6omvUnSFYlenLVZ99nBhONKkQRCLyfLIdEk3ZJycob9If4HtLaMga7y7cQzrzAO6QfJYXTkccHfVciVYhTXREH7HSHuGN4) explains the recurrence of layoffs and how it correlates with economic cycles. This suggests that layoffs are an essential part of economic growth.\n", 71 | "\n", 72 | "Research on layoffs is interdisciplinary, drawing on economic theory, organizational behavior, and the study of societal impacts. Overall, it is imperative to understand the factors that influence layoffs because knowledge about these factors can help researchers develop strategies to mitigate the negative effects of layoffs on employees and the economy at large. Existing work does not provide internal indicators for when a company is about to execute layoffs, so our research seeks to identify a correlation between company size and layoffs.\n", 73 | "\n", 74 | "1. [^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) Gathmann, C., Helm, I., & Schönberg, U. (2018). Spillover effects of mass layoffs. Journal of the European Economic Association, 18(1), 427–468. https://doi.org/10.1093/jeea/jvy045\n", 75 | "2. [^](#cite_ref-2) Hallock, K. (1998). Layoffs, top executive pay, and firm performance. JSTOR. https://www.jstor.org/stable/117002\n", 76 | "3. 
[^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) Chen, P., Mehrotra, V., Sivakumar, R., & Yu, W. (2001). Layoffs, shareholders’ wealth, and corporate performance. Journal of Empirical Finance, 8(2), 171–199. https://doi.org/10.1016/s0927-5398(01)00024-x\n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "# Hypothesis\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "We hypothesize that larger companies are more likely to lay off employees (especially amidst a recession) than smaller companies. We believe this because smaller companies already have fewer employees, so layoffs are more likely to harm the business than benefit it. Additionally, larger companies can withstand more financial pressure, which lets them carry out large layoffs despite the impact on company performance, provided they have enough capital to withstand the losses.\n" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "# Data" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "## Data overview\n", 105 | "\n", 106 | "For each dataset include the following information\n", 107 | "- Dataset #1 - Kaggle\n", 108 | " - Dataset Name: \"Tech Layoffs 2020-2024\"\n", 109 | " - https://www.kaggle.com/datasets/ulrikeherold/tech-layoffs-2020-2024\n", 110 | " - Number of observations: 1418\n", 111 | " - Number of variables: 16\n", 112 | "\n", 113 | "This dataset was scraped from layoffs.fyi, which in turn compiles layoff reports from news articles, and it covers the past four years. The key variables we will use are `Money_Raised_in_$_mil`, `Percentage`, `Laid_Off`, `Funding` (a numeric column derived from `Money_Raised_in_$_mil`), and `Stage`. We focus on these columns because they contain vital information about layoffs and how the company is performing. 
It arrives fairly clean; the only correction required is to the `Money_Raised_in_$_mil` column, which is initially stored as a string that may begin with a dollar sign character." 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Dataset #1 (Layoffs.fyi)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 7, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "data": { 130 | "text/html": [ 131 | "
\n", 132 | "\n", 145 | "\n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | "
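| | # | Company | Location_HQ | Country | Continent | Laid_Off | Date_layoffs | Percentage | Company_Size_before_Layoffs | Company_Size_after_layoffs | Industry | Stage | Money_Raised_in_$_mil | Year | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ShareChat | Bengaluru | India | Asia | 200 | 2023-12-20 | 15.0 | 1333 | 1133 | Consumer | Series H | $1700 | 2023 | 12.97194 | 77.59369 |
| 1 | 4 | InSightec | Haifa | Israel | Asia | 100 | 2023-12-19 | 20.0 | 500 | 400 | Healthcare | Unknown | $733 | 2023 | 32.81841 | 34.98850 |
| 2 | 6 | Enphase Energy | San Francisco Bay Area | USA | North America | 350 | 2023-12-18 | 10.0 | 3500 | 3150 | Energy | Post-IPO | $116 | 2023 | 37.54827 | -121.98857 |
| 3 | 7 | Udaan | Bengaluru | India | Asia | 100 | 2023-12-18 | 10.0 | 1000 | 900 | Retail | Unknown | 1500 | 2023 | 12.97194 | 77.59369 |
| 4 | 14 | Cruise | San Francisco Bay Area | USA | North America | 900 | 2023-12-14 | 24.0 | 3750 | 2850 | Transportation | Acquired | $15000 | 2023 | 37.77493 | -122.41942 |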
\n", 265 | "
" 266 | ], 267 | "text/plain": [ 268 | " # Company Location_HQ Country Continent \\\n", 269 | "0 3 ShareChat Bengaluru India Asia \n", 270 | "1 4 InSightec Haifa Israel Asia \n", 271 | "2 6 Enphase Energy San Francisco Bay Area USA North America \n", 272 | "3 7 Udaan Bengaluru India Asia \n", 273 | "4 14 Cruise San Francisco Bay Area USA North America \n", 274 | "\n", 275 | " Laid_Off Date_layoffs Percentage Company_Size_before_Layoffs \\\n", 276 | "0 200 2023-12-20 15.0 1333 \n", 277 | "1 100 2023-12-19 20.0 500 \n", 278 | "2 350 2023-12-18 10.0 3500 \n", 279 | "3 100 2023-12-18 10.0 1000 \n", 280 | "4 900 2023-12-14 24.0 3750 \n", 281 | "\n", 282 | " Company_Size_after_layoffs Industry Stage Money_Raised_in_$_mil \\\n", 283 | "0 1133 Consumer Series H $1700 \n", 284 | "1 400 Healthcare Unknown $733 \n", 285 | "2 3150 Energy Post-IPO $116 \n", 286 | "3 900 Retail Unknown 1500 \n", 287 | "4 2850 Transportation Acquired $15000 \n", 288 | "\n", 289 | " Year lat lng \n", 290 | "0 2023 12.97194 77.59369 \n", 291 | "1 2023 32.81841 34.98850 \n", 292 | "2 2023 37.54827 -121.98857 \n", 293 | "3 2023 12.97194 77.59369 \n", 294 | "4 2023 37.77493 -122.41942 " 295 | ] 296 | }, 297 | "execution_count": 7, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "import pandas as pd\n", 304 | "import numpy as np\n", 305 | "df = pd.read_excel('./data/tech_layoffs.xlsx')\n", 306 | "df.head()" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 9, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [ 315 | "#remove company name\n", 316 | "df = df.drop(columns=['Company'])" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 10, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "data": { 326 | "text/plain": [ 327 | "dtype('O')" 328 | ] 329 | }, 330 | "execution_count": 10, 331 | "metadata": {}, 332 | "output_type": "execute_result" 333 | } 334 | ], 335 | "source": [ 336 | 
"df['Money_Raised_in_$_mil'].dtypes" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 11, 342 | "metadata": {}, 343 | "outputs": [ 344 | { 345 | "data": { 346 | "text/plain": [ 347 | "(1418, 16)" 348 | ] 349 | }, 350 | "execution_count": 11, 351 | "metadata": {}, 352 | "output_type": "execute_result" 353 | } 354 | ], 355 | "source": [ 356 | "#clean the Money_Raised_in_$_mil column to be float instead of string; some entries have no leading '$'\n", 357 | "df['Funding'] = df['Money_Raised_in_$_mil'].apply(lambda s: np.float64(str(s).lstrip('$')))\n", 358 | "df['Funding'].head()\n", 359 | "df.shape" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "# Ethics & Privacy" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "##### Potential ethical concerns and how we plan to address them: \n", 374 | "Our dataset is scraped from Layoffs.fyi, which contains explicit personal information about individuals who were laid off. Because there is no documentation of informed consent, we will omit this information to protect privacy and focus on company-level data rather than individuals. Additionally, Layoffs.fyi only pulls data from news articles, so it is a biased sample limited to publicly reported layoffs. The dataset consists primarily of observations from the USA, which effectively neglects layoffs that occur in other regions of the world and could bias our analysis and results. Because observations from other countries are sparse, we will orient our data analysis around the USA's economy. However, we will still include models and representations of non-US observations to provide scope and a point of reference for our data. The timeframe of our data is 2020-2024, which unfortunately excludes the larger historical context of layoffs, compounding the potential bias and lack of scope. 
Because of this limited timeframe, our analysis will focus on the COVID and post-COVID economy." 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "# Team Expectations " 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "We expect team members to complete their own work and contributions reliably. Members should maintain open communication and tell the team about any scheduling conflicts with meetings; a member who cannot attend is still expected to complete their work before the meeting. During team meetings, we expect all members to contribute actively to the discussion and to remain professional when ideas conflict. Each member will be assigned tasks by the end of each meeting and is expected to arrive at the next meeting with those tasks completed and uploaded to the repository, so that we can discuss progress and any issues we ran into." 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": {}, 394 | "source": [ 395 | "# Project Timeline Proposal" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |\n", 403 | "|---|---|---|---|\n", 404 | "| 02/04 | 1 PM | Read previous COGS 108 Final Projects | Complete previous quarters’ COGS 108 Final Project Analysis, plan meeting times, begin discussing project topics.
| \n", 405 | "| 02/05 | 1 PM | Brainstorm project topics, potential data sources, and viability of research questions | Discuss and decide on final project topic; discuss hypothesis; begin background research; discuss ideal dataset(s) and ethics; draft project proposal | \n", 406 | "| 02/11 | 1 PM | Edit, finalize, and submit proposal; search for datasets | Discuss wrangling and possible analytical approaches; assign group members to lead each specific part |\n", 407 | "| 02/18 | 1 PM | Delegate tasks and start wrangling | Go over what everyone has done and make any needed edits or revisions; go over feedback on the proposal |\n", 408 | "| 02/25 | 1 PM | Finalize wrangling/EDA; begin analysis | Meet for Checkpoint #1 |\n", 409 | "| 03/03 | 1 PM | Finalize data viz and EDA; begin analysis | Meet for Checkpoint #2 |\n", 410 | "| 03/10 | 1 PM | Finalize quantitative analysis; discuss approach to final video submission | Meet for video and final submission logistics |\n", 411 | "| 03/13 | 12 PM | Complete analysis; draft results/conclusion/discussion | Discuss/edit full project |\n", 412 | "| 03/20 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys |" 413 | ] 414 | } 415 | ], 416 | "metadata": { 417 | "kernelspec": { 418 | "display_name": "Python 3 (ipykernel)", 419 | "language": "python", 420 | "name": "python3" 421 | }, 422 | "language_info": { 423 | "codemirror_mode": { 424 | "name": "ipython", 425 | "version": 3 426 | }, 427 | "file_extension": ".py", 428 | "mimetype": "text/x-python", 429 | "name": "python", 430 | "nbconvert_exporter": "python", 431 | "pygments_lexer": "ipython3", 432 | "version": "3.12.2" 433 | } 434 | }, 435 | "nbformat": 4, 436 | "nbformat_minor": 2 437 | } 438 | --------------------------------------------------------------------------------