├── .DS_Store
├── data
└── tech_layoffs.xlsx
├── README.md
├── Outlines
├── final_section_outline.ipynb
├── layoffs_fyi_cleaner.ipynb
├── Kirthin_final_section_outline.ipynb
├── final_outline_anna.ipynb
├── final_section_seb.ipynb
└── template.ipynb
├── .gitignore
├── ProjectProposal_Group019_WI24.ipynb
├── Individual Uploads
└── McKayla_final_section_outline.ipynb
└── DataCheckpoint_Group019_WI24.ipynb
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/COGS108/Group019_WI24/master/.DS_Store
--------------------------------------------------------------------------------
/data/tech_layoffs.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/COGS108/Group019_WI24/master/data/tech_layoffs.xlsx
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This is your group repo for your final project for COGS108.
2 |
3 | This repository is private, and is only visible to the course instructors and your group mates; it is not visible to anyone else.
4 |
5 | Template notebooks for each component are provided. Only work on the notebook prior to its due date. After each submission is due, move onto the next notebook (For example, after the proposal is due, start working in the Data Checkpoint notebook).
6 |
7 | This repository will be frozen on the final project due date. No further changes can be made after that time.
8 |
9 | Your project proposal and final project will be graded based solely on the corresponding project notebooks in this repository.
10 |
11 | Template Jupyter notebooks have been included. For each due date, make sure a notebook with the following name is present in this repository (where XXX is replaced by your group number):
12 |
13 | - `ProjectProposal_groupXXX.ipynb`
14 | - `DataCheckpoint_groupXXX.ipynb`
15 | - `EDACheckpoint_groupXXX.ipynb`
16 | - `FinalProject_groupXXX.ipynb`
17 |
18 | This is *your* repo. You are free to manage the repo as you see fit, edit this README, add data files, add scripts, etc. As long as the four files above are present on their due dates with the required information, the rest is up to you all.
19 |
20 | Also, you are free and encouraged to share this project after the course and to add it to your portfolio. Just be sure to fork it to your GitHub at the end of the quarter!
21 |
--------------------------------------------------------------------------------
/Outlines/final_section_outline.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Title"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "Some text explaining:\n",
15 | " - Which variable specifically we are looking into\n",
16 | " - Its relationship to the hypothesis\n",
17 | " - What we expect to find"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": null,
23 | "metadata": {},
24 | "outputs": [],
25 | "source": [
26 | "# TODO: Exploratory plot 1"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "Some text explaining what this plot shows and how it effectively demonstrates the variable we are looking into"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "# TODO: (OPTIONAL) Exploratory plot 2"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "Some text explaining what this plot shows and how it effectively demonstrates the variable we are looking into"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "Set up Null hypothesis and Alternative hypothesis\n",
57 | " - Null: \n",
58 | " - Alternative: "
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "$ H_0: \\mu_{\\text{purebred}} = \\mu_{\\text{mixed-breed}} $\n",
66 | "\n",
67 | "$ H_1: \\mu_{\\text{purebred}} < \\mu_{\\text{mixed-breed}} $\n",
68 | "\n",
69 | "note: smaller adoption speed means faster"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "# TODO: Perform 1 stats test (don't print entire output of a model, only the p-value)"
79 | ]
80 | },
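81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "One hedged sketch of what this cell could contain (the `purebred` and `mixed` arrays of adoption speeds are hypothetical names, not defined anywhere in this outline): a one-sided Welch's t-test that prints only the p-value, matching the directional hypothesis above."
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": null,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "# Example sketch only: `purebred` and `mixed` are hypothetical arrays of adoption speeds\n",
95 | "from scipy import stats\n",
96 | "t_stat, p_value = stats.ttest_ind(purebred, mixed, equal_var=False, alternative='less')\n",
97 | "print('p-value:', p_value)"
98 | ]
99 | },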
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "Use the p-value to demonstrate what this says about our expectations from the beginning of the section. 1-2 sentences; relate it to the hypothesis."
86 | ]
87 | }
88 | ],
89 | "metadata": {
90 | "language_info": {
91 | "name": "python"
92 | }
93 | },
94 | "nbformat": 4,
95 | "nbformat_minor": 2
96 | }
97 |
--------------------------------------------------------------------------------
/Outlines/layoffs_fyi_cleaner.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import json\nimport pandas as pd\n",
10 | "d = json.load(open(\"data/resp.json\"))\n",
11 | "jtab = d[\"data\"][\"table\"]\n",
12 | "j_entries = jtab[\"rows\"]\n",
13 | "# id createdTime cellValuesByColumnId...\n",
14 | "for i in j_entries:\n",
15 | " i.update(i[\"cellValuesByColumnId\"])\n",
16 | " i.pop(\"cellValuesByColumnId\")\n",
17 | " \n",
18 | "df2 = pd.DataFrame.from_records(j_entries)\n",
19 | "[i[\"name\"] for i in jtab[\"columns\"]]\n"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": null,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "# Note: this cell renames df2's columns in place; don't run it twice.\n",
29 | "# map columns to their eng names\n",
30 | "df2cols = {}\n",
31 | "for i in jtab[\"columns\"]:\n",
32 | " # id: company\n",
33 | " df2cols[i[\"id\"]] = i[\"name\"]\n",
34 | "# df2cols\n",
35 | "\n",
36 | "df2c = list(df2.columns)\n",
37 | "\n",
38 | "for i in range(len(df2c)):\n",
39 | " if df2c[i] in df2cols:\n",
40 | " df2c[i] = df2cols[df2c[i]]\n",
41 | "\n",
42 | "df2.columns = df2c\n",
43 | "# df2\n",
44 | "\n",
45 | "lmap = jtab[\"columns\"][1][\"typeOptions\"][\"choices\"]\n",
46 | "df2[\"Location HQ\"] = df2[\"Location HQ\"].apply(lambda x: \" \".join([lmap[i][\"name\"] if i in lmap else i for i in x]) if (isinstance(x, list)) else print(\"weird value in loc hq\", x))\n",
47 | "lmap = jtab[\"columns\"][8][\"typeOptions\"][\"choices\"]\n",
48 | "df2[\"Stage\"] = df2[\"Stage\"].apply(lambda x: (lmap[x][\"name\"] if x in lmap else x) if (isinstance(x, str)) else print(\"weird value in stage\", x))\n",
49 | "lmap = jtab[\"columns\"][10][\"typeOptions\"][\"choices\"]\n",
50 | "df2[\"Country\"] = df2[\"Country\"].apply(lambda x: (lmap[x][\"name\"] if x in lmap else x) if (isinstance(x, str)) else print(\"weird value in country\", x))\n",
51 | "lmap = jtab[\"columns\"][5][\"typeOptions\"][\"choices\"]\n",
52 | "df2[\"Industry\"] = df2[\"Industry\"].apply(lambda x: (lmap[x][\"name\"] if x in lmap else x) if (isinstance(x, str)) else print(\"weird value in industry\", x))\n",
53 | "df2.head()"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "df2.to_csv(\"data/layoffs_fyi.csv\")"
63 | ]
64 | }
65 | ],
66 | "metadata": {
67 | "language_info": {
68 | "name": "python"
69 | }
70 | },
71 | "nbformat": 4,
72 | "nbformat_minor": 2
73 | }
74 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | cover/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | .pybuilder/
76 | target/
77 |
78 | # Jupyter Notebook
79 | .ipynb_checkpoints
80 |
81 | # IPython
82 | profile_default/
83 | ipython_config.py
84 |
85 | # pyenv
86 | # For a library or package, you might want to ignore these files since the code is
87 | # intended to run in multiple environments; otherwise, check them in:
88 | # .python-version
89 |
90 | # pipenv
91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
94 | # install all needed dependencies.
95 | #Pipfile.lock
96 |
97 | # poetry
98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99 | # This is especially recommended for binary packages to ensure reproducibility, and is more
100 | # commonly ignored for libraries.
101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102 | #poetry.lock
103 |
104 | # pdm
105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106 | #pdm.lock
107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108 | # in version control.
109 | # https://pdm.fming.dev/#use-with-ide
110 | .pdm.toml
111 |
112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
113 | __pypackages__/
114 |
115 | # Celery stuff
116 | celerybeat-schedule
117 | celerybeat.pid
118 |
119 | # SageMath parsed files
120 | *.sage.py
121 |
122 | # Environments
123 | .env
124 | .venv
125 | env/
126 | venv/
127 | ENV/
128 | env.bak/
129 | venv.bak/
130 |
131 | # Spyder project settings
132 | .spyderproject
133 | .spyproject
134 |
135 | # Rope project settings
136 | .ropeproject
137 |
138 | # mkdocs documentation
139 | /site
140 |
141 | # mypy
142 | .mypy_cache/
143 | .dmypy.json
144 | dmypy.json
145 |
146 | # Pyre type checker
147 | .pyre/
148 |
149 | # pytype static type analyzer
150 | .pytype/
151 |
152 | # Cython debug symbols
153 | cython_debug/
154 |
155 | # PyCharm
156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
158 | # and can be added to the global gitignore or merged into this file. For a more nuclear
159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder.
160 | #.idea/
161 |
162 | #testing code
163 | test.ipynb
164 |
--------------------------------------------------------------------------------
/Outlines/Kirthin_final_section_outline.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Correlation Between Money Raised in Millions and Percentage Laid Off"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "In this section we look at the correlation between the money a company has raised and the percentage of its staff it laid off. We expect larger companies to lay off a higher percentage of employees than smaller companies, because they can raise more money and absorb these layoffs, while smaller companies need their staff."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
34 | "source": [
35 | "import numpy as np\n",
36 | "import matplotlib.pyplot as plt\n",
37 | "\n",
38 | "# Assuming df is your DataFrame containing the data\n",
36 | "x = df['Funding']\n",
37 | "y = df['Percentage']\n",
38 | "\n",
39 | "# Define the number of bins for each axis; cap x at 900 to exclude extreme funding outliers\n",
40 | "bins_x = np.linspace(min(x), 900, 20)\n",
41 | "bins_y = np.linspace(min(y), max(y), 20)\n",
42 | "\n",
43 | "# Create 2D histogram\n",
44 | "hist, x_edges, y_edges = np.histogram2d(x, y, bins=(bins_x, bins_y))\n",
45 | "\n",
46 | "# Create meshgrid of x and y values for plotting\n",
47 | "x_grid, y_grid = np.meshgrid(x_edges, y_edges)\n",
48 | "\n",
49 | "# Plot the 2D histogram using a pcolormesh\n",
50 | "plt.figure(figsize=(8, 6))\n",
51 | "plt.pcolormesh(x_grid, y_grid, hist.T, cmap='Blues')\n",
52 | "plt.colorbar(label='Frequency')\n",
53 | "\n",
54 | "# Add labels and title\n",
55 | "plt.xlabel('Funding')\n",
56 | "plt.ylabel('Percentage')\n",
57 | "plt.title('2D Histogram of Funding and Percentage')\n",
58 | "\n",
59 | "\n",
60 | "# Show the plot\n",
61 | "plt.show()"
62 | ]
63 | },
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "The 2D histogram depicts the correlation between the two variables. We chose to remove outliers because the majority of our data comes from smaller companies relative to others on the list. From our analysis, there appears to be a negative, though weak, correlation between the two variables: as a company raises more funding, the percentage of employees laid off tends to decrease. The majority of the darker blue squares are concentrated on the left side of the plot, where funding is lower."
69 | ]
70 | },
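71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "As a quick sanity check on the trend the histogram suggests (a hedged sketch, assuming `df` is loaded with the `Funding` and `Percentage` columns used above), Pearson's r quantifies the linear association directly:"
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {},
82 | "outputs": [],
83 | "source": [
84 | "# Sanity-check sketch: Pearson correlation between funding and layoff percentage\n",
85 | "from scipy import stats\n",
86 | "sub = df[['Funding', 'Percentage']].dropna()\n",
87 | "r, p = stats.pearsonr(sub['Funding'], sub['Percentage'])\n",
88 | "print('Pearson r:', round(r, 3))"
89 | ]
90 | },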
71 | {
72 | "cell_type": "code",
73 | "execution_count": null,
74 | "metadata": {},
75 | "outputs": [],
76 | "source": [
77 | "import patsy\n",
78 | "import statsmodels.api as sm\n",
79 | "\n",
80 | "# Drop extreme funding outliers before fitting\n",
81 | "no_outliers = df[df.get('Funding') < 120000]\n",
78 | "outcome, predictors = patsy.dmatrices('Percentage ~ Funding', no_outliers)\n",
79 | "model = sm.OLS(outcome, predictors)\n",
80 | "results = model.fit()\n",
81 | "slope = results.params[1]\n",
82 | "print(\"P-value:\", results.pvalues[1])\n",
83 | "print(\"T-test:\", results.tvalues[1]) \n",
84 | "print(\"Slope:\", slope)"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "The regression revealed a statistically significant relationship between funding and layoff percentage (p = 0.0124). Since this p-value is below the conventional significance level of 0.05, we reject the null hypothesis and conclude that there is a significant relationship between the two variables.\n",
92 | "\n",
93 | "The negative t-statistic (-2.50) on the Funding coefficient indicates that the regression slope is negative: companies with more funding tend to lay off a lower percentage of their employees.\n",
94 | "\n",
95 | "This negative slope corroborates the negative correlation highlighted in the 2D histogram and provides additional evidence of the relationship between 'Percentage' and 'Funding'."
96 | ]
97 | }
98 | ],
99 | "metadata": {
100 | "kernelspec": {
101 | "display_name": "Python 3 (ipykernel)",
102 | "language": "python",
103 | "name": "python3"
104 | },
105 | "language_info": {
106 | "codemirror_mode": {
107 | "name": "ipython",
108 | "version": 3
109 | },
110 | "file_extension": ".py",
111 | "mimetype": "text/x-python",
112 | "name": "python",
113 | "nbconvert_exporter": "python",
114 | "pygments_lexer": "ipython3",
115 | "version": "3.11.5"
116 | }
117 | },
118 | "nbformat": 4,
119 | "nbformat_minor": 2
120 | }
121 |
--------------------------------------------------------------------------------
/Outlines/final_outline_anna.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Relationship between Company Size Before Layoffs and Percentage of Employees Laid Off"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "In this section we look at the relationship between company size before layoffs and the percentage of employees laid off, which directly relates to our hypothesis. We expect larger companies to lay off a higher percentage of employees than smaller companies."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import matplotlib.pyplot as plt\n",
24 | "import seaborn as sns\n",
25 | "\n",
26 | "# Plotting scatterplot comparing Company Size Before Layoffs and Percentage Laid Off\n",
24 | "plt.figure(figsize=(12,6))\n",
25 | "sns.scatterplot(x='Company_Size_before_Layoffs', y='Percentage', data=df, s=100)\n",
26 | "\n",
27 | "# Plotting line of best fit\n",
28 | "sns.regplot(x='Company_Size_before_Layoffs', y='Percentage', data=df, scatter=False, color='blue')\n",
29 | "\n",
30 | "plt.title('Company Size Before Layoffs vs. Percentage of Employees Laid Off', fontsize=20)\n",
31 | "plt.xlabel('Company Size Before Layoffs', fontsize=14)\n",
32 | "plt.ylabel('Percentage of Layoffs', fontsize=14)\n",
33 | "plt.show()"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "According to this scatterplot, there is a relationship between company size before layoffs and the percentage of employees laid off. However, this relationship appears negative: as the size of the company before layoffs increases, the percentage of layoffs decreases. This contradicts our hypothesis that larger companies will have a higher percentage of layoffs. Given this initial relationship, we would like to measure the strength of association between company size before layoffs and the percentage of employees laid off."
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "#Creating a heat map to determine the different correlation coefficients for Company Size Before Layoffs, Percentage of Layoffs, Company Size After Layoffs, Money Raised in Million, and Year\n",
50 | "heatmap_data = df[['Company_Size_before_Layoffs', 'Percentage', 'Company_Size_after_layoffs', 'Money_Raised_in_$_mil', 'Year']]\n",
51 | "\n",
52 | "correlation_matrix = heatmap_data.corr()\n",
53 | "\n",
54 | "plt.figure(figsize=(12,6))\n",
55 | "sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=\".2f\")\n",
56 | "\n",
57 | "plt.title('Heatmap: Correlation Matrix of Variables')\n",
58 | "plt.show()"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "According to this heat map, there is a weak negative correlation between company size before layoffs and the percentage of layoffs (r = -0.11). This suggests that company size before layoffs has only a minor impact on the percentage of layoffs. The correlation coefficient is highest between the year of the layoffs and the percentage of layoffs. Given this weak negative correlation, we are interested in whether there is a significant relationship between company size before layoffs and the percentage of employees laid off."
66 | ]
67 | },
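68 | {
69 | "cell_type": "markdown",
70 | "metadata": {},
71 | "source": [
72 | "To isolate the single coefficient cited above (a hedged sketch, assuming `df` is loaded as in the cells above), the pairwise correlation can be computed directly rather than read off the heat map:"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "# Sketch: compute just the one correlation discussed above\n",
82 | "r = df['Company_Size_before_Layoffs'].corr(df['Percentage'])\n",
83 | "print('r =', round(r, 2))"
84 | ]
85 | },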
68 | {
69 | "cell_type": "markdown",
70 | "metadata": {},
71 | "source": [
72 | "$H_o$: There is no relationship between company size before layoffs and the percentage of employees laid off ($\\beta = 0$)\n",
73 | "\n",
74 | "$H_a$: There is a relationship between company size before layoffs and the percentage of employees laid off ($\\beta \\ne 0$)"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": null,
80 | "metadata": {},
81 | "outputs": [],
82 | "source": [
83 | "import patsy\n",
84 | "import statsmodels.api as sm\n",
85 | "\n",
86 | "# Finding the p-value and t-value using OLS regression\n",
84 | "outcome, predictors = patsy.dmatrices('Percentage ~ Company_Size_before_Layoffs', df)\n",
85 | "model = sm.OLS(outcome, predictors)\n",
86 | "results = model.fit()\n",
87 | "print(\"P-value:\", results.pvalues[1])\n",
88 | "print(\"T-test:\", results.tvalues[1])"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "The p-value is 3.21e-05 (p < 0.05) and the t-value is -4.171, which demonstrates a statistically significant relationship between company size before layoffs and the percentage of employees laid off. Given this p-value, we reject the null hypothesis in favor of the alternative hypothesis, concluding that companies that were larger before layoffs have a lower percentage of layoffs. \n",
96 | "\n",
97 | "Nonetheless, this finding differs from what we hypothesized, as we expected larger companies to have a higher percentage of employee layoffs. One reason for the inverse relationship is that, compared to smaller companies, larger companies have more employees to begin with, so even if larger companies lay off more people than smaller companies, the percentage will not be as high. It is also important to consider other factors, such as industry type, geographic location, and economic conditions, that could influence the relationship between company size before layoffs and the percentage of layoffs. "
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": []
104 | }
105 | ],
106 | "metadata": {
107 | "kernelspec": {
108 | "display_name": "Python 3",
109 | "language": "python",
110 | "name": "python3"
111 | },
112 | "language_info": {
113 | "codemirror_mode": {
114 | "name": "ipython",
115 | "version": 3
116 | },
117 | "file_extension": ".py",
118 | "mimetype": "text/x-python",
119 | "name": "python",
120 | "nbconvert_exporter": "python",
121 | "pygments_lexer": "ipython3",
122 | "version": "3.10.4"
123 | }
124 | },
125 | "nbformat": 4,
126 | "nbformat_minor": 2
127 | }
128 |
--------------------------------------------------------------------------------
/Outlines/final_section_seb.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Analyzing Correlation Between Funding Stage and Percentage of Employees Laid Off"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "The funding stage column of our dataset contains information on which round of funding the company is on. Since a lot of these companies are startups, we have information from many companies across many different rounds of funding. We'd like to investigate if there is a correlation between these funding stages and the percentage of employees laid off. This correlation will help us determine if the funding stage of a company can be used to indicate if a company is more likely to perform large-scale layoffs."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": null,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "import matplotlib.pyplot as plt\n",
24 | "import seaborn as sns\n",
25 | "\n",
26 | "fig = plt.figure(figsize=(12, 5))\n",
27 | "sns.boxplot(x='Stage', y='Percentage', data=df, hue='Stage')\n",
28 | "plt.xticks(rotation=45)\n",
26 | "plt.ylabel('Percentage Laid Off')\n",
27 | "plt.show()"
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "Looking at this plot, it appears that as companies go through more rounds of funding, the distributions of their percentage of employees laid off tend to become narrower and more concentrated between 0% and 20%. This is an interesting trend, so let's see if we can build a linear regressor to help identify it. Since the funding stage is a categorical column, we will first need to convert it into ordinal data. Ordinal data is a type of qualitative data with a clear ordering between the categories. Since funding stages happen in a sequence, and are in essence just a count of the number of times a startup has received funding, we can convert them to a numerical scale. To ensure it was created correctly, we will regenerate the same plot with our numerized version of the `Stage` column instead."
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": null,
40 | "metadata": {},
41 | "outputs": [],
42 | "source": [
43 | "import numpy as np\n",
44 | "from collections import defaultdict\n",
45 | "\n",
46 | "series = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']\n",
44 | "stage_numerizer = dict(zip([f'Series {s}' for s in series], np.arange(1, len(series)+1, 1)))\n",
45 | "stage_numerizer['Seed'] = 0\n",
46 | "numerizer = defaultdict(lambda : np.nan)\n",
47 | "numerizer.update(stage_numerizer)\n",
48 | "clear_funding = df[['Percentage', 'Funding', 'Stage']].copy()\n",
49 | "clear_funding.loc[:, 'Stage_numerized'] = clear_funding['Stage'].apply(lambda i: numerizer[i])\n",
50 | "\n",
51 | "fig = plt.figure(figsize=(12, 5))\n",
52 | "sns.boxplot(x='Stage_numerized', y='Percentage', data=clear_funding, hue='Stage')\n",
53 | "plt.xlabel('Stage Numerized')\n",
54 | "plt.ylabel('Percentage Laid Off')\n",
55 | "plt.legend().remove()\n",
56 | "plt.show()"
57 | ]
58 | },
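59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "Stage labels outside the Seed/Series ordering (if any are present in the data) are mapped to NaN by the numerizer. A hedged sketch of a quick check on how many rows were left unmapped (assumes `clear_funding` from the cell above):"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {},
70 | "outputs": [],
71 | "source": [
72 | "# Sketch: count rows whose Stage did not map onto the ordinal scale\n",
73 | "unmapped = clear_funding['Stage_numerized'].isna().sum()\n",
74 | "print('rows without an ordinal stage:', unmapped)"
75 | ]
76 | },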
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "Looks like we created it correctly. Let's see if we can now perform a linear regression and find a correlation between `Stage_numerized` and `Percentage`."
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "\n",
71 | "$ H_0: \\text{There is no statistically significant correlation between the Stage\\_numerized and Percentage columns} $\n",
72 | "\n",
73 | "$ H_1: \\text{There is a statistically significant correlation between the Stage\\_numerized and Percentage columns} $\n",
74 | "\n",
75 | "For this test, we will use a significance level of 0.01. We chose a threshold stricter than the conventional 0.05 because we converted the data numerically and omitted some of the funding stages that did not fit into this ordinal categorization; a stricter threshold helps guard against the bias introduced by these modifications."
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {},
82 | "outputs": [],
83 | "source": [
84 | "import patsy\n",
85 | "import statsmodels.api as sm\n",
86 | "import matplotlib.patches as mpatches\n",
87 | "\n",
88 | "outcome, predictors = patsy.dmatrices('Percentage ~ Stage_numerized', clear_funding)\n",
85 | "model = sm.OLS(outcome, predictors)\n",
86 | "results = model.fit()\n",
87 | "p_value = results.pvalues[1]\n",
88 | "t_value = results.tvalues[1]\n",
89 | "stages = np.sort(clear_funding['Stage_numerized'].unique())\n",
90 | "\n",
91 | "fig = plt.figure(figsize=(12, 5))\n",
92 | "sns.boxplot(x='Stage_numerized', y='Percentage', data=clear_funding, hue='Stage')\n",
93 | "\n",
94 | "predicted_percentage = results.params[0] + results.params[1] * stages\n",
95 | "sns.lineplot(x=stages, y=predicted_percentage, color='red', linewidth=2, label='Linear Regression')\n",
96 | "plt.xlabel('Stage Numerized')\n",
97 | "plt.ylabel('Percentage Laid Off')\n",
98 | "plt.legend(loc='upper right', handles=[\n",
99 | " mpatches.Patch(color='red', label='Linear Regression'),\n",
100 | " mpatches.Patch(color='none', label=f'P_value={np.round(p_value, 4)}'),\n",
101 | " mpatches.Patch(color='none', label=f't_value={np.round(t_value, 4)}'),\n",
102 | "])\n",
103 | "plt.title('Linearly Regressed Layoff Percentage using Stage Numerized')\n",
104 | "plt.show()"
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "We successfully rejected the null hypothesis with a p-value near $0$, which suggests a statistically significant correlation between the numerized stage and the percentage of employees laid off. The t-value of $-11.5861$ shows the slope is significantly negative; per the fitted slope, for every additional round of funding, a company that performs layoffs is expected to lay off around $10\\%$ fewer employees. This means that if you were looking to identify a company more likely to perform a large-scale layoff, it would be a company in the lower funding stages."
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": []
118 | }
119 | ],
120 | "metadata": {
121 | "language_info": {
122 | "name": "python"
123 | }
124 | },
125 | "nbformat": 4,
126 | "nbformat_minor": 2
127 | }
128 |
--------------------------------------------------------------------------------
/ProjectProposal_Group019_WI24.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# COGS 108 - Project Proposal"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Names\n",
15 | "\n",
16 | "- Sebastian Modafferi\n",
17 | "- McKayla David\n",
18 | "- Kirthin Rajkumar\n",
19 | "- Matthew Chan\n",
20 | "- Anna Potapenko"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "# Research Question"
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "Is a larger company more likely to lay off employees than a smaller company? If not, what indicators can be viewed to analyze what influences company layoffs?"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "## Background and Prior Work"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "Layoffs reflect economic and organizational change, and are a significant indicator of the success and development of companies. While they affect employees directly, layoff rates have broader implications for the health of the economy, industry trends, and the state of the workforce. Understanding the trends and impacts of layoffs is vital not only on a global economic scale, but also within individual communities of stakeholders.\n",
49 | "\n",
50 | "Layoffs are important to study due to their relevance to both companies and employees. For a company, understanding indicators that can predict imminent layoffs can help it course-correct before reaching a point of no return. For employees, understanding layoff indicators can help them choose the right company for their next role, improving their job security. \n",
51 | "\n",
52 | "Research published in the Journal of the European Economic Association [1^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) explored the economic influences that cause layoffs, and inquired into how financial health and market factors influence layoff decisions. A similar study published in the Journal of Labor Economics looks into the effects of layoffs on unemployment rates, and found that layoffs can have a lasting impact on the job market and on employees' career trajectories.\n",
53 | "\n",
54 | "A study in the Journal of Empirical Finance [3^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) also examines individual firms and the causes of their layoffs, giving insight into corporate restructuring and the technologies that reshape workforce requirements. Additionally, an article archived on JSTOR [2^](#https://www.jstor.org/stable/117002?casa_token=m7s1bFw7mY4AAAAA%3AhaYXwJWsj5E0Xo7vbnjns6omvUnSFYlenLVZ99nBhONKkQRCLyfLIdEk3ZJycob9If4HtLaMga7y7cQzrzAO6QfJYXTkccHfVciVYhTXREH7HSHuGN4) describes how layoffs recur in step with economic cycles, suggesting that layoffs are a recurring feature of economic growth.\n",
55 | "\n",
56 | "Research on layoffs adopts an interdisciplinary approach, using economic theories, organizational behavior, and societal impacts. Overall, it is imperative to understand the factors that influence layoffs because knowledge about these factors can help researchers develop strategies to mitigate the negative effects of layoffs on employees and the economy at large. \n",
57 | "\n",
58 | "1. [^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) Gathmann, C., Helm, I., & Schönberg, U. (2018). Spillover effects of mass layoffs. Journal of the European Economic Association, 18(1), 427–468. https://doi.org/10.1093/jeea/jvy045\n",
59 | "2. [^](#cite_ref-2) Hallock, K. (1998). Layoffs, top executive pay, and firm performance. The American Economic Review. https://www.jstor.org/stable/117002\n",
60 | "3. [^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) Chen, P., Mehrotra, V., Sivakumar, R., & Yu, W. (2001). Layoffs, shareholders’ wealth, and corporate performance. Journal of Empirical Finance, 8(2), 171–199. https://doi.org/10.1016/s0927-5398(01)00024-x\n"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "# Hypothesis\n"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "We hypothesize that larger companies are more likely to lay off employees than smaller companies. We are inclined to believe this because smaller companies already have fewer employees, so layoffs are more likely to harm the business than benefit it. Additionally, larger companies can withstand more financial pressure, allowing them to perform large layoffs despite the impact on company performance, given that they have enough capital to withstand the losses.\n"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "# Data"
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
88 | "The ideal dataset would include revenue, number of employees, percent laid off, year, type of company, size of company, share price (public or private), growth in the year before the layoffs, number of mass layoffs in the company's lifetime, etc. Some promising datasets we have found are:\n",
89 | " - https://www.kaggle.com/datasets/ulrikeherold/tech-layoffs-2020-2024\n",
90 | " - https://layoffs.fyi/\n",
91 | " - https://www.kaggle.com/datasets/mysarahmadbhat/inc-5000-companies"
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "# Ethics & Privacy"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "Potential ethical concerns: \n",
106 | " - The layoffs.fyi dataset includes some laid-off employees' names (we will ignore that column)\n",
107 | " - Layoffs.fyi only pulls data from news articles, so it is a biased sample limited to publicly reported layoffs"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "How will we address these concerns?\n",
115 | " - Omit private employee information\n",
116 | " - Omit company name"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "# Team Expectations "
124 | ]
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "metadata": {},
129 | "source": [
130 | " - Be reliable in terms of completing one’s own work/contributions\n",
131 | " - Maintain open communication between team members\n",
132 | " - Be an active contributor to group discussions\n",
133 | " "
134 | ]
135 | },
136 | {
137 | "cell_type": "markdown",
138 | "metadata": {},
139 | "source": [
140 | "# Project Timeline Proposal"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | "| Meeting Date | Meeting Time| Completed Before Meeting | Discuss at Meeting |\n",
148 | "|---|---|---|---|\n",
149 | "| 02/04 | 1 PM | Read previous COGS 108 Final Projects | Complete previous quarters’ COGS 108 Final Project Analysis, plan meeting times, begin discussing project topics. | \n",
150 | "| 02/05 | 1 PM | Brainstorm project topics, potential data sources, and viability of research questions | Discuss and decide on final project topic; discuss hypothesis; begin background research; discuss ideal dataset(s) and ethics; draft project proposal | \n",
151 | "| 02/11 | 1 PM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |\n",
152 | "| 02/18 | 1 PM | Delegate tasks and start wrangling | Go over what everyone has done and make edits or revisions as needed; also go over revisions and feedback from the proposal. |\n",
153 | "| 02/25 | 1 PM | Finalize wrangling/EDA; Begin Analysis | Meet for Checkpoint #1 |\n",
154 | "| 03/13 | 12 PM | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |\n",
155 | "| 03/20 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys |"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": []
162 | }
163 | ],
164 | "metadata": {
165 | "kernelspec": {
166 | "display_name": "Python 3 (ipykernel)",
167 | "language": "python",
168 | "name": "python3"
169 | },
170 | "language_info": {
171 | "codemirror_mode": {
172 | "name": "ipython",
173 | "version": 3
174 | },
175 | "file_extension": ".py",
176 | "mimetype": "text/x-python",
177 | "name": "python",
178 | "nbconvert_exporter": "python",
179 | "pygments_lexer": "ipython3",
180 | "version": "3.9.7"
181 | }
182 | },
183 | "nbformat": 4,
184 | "nbformat_minor": 2
185 | }
186 |
--------------------------------------------------------------------------------
/Outlines/template.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n",
10 | "import numpy as np\n"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "## Loading in data"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 4,
23 | "metadata": {},
24 | "outputs": [
25 | {
26 | "data": {
27 | "text/html": [
28 | "
\n",
29 | "\n",
42 | "
\n",
43 | " \n",
44 | " \n",
45 | " | \n",
46 | " # | \n",
47 | " Company | \n",
48 | " Location_HQ | \n",
49 | " Country | \n",
50 | " Continent | \n",
51 | " Laid_Off | \n",
52 | " Date_layoffs | \n",
53 | " Percentage | \n",
54 | " Company_Size_before_Layoffs | \n",
55 | " Company_Size_after_layoffs | \n",
56 | " Industry | \n",
57 | " Stage | \n",
58 | " Money_Raised_in_$_mil | \n",
59 | " Year | \n",
60 | " lat | \n",
61 | " lng | \n",
62 | "
\n",
63 | " \n",
64 | " \n",
65 | " \n",
66 | " | 0 | \n",
67 | " 3 | \n",
68 | " ShareChat | \n",
69 | " Bengaluru | \n",
70 | " India | \n",
71 | " Asia | \n",
72 | " 200 | \n",
73 | " 2023-12-20 | \n",
74 | " 15.0 | \n",
75 | " 1333 | \n",
76 | " 1133 | \n",
77 | " Consumer | \n",
78 | " Series H | \n",
79 | " $1700 | \n",
80 | " 2023 | \n",
81 | " 12.97194 | \n",
82 | " 77.59369 | \n",
83 | "
\n",
84 | " \n",
85 | " | 1 | \n",
86 | " 4 | \n",
87 | " InSightec | \n",
88 | " Haifa | \n",
89 | " Israel | \n",
90 | " Asia | \n",
91 | " 100 | \n",
92 | " 2023-12-19 | \n",
93 | " 20.0 | \n",
94 | " 500 | \n",
95 | " 400 | \n",
96 | " Healthcare | \n",
97 | " Unknown | \n",
98 | " $733 | \n",
99 | " 2023 | \n",
100 | " 32.81841 | \n",
101 | " 34.98850 | \n",
102 | "
\n",
103 | " \n",
104 | " | 2 | \n",
105 | " 6 | \n",
106 | " Enphase Energy | \n",
107 | " San Francisco Bay Area | \n",
108 | " USA | \n",
109 | " North America | \n",
110 | " 350 | \n",
111 | " 2023-12-18 | \n",
112 | " 10.0 | \n",
113 | " 3500 | \n",
114 | " 3150 | \n",
115 | " Energy | \n",
116 | " Post-IPO | \n",
117 | " $116 | \n",
118 | " 2023 | \n",
119 | " 37.54827 | \n",
120 | " -121.98857 | \n",
121 | "
\n",
122 | " \n",
123 | " | 3 | \n",
124 | " 7 | \n",
125 | " Udaan | \n",
126 | " Bengaluru | \n",
127 | " India | \n",
128 | " Asia | \n",
129 | " 100 | \n",
130 | " 2023-12-18 | \n",
131 | " 10.0 | \n",
132 | " 1000 | \n",
133 | " 900 | \n",
134 | " Retail | \n",
135 | " Unknown | \n",
136 | " 1500 | \n",
137 | " 2023 | \n",
138 | " 12.97194 | \n",
139 | " 77.59369 | \n",
140 | "
\n",
141 | " \n",
142 | " | 4 | \n",
143 | " 14 | \n",
144 | " Cruise | \n",
145 | " San Francisco Bay Area | \n",
146 | " USA | \n",
147 | " North America | \n",
148 | " 900 | \n",
149 | " 2023-12-14 | \n",
150 | " 24.0 | \n",
151 | " 3750 | \n",
152 | " 2850 | \n",
153 | " Transportation | \n",
154 | " Acquired | \n",
155 | " $15000 | \n",
156 | " 2023 | \n",
157 | " 37.77493 | \n",
158 | " -122.41942 | \n",
159 | "
\n",
160 | " \n",
161 | "
\n",
162 | "
"
163 | ],
164 | "text/plain": [
165 | " # Company Location_HQ Country Continent \\\n",
166 | "0 3 ShareChat Bengaluru India Asia \n",
167 | "1 4 InSightec Haifa Israel Asia \n",
168 | "2 6 Enphase Energy San Francisco Bay Area USA North America \n",
169 | "3 7 Udaan Bengaluru India Asia \n",
170 | "4 14 Cruise San Francisco Bay Area USA North America \n",
171 | "\n",
172 | " Laid_Off Date_layoffs Percentage Company_Size_before_Layoffs \\\n",
173 | "0 200 2023-12-20 15.0 1333 \n",
174 | "1 100 2023-12-19 20.0 500 \n",
175 | "2 350 2023-12-18 10.0 3500 \n",
176 | "3 100 2023-12-18 10.0 1000 \n",
177 | "4 900 2023-12-14 24.0 3750 \n",
178 | "\n",
179 | " Company_Size_after_layoffs Industry Stage Money_Raised_in_$_mil \\\n",
180 | "0 1133 Consumer Series H $1700 \n",
181 | "1 400 Healthcare Unknown $733 \n",
182 | "2 3150 Energy Post-IPO $116 \n",
183 | "3 900 Retail Unknown 1500 \n",
184 | "4 2850 Transportation Acquired $15000 \n",
185 | "\n",
186 | " Year lat lng \n",
187 | "0 2023 12.97194 77.59369 \n",
188 | "1 2023 32.81841 34.98850 \n",
189 | "2 2023 37.54827 -121.98857 \n",
190 | "3 2023 12.97194 77.59369 \n",
191 | "4 2023 37.77493 -122.41942 "
192 | ]
193 | },
194 | "execution_count": 4,
195 | "metadata": {},
196 | "output_type": "execute_result"
197 | }
198 | ],
199 | "source": [
200 | "df = pd.read_excel('../data/tech_layoffs.xlsx')  # path relative to this notebook, so it runs on any machine\n",
201 | "df.head()"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "## Correcting data types\n",
209 | "\n",
210 | "To prepare the data for exploratory analysis, we correct the data types so that they are easily usable by plotting functions."
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 10,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "data": {
220 | "text/plain": [
221 | "# int64\n",
222 | "Company object\n",
223 | "Location_HQ object\n",
224 | "Country object\n",
225 | "Continent object\n",
226 | "Laid_Off int64\n",
227 | "Date_layoffs datetime64[ns]\n",
228 | "Percentage float64\n",
229 | "Company_Size_before_Layoffs int64\n",
230 | "Company_Size_after_layoffs int64\n",
231 | "Industry object\n",
232 | "Stage object\n",
233 | "Money_Raised_in_$_mil object\n",
234 | "Year int64\n",
235 | "lat float64\n",
236 | "lng float64\n",
237 | "Funding float64\n",
238 | "dtype: object"
239 | ]
240 | },
241 | "execution_count": 10,
242 | "metadata": {},
243 | "output_type": "execute_result"
244 | }
245 | ],
246 | "source": [
247 | "df.dtypes"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 9,
253 | "metadata": {},
254 | "outputs": [
255 | {
256 | "data": {
257 | "text/plain": [
258 | "0 1700.0\n",
259 | "1 733.0\n",
260 | "2 116.0\n",
261 | "3 1500.0\n",
262 | "4 15000.0\n",
263 | "Name: Funding, dtype: float64"
264 | ]
265 | },
266 | "execution_count": 9,
267 | "metadata": {},
268 | "output_type": "execute_result"
269 | }
270 | ],
271 | "source": [
272 | "df['Funding'] = df['Money_Raised_in_$_mil'].apply(lambda s: np.float64(str(s).lstrip('$')))  # strip the '$' only when present; some values (e.g. 1500) have no dollar sign\n",
273 | "df['Funding'].head()"
274 | ]
275 | },
276 | {
277 | "cell_type": "code",
278 | "execution_count": null,
279 | "metadata": {},
280 | "outputs": [],
281 | "source": []
282 | }
283 | ],
284 | "metadata": {
285 | "kernelspec": {
286 | "display_name": "dsc80",
287 | "language": "python",
288 | "name": "python3"
289 | },
290 | "language_info": {
291 | "codemirror_mode": {
292 | "name": "ipython",
293 | "version": 3
294 | },
295 | "file_extension": ".py",
296 | "mimetype": "text/x-python",
297 | "name": "python",
298 | "nbconvert_exporter": "python",
299 | "pygments_lexer": "ipython3",
300 | "version": "3.8.16"
301 | }
302 | },
303 | "nbformat": 4,
304 | "nbformat_minor": 2
305 | }
306 |
--------------------------------------------------------------------------------
/Individual Uploads/McKayla_final_section_outline.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n",
10 | "import numpy as np\n"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "## Loading in data"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 4,
23 | "metadata": {},
24 | "outputs": [
25 | {
26 | "data": {
27 | "text/html": [
28 | "\n",
29 | "\n",
42 | "
\n",
43 | " \n",
44 | " \n",
45 | " | \n",
46 | " # | \n",
47 | " Company | \n",
48 | " Location_HQ | \n",
49 | " Country | \n",
50 | " Continent | \n",
51 | " Laid_Off | \n",
52 | " Date_layoffs | \n",
53 | " Percentage | \n",
54 | " Company_Size_before_Layoffs | \n",
55 | " Company_Size_after_layoffs | \n",
56 | " Industry | \n",
57 | " Stage | \n",
58 | " Money_Raised_in_$_mil | \n",
59 | " Year | \n",
60 | " lat | \n",
61 | " lng | \n",
62 | "
\n",
63 | " \n",
64 | " \n",
65 | " \n",
66 | " | 0 | \n",
67 | " 3 | \n",
68 | " ShareChat | \n",
69 | " Bengaluru | \n",
70 | " India | \n",
71 | " Asia | \n",
72 | " 200 | \n",
73 | " 2023-12-20 | \n",
74 | " 15.0 | \n",
75 | " 1333 | \n",
76 | " 1133 | \n",
77 | " Consumer | \n",
78 | " Series H | \n",
79 | " $1700 | \n",
80 | " 2023 | \n",
81 | " 12.97194 | \n",
82 | " 77.59369 | \n",
83 | "
\n",
84 | " \n",
85 | " | 1 | \n",
86 | " 4 | \n",
87 | " InSightec | \n",
88 | " Haifa | \n",
89 | " Israel | \n",
90 | " Asia | \n",
91 | " 100 | \n",
92 | " 2023-12-19 | \n",
93 | " 20.0 | \n",
94 | " 500 | \n",
95 | " 400 | \n",
96 | " Healthcare | \n",
97 | " Unknown | \n",
98 | " $733 | \n",
99 | " 2023 | \n",
100 | " 32.81841 | \n",
101 | " 34.98850 | \n",
102 | "
\n",
103 | " \n",
104 | " | 2 | \n",
105 | " 6 | \n",
106 | " Enphase Energy | \n",
107 | " San Francisco Bay Area | \n",
108 | " USA | \n",
109 | " North America | \n",
110 | " 350 | \n",
111 | " 2023-12-18 | \n",
112 | " 10.0 | \n",
113 | " 3500 | \n",
114 | " 3150 | \n",
115 | " Energy | \n",
116 | " Post-IPO | \n",
117 | " $116 | \n",
118 | " 2023 | \n",
119 | " 37.54827 | \n",
120 | " -121.98857 | \n",
121 | "
\n",
122 | " \n",
123 | " | 3 | \n",
124 | " 7 | \n",
125 | " Udaan | \n",
126 | " Bengaluru | \n",
127 | " India | \n",
128 | " Asia | \n",
129 | " 100 | \n",
130 | " 2023-12-18 | \n",
131 | " 10.0 | \n",
132 | " 1000 | \n",
133 | " 900 | \n",
134 | " Retail | \n",
135 | " Unknown | \n",
136 | " 1500 | \n",
137 | " 2023 | \n",
138 | " 12.97194 | \n",
139 | " 77.59369 | \n",
140 | "
\n",
141 | " \n",
142 | " | 4 | \n",
143 | " 14 | \n",
144 | " Cruise | \n",
145 | " San Francisco Bay Area | \n",
146 | " USA | \n",
147 | " North America | \n",
148 | " 900 | \n",
149 | " 2023-12-14 | \n",
150 | " 24.0 | \n",
151 | " 3750 | \n",
152 | " 2850 | \n",
153 | " Transportation | \n",
154 | " Acquired | \n",
155 | " $15000 | \n",
156 | " 2023 | \n",
157 | " 37.77493 | \n",
158 | " -122.41942 | \n",
159 | "
\n",
160 | " \n",
161 | "
\n",
162 | "
"
163 | ],
164 | "text/plain": [
165 | " # Company Location_HQ Country Continent \\\n",
166 | "0 3 ShareChat Bengaluru India Asia \n",
167 | "1 4 InSightec Haifa Israel Asia \n",
168 | "2 6 Enphase Energy San Francisco Bay Area USA North America \n",
169 | "3 7 Udaan Bengaluru India Asia \n",
170 | "4 14 Cruise San Francisco Bay Area USA North America \n",
171 | "\n",
172 | " Laid_Off Date_layoffs Percentage Company_Size_before_Layoffs \\\n",
173 | "0 200 2023-12-20 15.0 1333 \n",
174 | "1 100 2023-12-19 20.0 500 \n",
175 | "2 350 2023-12-18 10.0 3500 \n",
176 | "3 100 2023-12-18 10.0 1000 \n",
177 | "4 900 2023-12-14 24.0 3750 \n",
178 | "\n",
179 | " Company_Size_after_layoffs Industry Stage Money_Raised_in_$_mil \\\n",
180 | "0 1133 Consumer Series H $1700 \n",
181 | "1 400 Healthcare Unknown $733 \n",
182 | "2 3150 Energy Post-IPO $116 \n",
183 | "3 900 Retail Unknown 1500 \n",
184 | "4 2850 Transportation Acquired $15000 \n",
185 | "\n",
186 | " Year lat lng \n",
187 | "0 2023 12.97194 77.59369 \n",
188 | "1 2023 32.81841 34.98850 \n",
189 | "2 2023 37.54827 -121.98857 \n",
190 | "3 2023 12.97194 77.59369 \n",
191 | "4 2023 37.77493 -122.41942 "
192 | ]
193 | },
194 | "execution_count": 4,
195 | "metadata": {},
196 | "output_type": "execute_result"
197 | }
198 | ],
199 | "source": [
200 | "df = pd.read_excel('../data/tech_layoffs.xlsx')  # path relative to this notebook, so it runs on any machine\n",
201 | "df.head()"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "## Correcting data types\n",
209 | "\n",
210 | "To prepare the data for exploratory analysis, we correct the data types so that they are easily usable by plotting functions."
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 10,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "data": {
220 | "text/plain": [
221 | "# int64\n",
222 | "Company object\n",
223 | "Location_HQ object\n",
224 | "Country object\n",
225 | "Continent object\n",
226 | "Laid_Off int64\n",
227 | "Date_layoffs datetime64[ns]\n",
228 | "Percentage float64\n",
229 | "Company_Size_before_Layoffs int64\n",
230 | "Company_Size_after_layoffs int64\n",
231 | "Industry object\n",
232 | "Stage object\n",
233 | "Money_Raised_in_$_mil object\n",
234 | "Year int64\n",
235 | "lat float64\n",
236 | "lng float64\n",
237 | "Funding float64\n",
238 | "dtype: object"
239 | ]
240 | },
241 | "execution_count": 10,
242 | "metadata": {},
243 | "output_type": "execute_result"
244 | }
245 | ],
246 | "source": [
247 | "df.dtypes"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 9,
253 | "metadata": {},
254 | "outputs": [
255 | {
256 | "data": {
257 | "text/plain": [
258 | "0 1700.0\n",
259 | "1 733.0\n",
260 | "2 116.0\n",
261 | "3 1500.0\n",
262 | "4 15000.0\n",
263 | "Name: Funding, dtype: float64"
264 | ]
265 | },
266 | "execution_count": 9,
267 | "metadata": {},
268 | "output_type": "execute_result"
269 | }
270 | ],
271 | "source": [
272 | "df['Funding'] = df['Money_Raised_in_$_mil'].apply(lambda s: np.float64(str(s).lstrip('$')))  # strip the '$' only when present; some values (e.g. 1500) have no dollar sign\n",
273 | "df['Funding'].head()"
274 | ]
275 | },
276 | {
277 | "cell_type": "code",
278 | "execution_count": null,
279 | "metadata": {},
280 | "outputs": [],
281 | "source": []
282 | }
283 | ],
284 | "metadata": {
285 | "kernelspec": {
286 | "display_name": "dsc80",
287 | "language": "python",
288 | "name": "python3"
289 | },
290 | "language_info": {
291 | "codemirror_mode": {
292 | "name": "ipython",
293 | "version": 3
294 | },
295 | "file_extension": ".py",
296 | "mimetype": "text/x-python",
297 | "name": "python",
298 | "nbconvert_exporter": "python",
299 | "pygments_lexer": "ipython3",
300 | "version": "3.8.16"
301 | }
302 | },
303 | "nbformat": 4,
304 | "nbformat_minor": 2
305 | }
306 |
--------------------------------------------------------------------------------
/DataCheckpoint_Group019_WI24.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback** \n",
8 | "\n",
9 | "Update/change the relevant sections where you lost those points, make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.\n",
10 | "\n",
11 | "Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed."
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "# COGS 108 - Data Checkpoint"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "metadata": {},
24 | "source": [
25 | "# Names\n",
26 | "\n",
27 | "- McKayla David\n",
28 | "- Sebastian Modafferi\n",
29 | "- Anna Potapenko\n",
30 | "- Matthew Chan\n",
31 | "- Kirthin Rajkumar\n"
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "# Research Question"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "- Include a specific, clear data science question.\n",
46 | "- Make sure what you're measuring (variables) to answer the question is clear\n",
47 | "\n",
48 | "What is your research question? Include the specific question you're setting out to answer. This question should be specific, answerable with data, and clear. A general question with specific subquestions is permitted. (1-2 sentences)\n",
49 | "\n",
50 | "On a global economic scale, is a larger company (on the basis of funding and number of employees) more likely to lay off employees than a smaller company? If not, what indicators can be used to analyze what influences company layoffs?\n"
51 | ]
52 | },
53 | {
54 | "cell_type": "markdown",
55 | "metadata": {},
56 | "source": [
57 | "## Background and Prior Work"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "Layoffs reflect economic and organizational change, and are a significant indicator of a company's health and trajectory. While they most directly affect employees, layoff rates have broader implications for the health of the economy, industry trends, and the state of the workforce. Understanding the trends and impacts of layoffs is therefore vital not only on a global economic scale, but also for stakeholders and individual communities.\n",
65 | "\n",
66 | "Layoffs are important to study because they matter to both companies and employees. For a company, understanding indicators that predict imminent layoffs can help it course-correct before reaching a point of no return. For employees, understanding those same indicators can inform the choice of their next employer and improve their job security.\n",
67 | "\n",
68 | "Research published in the Journal of the European Economic Association [1^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) explored the economic influences that cause layoffs, and inquired into how financial health and market factors influence layoff decisions. A similar study published in the Journal of Labor Economics looks into the effects of layoffs on unemployment rates, and found that layoffs can have a lasting impact on the job market and on employees' career trajectories.\n",
69 | "\n",
70 | "A study in the Journal of Empirical Finance [3^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) also examines individual firms and the causes of their layoffs, giving insight into corporate restructuring and the technologies that reshape workforce requirements. Additionally, an article archived on JSTOR [2^](#https://www.jstor.org/stable/117002?casa_token=m7s1bFw7mY4AAAAA%3AhaYXwJWsj5E0Xo7vbnjns6omvUnSFYlenLVZ99nBhONKkQRCLyfLIdEk3ZJycob9If4HtLaMga7y7cQzrzAO6QfJYXTkccHfVciVYhTXREH7HSHuGN4) describes how layoffs recur in step with economic cycles, suggesting that layoffs are a recurring feature of economic growth.\n",
71 | "\n",
72 | "Research on layoffs adopts an interdisciplinary approach, using economic theories, organizational behavior, and societal impacts. Overall, it is imperative to understand the factors that influence layoffs because knowledge about these factors can help researchers develop strategies to mitigate the negative effects of layoffs on employees and the economy at large. Existing work does not provide internal indicators for when a company is about to execute layoffs, so our research seeks to identify a correlation between company size and layoffs.\n",
73 | "\n",
74 | "1. [^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) Gathmann, C., Helm, I., & Schönberg, U. (2018). Spillover effects of mass layoffs. Journal of the European Economic Association, 18(1), 427–468. https://doi.org/10.1093/jeea/jvy045\n",
75 | "2. [^](#cite_ref-2) Hallock, K. (1998). Layoffs, top executive pay, and firm performance. The American Economic Review. https://www.jstor.org/stable/117002\n",
76 | "3. [^](#https://doi.org/10.1016/s0927-5398\\(01\\)00024-x) Chen, P., Mehrotra, V., Sivakumar, R., & Yu, W. (2001). Layoffs, shareholders’ wealth, and corporate performance. Journal of Empirical Finance, 8(2), 171–199. https://doi.org/10.1016/s0927-5398(01)00024-x\n"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "# Hypothesis\n"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {},
89 | "source": [
90 | "We hypothesize that larger companies are more likely to lay off employees (especially amidst a recession) than smaller companies. We are inclined to believe this because smaller companies already have fewer employees, so layoffs are more likely to harm the business than benefit it. Additionally, larger companies can withstand more financial pressure, allowing them to perform large layoffs despite the impact on company performance, given that they have enough capital to withstand the losses.\n"
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "# Data"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "## Data overview\n",
105 | "\n",
106 | "For each dataset include the following information\n",
107 | "- Dataset #1 - Kaggle\n",
108 | " - Dataset Name: \"Tech Layoffs 2020-2024\"\n",
109 | " - https://www.kaggle.com/datasets/ulrikeherold/tech-layoffs-2020-2024\n",
110 | " - Number of observations: 1418\n",
111 | " - Number of variables: 16\n",
112 | "\n",
113 | "This dataset was web-scraped from layoffs.fyi, which in turn aggregates layoff data over the past four years from news articles. The key variables we will be using are `Money_Raised_in_$_mil`, `Percentage`, `Laid_Off`, `Funding`, and `Stage`; we are focusing our analysis on these columns because they contain vital information about layoffs and about how each company is performing. The data comes fairly clean; the only correction required is to the `Money_Raised_in_$_mil` column, which was initially stored as a string containing a dollar sign character."
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "## Dataset #1 (Layoffs.fyi)"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 7,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "data": {
130 | "text/html": [
131 | "\n",
132 | "\n",
145 | "
\n",
146 | " \n",
147 | " \n",
148 | " | \n",
149 | " # | \n",
150 | " Company | \n",
151 | " Location_HQ | \n",
152 | " Country | \n",
153 | " Continent | \n",
154 | " Laid_Off | \n",
155 | " Date_layoffs | \n",
156 | " Percentage | \n",
157 | " Company_Size_before_Layoffs | \n",
158 | " Company_Size_after_layoffs | \n",
159 | " Industry | \n",
160 | " Stage | \n",
161 | " Money_Raised_in_$_mil | \n",
162 | " Year | \n",
163 | " lat | \n",
164 | " lng | \n",
165 | "
\n",
166 | " \n",
167 | " \n",
168 | " \n",
169 | " | 0 | \n",
170 | " 3 | \n",
171 | " ShareChat | \n",
172 | " Bengaluru | \n",
173 | " India | \n",
174 | " Asia | \n",
175 | " 200 | \n",
176 | " 2023-12-20 | \n",
177 | " 15.0 | \n",
178 | " 1333 | \n",
179 | " 1133 | \n",
180 | " Consumer | \n",
181 | " Series H | \n",
182 | " $1700 | \n",
183 | " 2023 | \n",
184 | " 12.97194 | \n",
185 | " 77.59369 | \n",
186 | "
\n",
187 | " \n",
188 | " | 1 | \n",
189 | " 4 | \n",
190 | " InSightec | \n",
191 | " Haifa | \n",
192 | " Israel | \n",
193 | " Asia | \n",
194 | " 100 | \n",
195 | " 2023-12-19 | \n",
196 | " 20.0 | \n",
197 | " 500 | \n",
198 | " 400 | \n",
199 | " Healthcare | \n",
200 | " Unknown | \n",
201 | " $733 | \n",
202 | " 2023 | \n",
203 | " 32.81841 | \n",
204 | " 34.98850 | \n",
205 | "
\n",
206 | " \n",
207 | " | 2 | \n",
208 | " 6 | \n",
209 | " Enphase Energy | \n",
210 | " San Francisco Bay Area | \n",
211 | " USA | \n",
212 | " North America | \n",
213 | " 350 | \n",
214 | " 2023-12-18 | \n",
215 | " 10.0 | \n",
216 | " 3500 | \n",
217 | " 3150 | \n",
218 | " Energy | \n",
219 | " Post-IPO | \n",
220 | " $116 | \n",
221 | " 2023 | \n",
222 | " 37.54827 | \n",
223 | " -121.98857 | \n",
224 | "
\n",
225 | " \n",
226 | " | 3 | \n",
227 | " 7 | \n",
228 | " Udaan | \n",
229 | " Bengaluru | \n",
230 | " India | \n",
231 | " Asia | \n",
232 | " 100 | \n",
233 | " 2023-12-18 | \n",
234 | " 10.0 | \n",
235 | " 1000 | \n",
236 | " 900 | \n",
237 | " Retail | \n",
238 | " Unknown | \n",
239 | " 1500 | \n",
240 | " 2023 | \n",
241 | " 12.97194 | \n",
242 | " 77.59369 | \n",
243 | "
\n",
244 | " \n",
245 | " | 4 | \n",
246 | " 14 | \n",
247 | " Cruise | \n",
248 | " San Francisco Bay Area | \n",
249 | " USA | \n",
250 | " North America | \n",
251 | " 900 | \n",
252 | " 2023-12-14 | \n",
253 | " 24.0 | \n",
254 | " 3750 | \n",
255 | " 2850 | \n",
256 | " Transportation | \n",
257 | " Acquired | \n",
258 | " $15000 | \n",
259 | " 2023 | \n",
260 | " 37.77493 | \n",
261 | " -122.41942 | \n",
262 | "
\n",
263 | " \n",
264 | "
\n",
265 | "
"
266 | ],
267 | "text/plain": [
268 | " # Company Location_HQ Country Continent \\\n",
269 | "0 3 ShareChat Bengaluru India Asia \n",
270 | "1 4 InSightec Haifa Israel Asia \n",
271 | "2 6 Enphase Energy San Francisco Bay Area USA North America \n",
272 | "3 7 Udaan Bengaluru India Asia \n",
273 | "4 14 Cruise San Francisco Bay Area USA North America \n",
274 | "\n",
275 | " Laid_Off Date_layoffs Percentage Company_Size_before_Layoffs \\\n",
276 | "0 200 2023-12-20 15.0 1333 \n",
277 | "1 100 2023-12-19 20.0 500 \n",
278 | "2 350 2023-12-18 10.0 3500 \n",
279 | "3 100 2023-12-18 10.0 1000 \n",
280 | "4 900 2023-12-14 24.0 3750 \n",
281 | "\n",
282 | " Company_Size_after_layoffs Industry Stage Money_Raised_in_$_mil \\\n",
283 | "0 1133 Consumer Series H $1700 \n",
284 | "1 400 Healthcare Unknown $733 \n",
285 | "2 3150 Energy Post-IPO $116 \n",
286 | "3 900 Retail Unknown 1500 \n",
287 | "4 2850 Transportation Acquired $15000 \n",
288 | "\n",
289 | " Year lat lng \n",
290 | "0 2023 12.97194 77.59369 \n",
291 | "1 2023 32.81841 34.98850 \n",
292 | "2 2023 37.54827 -121.98857 \n",
293 | "3 2023 12.97194 77.59369 \n",
294 | "4 2023 37.77493 -122.41942 "
295 | ]
296 | },
297 | "execution_count": 7,
298 | "metadata": {},
299 | "output_type": "execute_result"
300 | }
301 | ],
302 | "source": [
303 | "import pandas as pd\n",
304 | "import numpy as np\n",
305 | "df = pd.read_excel('./data/tech_layoffs.xlsx')\n",
306 | "df.head()"
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": 9,
312 | "metadata": {},
313 | "outputs": [],
314 | "source": [
315 |     "# Remove the company name column\n",
316 | "df = df.drop(columns=['Company'])"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 10,
322 | "metadata": {},
323 | "outputs": [
324 | {
325 | "data": {
326 | "text/plain": [
327 | "dtype('O')"
328 | ]
329 | },
330 | "execution_count": 10,
331 | "metadata": {},
332 | "output_type": "execute_result"
333 | }
334 | ],
335 | "source": [
336 | "df['Money_Raised_in_$_mil'].dtypes"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": 11,
342 | "metadata": {},
343 | "outputs": [
344 | {
345 | "data": {
346 | "text/plain": [
347 | "(1418, 16)"
348 | ]
349 | },
350 | "execution_count": 11,
351 | "metadata": {},
352 | "output_type": "execute_result"
353 | }
354 | ],
355 | "source": [
356 |     "# Clean the Money_Raised_in_$_mil column to float: strip the '$' prefix where present\n",
357 |     "# (some entries, e.g. '1500', have no '$', so slicing off the first character would corrupt them)\n",
358 |     "df['Funding'] = df['Money_Raised_in_$_mil'].apply(lambda s: np.float64(str(s).lstrip('$')))\n",
358 | "df['Funding'].head()\n",
359 | "df.shape"
360 | ]
361 | },
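{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# Sanity-check sketch (assumes the Funding column was created above):\n",
  "# confirm the cleaned column is numeric, and compare it against the raw strings.\n",
  "assert df['Funding'].dtype == np.float64\n",
  "df[['Money_Raised_in_$_mil', 'Funding']].head()"
 ]
},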
362 | {
363 | "cell_type": "markdown",
364 | "metadata": {},
365 | "source": [
366 | "# Ethics & Privacy"
367 | ]
368 | },
369 | {
370 | "cell_type": "markdown",
371 | "metadata": {},
372 | "source": [
373 | "##### Potential ethical concerns and how we plan to address them: \n",
374 |     "Our dataset is web-scraped from Layoffs.fyi, which contains personal information about individuals who were laid off. Because there is no explicit documentation of informed consent, we will omit this information and focus on company-level metadata rather than on individuals. Additionally, Layoffs.fyi aggregates only layoffs reported in news articles, so the sample is biased toward publicly reported events. The dataset is also composed primarily of US data, which neglects layoffs in other regions of the world and could bias our analysis and results. Given the sparse coverage of non-US observations, we will frame our analysis in the context of the US economy, while still including models and representations of non-US observations to provide scope and a point of reference. Finally, the data cover only 2020-2024, which excludes longer-term historical context on layoffs and compounds these limitations; our analysis will therefore be oriented toward the COVID and post-COVID economy."
375 | ]
376 | },
377 | {
378 | "cell_type": "markdown",
379 | "metadata": {},
380 | "source": [
381 | "# Team Expectations "
382 | ]
383 | },
384 | {
385 | "cell_type": "markdown",
386 | "metadata": {},
387 | "source": [
388 |     "We expect each team member to reliably complete their own work and contributions. Members should maintain open communication and are expected to flag any scheduling conflicts for team meetings; even if a member cannot attend, they are still expected to complete their work before the meeting. During meetings, we expect all members to contribute actively to discussion and to remain professional when ideas conflict. Tasks will be assigned by the end of each meeting, and members are expected to arrive at the next meeting with their task completed and uploaded to the repository, so that we can discuss progress and any issues encountered."
389 | ]
390 | },
391 | {
392 | "cell_type": "markdown",
393 | "metadata": {},
394 | "source": [
395 | "# Project Timeline Proposal"
396 | ]
397 | },
398 | {
399 | "cell_type": "markdown",
400 | "metadata": {},
401 | "source": [
402 | "| Meeting Date | Meeting Time| Completed Before Meeting | Discuss at Meeting |\n",
403 | "|---|---|---|---|\n",
404 | "| 02/04 | 1 PM | Read previous COGS 108 Final Projects | Complete previous quarters’ COGS 108 Final Project Analysis, plan meeting times, begin discussing project topics. | \n",
405 |     "| 02/05 | 1 PM | Brainstorm project topics, potential data sources, and viability of research questions | Discuss and decide on final project topic; discuss hypothesis; begin background research; discuss ideal dataset(s) and ethics; draft project proposal | \n",
406 | "| 02/11 | 1 PM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |\n",
407 | "| 02/18 | 1 PM | Delegate Tasks and start wrangling | Go over what everyone has done. Make edits or revise things before. Also go over revisions and feedback from the proposal. |\n",
408 | "| 02/25 | 1 PM | Finalize wrangling/EDA; Begin Analysis | Meet for Checkpoint #1 |\n",
409 | "| 03/03 | 1 PM | Finalize Data Viz and EDA; Begin Analysis | Meet for Checkpoint #2 |\n",
410 |     "| 03/10 | 1 PM | Finalize quantitative analysis; Discuss approach to final video submission | Meet for video and final submission logistics |\n",
411 | "| 03/13 | 12 PM | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |\n",
412 | "| 03/20 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys |"
413 | ]
414 | }
415 | ],
416 | "metadata": {
417 | "kernelspec": {
418 | "display_name": "Python 3 (ipykernel)",
419 | "language": "python",
420 | "name": "python3"
421 | },
422 | "language_info": {
423 | "codemirror_mode": {
424 | "name": "ipython",
425 | "version": 3
426 | },
427 | "file_extension": ".py",
428 | "mimetype": "text/x-python",
429 | "name": "python",
430 | "nbconvert_exporter": "python",
431 | "pygments_lexer": "ipython3",
432 | "version": "3.12.2"
433 | }
434 | },
435 | "nbformat": 4,
436 | "nbformat_minor": 2
437 | }
438 |
--------------------------------------------------------------------------------