├── Avocado_HassAvocadoBoard_20152023v1.0.1.csv ├── README.md ├── .gitignore └── ProjectProposal_Group043_WI24.ipynb /Avocado_HassAvocadoBoard_20152023v1.0.1.csv: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | This is your group repo for your final project for COGS108. 2 | 3 | This repository is private, and is only visible to the course instructors and your group mates; it is not visible to anyone else. 4 | 5 | Template notebooks for each component are provided. Only work on the notebook prior to its due date. After each submission is due, move onto the next notebook (For example, after the proposal is due, start working in the Data Checkpoint notebook). 6 | 7 | This repository will be frozen on the final project due date. No further changes can be made after that time. 8 | 9 | Your project proposal and final project will be graded based solely on the corresponding project notebooks in this repository. 10 | 11 | Template Jupyter notebooks have been included, with your group number replacing the XXX in the following file names. For each due date, make sure you have a notebook present in this repository by each due date with the following name (where XXX is replaced by your group number): 12 | 13 | - `ProjectProposal_groupXXX.ipynb` 14 | - `DataCheckpoint_groupXXX.ipynb` 15 | - `EDACheckpoint_groupXXX.ipynb` 16 | - `FinalProject_groupXXX.ipynb` 17 | 18 | This is *your* repo. You are free to manage the repo as you see fit, edit this README, add data files, add scripts, etc. So long as there are the four files above on due dates with the required information, the rest is up to you all. 19 | 20 | Also, you are free and encouraged to share this project after the course and to add it to your portfolio. 
Just be sure to fork it to your GitHub at the end of the quarter! 21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # pdm 105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 106 | #pdm.lock 107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 108 | # in version control. 109 | # https://pdm.fming.dev/#use-with-ide 110 | .pdm.toml 111 | 112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 113 | __pypackages__/ 114 | 115 | # Celery stuff 116 | celerybeat-schedule 117 | celerybeat.pid 118 | 119 | # SageMath parsed files 120 | *.sage.py 121 | 122 | # Environments 123 | .env 124 | .venv 125 | env/ 126 | venv/ 127 | ENV/ 128 | env.bak/ 129 | venv.bak/ 130 | 131 | # Spyder project settings 132 | .spyderproject 133 | .spyproject 134 | 135 | # Rope project settings 136 | .ropeproject 137 | 138 | # mkdocs documentation 139 | /site 140 | 141 | # mypy 142 | .mypy_cache/ 143 | .dmypy.json 144 | dmypy.json 145 | 146 | # Pyre type checker 147 | .pyre/ 148 | 149 | # pytype static type analyzer 150 | .pytype/ 151 | 152 | # Cython debug symbols 153 | cython_debug/ 154 | 155 | # PyCharm 156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 158 | # and can be added to the global gitignore or merged into 
this file. For a more nuclear 159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 160 | #.idea/ 161 | 162 | -------------------------------------------------------------------------------- /ProjectProposal_Group043_WI24.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# COGS 108 - Project Proposal" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Names\n", 15 | "\n", 16 | "- Kaleigh Mogatas A17051705 kmogatas@ucsd.edu\n", 17 | "- Tairan Liu A17399714 tal012@ucsd.edu\n", 18 | "- Teresa Tian A16878664 shtian@ucsd.edu\n", 19 | "- Lynna Nguyen A16906910 lnn002@ucsd.edu\n", 20 | "- Ella Tung A16363333 etung@ucsd.edu\n" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Research Question" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "What impact do region and production method (conventional vs. organic) have on U.S. avocado price fluctuations from 2015 to 2023?", 35 | "\n" 36 | ] 37 | }, 38 | { 39 | 40 | "cell_type": "markdown", 41 | 42 | "metadata": {}, 43 | 44 | "source": [ 45 | 46 | "## Background and Prior Work" 47 | 48 | ] 49 | 50 | }, 51 | 52 | { 53 | 54 | "cell_type": "markdown", 55 | 56 | "metadata": {}, 57 | 58 | "source": [ 59 | 60 | "\n", 61 | 62 | "Avocados, often hailed as a superfruit, are packed with a wide array of vitamins and minerals. Owing to their extensive health benefits, they have sustained their popularity over the past decade, consistently featuring across various culinary presentations. The consumption of avocados has grown significantly in recent years, increasing demand. Because of this, the price of avocados has been affected for a number of reasons. 
Factors such as the seasonality of avocado production, with specific periods favoring optimal growth, the type of avocado (organic or conventional), and the region of cultivation play a role in influencing avocado availability and quality.[1](#cite_note-1)\n", 63 | 64 | "\n", 65 | 66 | "Avocado prices fluctuate depending on the time of year. Tanmay Deshpande, the author of ‘Avocado Price Forecast,’ used Auto Regressive Integrated Moving Average (ARIMA) and Seasonal Auto Regressive Integrated Moving Average (SARIMA) models to examine the relationship between the prices of the two types of avocado and the time of year. He found that average avocado prices rise around September to November and drop again around December and January. This suggests that, because avocados are a seasonal fruit, demand may be higher at certain times of year, which is why prices increase during those periods.[2](#cite_note-2)\n", 67 | 68 | "\n", 69 | 70 | "However, avocado prices also vary depending on the region in which they are produced and sold. Mario Caesar, author of ‘Avocado Price Regression w/ PyCaret & EDA,’ discusses how different areas of the United States contribute to avocado sales and prices, especially with respect to bag size. His article shows that the West (excluding California) and California purchase more conventional and organic avocados (conventional especially) than the other top-five purchasing regions (the Northeast, Great Lakes, and South Central), which raises the question of why avocado consumption differs so much between regions. Additionally, the type of avocado affects both cost and people’s decision to purchase, since the two types are grown using different methods. 
Caesar concludes that bagged conventional avocados are more likely to be purchased than organic avocados since they tend to be cheaper and larger in volume and size. Organic avocados are produced in a better environment, and their prices are higher for that reason.[3](#cite_note-3)\n", 71 | 72 | "\n", 73 | 74 | "\n", 75 | 76 | 77 | 78 | "**Citations** \n", 79 | 80 | "1. [1](#cite_ref-1) Bastida, Olmo. Fresh Avocado Market in the US. *ProducePay*, 5 Oct. 2023, https://producepay.com/blog/fresh-avocado-market-us/#:~:text=Price%20of%20imported%20fresh%20avocado,December%2C%20November%2C%20and%20October. Accessed 9 Feb. 2024. \n", 81 | 82 | "2. [2](#cite_ref-2) Deshpande, Tanmay. Avocado Price Forecast - ARIMA & SARIMA. *Kaggle*, 2023, https://www.kaggle.com/code/tanmay111999/avocado-price-forecast-arima-sarima-detailed. Accessed 9 Feb. 2024. \n", 83 | 84 | "3. [3](#cite_ref-3) Caesar, Mario. Avocado Price Regression w/ PyCaret & EDA. *Kaggle*, 2022, www.kaggle.com/code/caesarmario/avocado-price-regression-w-pycaret-eda. Accessed 9 Feb. 2024." 85 | 86 | 87 | 88 | ] 89 | 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "# Hypothesis\n" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "\n", 103 | "In the United States, avocado prices are primarily affected by their production method, with organic avocados commanding higher prices than conventional ones. 
Additionally, regional variations and seasonal demand fluctuations play critical roles in price determination, leading to higher prices in regions with scarce supply or elevated demand during certain seasons.\n", 104 | "\n" 105 | 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "# Data" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "Data Overview:\n", "\n", 120 | "Dataset Name: Avocado_HassAvocadoBoard_20152023v1.0.1.csv\n","\n", 121 | "Link to the Dataset: \n","\n", 122 | "https://www.kaggle.com/datasets/vakhariapujan/avocado-prices-and-sales-volume-2015-2023\n","\n", 123 | "Number of Observations: 53415 \n","\n", 124 | "Number of Variables: 12 \n","\n", 125 | "This dataset provides comprehensive data on Hass avocado sales, sourced from the Hass Avocado Board. It builds on previous versions introduced on Kaggle by Justin Kiggins and later updated by Valentin Joseph to include data from 2015 up until 2021. This dataset categorizes sales data by various regions and key locations within the United States, including cities and sub-regions. Notably, the location values do not sum to the regional totals; the regions are California, West, Plains, South Central, Southeast, Midsouth, Great Lakes, and Northeast. This dataset offers valuable insights into avocado sales trends across different geographical areas. \n", 126 | "\n" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "# Ethics & Privacy" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "Our data doesn’t involve human subjects, so there is no concern for informed consent. It also raises no PII concerns. We believe the avocados will not request that their personal information be removed. 
Overall, since we are not using human subjects, there should be no bias, privacy, or terms-of-use concerns on that front. Furthermore, since our data is publicly available online, we do not plan to delete it after use. Avocado prices and origins are publicly available, so we do not see a need to secure the data.\n", "\n", 141 | "One bias our dataset might have is that it only includes avocados sold by larger institutions, such as supermarkets. Smaller sellers, such as farmers selling small amounts of avocados, might not be accounted for. However, since we are focusing on the impact of regions and production methods on prices, these small sellers should have little effect on the statistics. Since they have little impact on overall avocado prices and appear randomly, they can be disregarded in our data analysis. \n", "\n", 142 | "Another impact our project might have is that readers may draw conclusions about avocados based on their origin. People might be inclined to buy avocados from one origin and discriminate against avocados from another origin. We will address this in our writeup by stating that our conclusions are based on previously collected, publicly available data. Avocado prices in different areas might fluctuate in the future and become the opposite of our conclusions.\n" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "# Team Expectations " 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "\n", 157 | "\n", 158 | "* *Team Expectation 1*: Punctual Participation: All team members are expected to attend meetings promptly. Should illness or unavoidable circumstances arise, members are encouraged to participate virtually to maintain continuity.\n", 159 | "* *Team Expectation 2*: Effective Communication: We prioritize clear, timely, and constructive communication. 
Keeping all members informed and engaged is essential for our collective success.\n", 160 | "* *Team Expectation 3*: Mutual Respect Towards Teammates: Every team member deserves to be treated with dignity and respect. We commit to fostering an inclusive environment where diverse perspectives are valued and encouraged.\n", 161 | "* *Team Expectation 4*: Active Contribution and Collaboration: Each member is expected to actively contribute to our project by sharing ideas, taking on tasks, and collaborating with others. Our goal is to leverage our collective strengths to achieve our research objectives.\n", 162 | "\nBy adhering to these expectations, our group aims to create a productive, supportive, and respectful team dynamic conducive to our project's success.\n" 163 | 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "# Project Timeline Proposal" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "\n", 178 | "\n", 179 | "| Meeting Date | Meeting Time| Completed Before Meeting | Discuss at Meeting |\n", 180 | "|---|---|---|---|\n", 181 | "| 2/8 | 1 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | \n", 182 | "| 2/10 | 10 AM | Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | \n", 183 | "| 2/13 | 3 PM | Read and understand checkpoint #1, brainstorm the project | Discuss how we can tackle our project as we continue the project | \n", 184 | "| 2/18 | 10 AM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part for Checkpoint #1 |\n", 185 | "| 2/24 | 6 PM | Import & Wrangle Data (Ant Man); EDA (Hulk) | Review/Edit wrangling/EDA; Discuss Analysis Plan |\n", 186 | "| 
2/29 | 12 PM | Finalize wrangling/EDA; Begin Analysis (Iron Man; Thor) | Discuss/edit Analysis; Complete project check-in |\n", 187 | "| 3/2 | 7 PM | Understand what we need to get done for checkpoint #2, prepare questions if needed for this part | Discuss what we should do for checkpoint #2 and assign members their part for this section | \n", 188 | "| 3/5 | 3 PM | Continue doing parts for checkpoint #2 | Discuss if someone needs extra help for their part and update each other on what we each need so we know what we need to focus on | \n", 189 | "| 3/9 | 1 PM | Review each other's parts and see what we have done for this checkpoint | Work collaboratively as a group during this checkpoint to get as much done to catch everyone up on their parts | \n", 190 | "| 3/10 | 1 PM | Continue to review each other's part to make sure that everyone is caught up and that we are not behind | Finalize Checkpoint #2 to make sure that it is ready to submit | \n", 191 | "| 3/11 | 6 PM | Read the final report requirements | Discussion about the final report and make sure members are aware of what we are doing | \n", 192 | "| 3/13 | 12 PM | Complete analysis; Draft results/conclusion/discussion (Wasp)| Discuss/edit full project |\n", 193 | "| 3/15 | 5 PM | Review each other's work for the Final Report | Comment on what else needs editing and finalization for what other members should do for their part and make sure that we have everything that is required for the Final Report. 
| \n", 194 | "| 3/19 | Before 11:59 PM | Review Final Report to make sure that it is ready to submit | Finalize the Final Report and turn in Final Project & Group Project Surveys | \n" 195 | ] 196 | } 197 | ], 198 | "metadata": { 199 | "kernelspec": { 200 | "display_name": "Python 3 (ipykernel)", 201 | "language": "python", 202 | "name": "python3" 203 | }, 204 | "language_info": { 205 | "codemirror_mode": { 206 | "name": "ipython", 207 | "version": 3 208 | }, 209 | "file_extension": ".py", 210 | "mimetype": "text/x-python", 211 | "name": "python", 212 | "nbconvert_exporter": "python", 213 | "pygments_lexer": "ipython3", 214 | "version": "3.9.7" 215 | } 216 | }, 217 | "nbformat": 4, 218 | "nbformat_minor": 2 219 | } 220 | --------------------------------------------------------------------------------