├── .github
    └── ISSUE_TEMPLATE
    │   ├── 01-insight-guide-proposal.yaml
    │   └── 02-project-proposal.yaml
├── .gitignore
├── CHAOSS-Data-Science-Prospectus.pdf
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── challenges_survey
    ├── Challenges_Survey_Results_2023.pdf
    ├── README.md
    ├── challenge_comment_categories.md
    ├── draft_interpretations
    │   ├── README.md
    │   └── dawn_analysis
    │   │   ├── Challenges_Survey_2023.pdf
    │   │   └── challenge_comment_categories.md
    └── raw_data
    │   ├── Clean_CHAOSS_Understanding_Challenges_Survey.csv
    │   ├── other_challenges_freeform_redacted.txt
    │   ├── strengths_freeform_redacted.txt
    │   └── understanding_challenges_graphs.pdf
├── data-ethics-statement.md
├── dataset
    ├── README.md
    ├── archive
    │   ├── README.md
    │   ├── archived-projects.py
    │   └── data-files
    │   │   └── archive_repos.csv
    ├── foundation-stats
    │   ├── apacheURLtoTable.md
    │   ├── apache_url_to_table.py
    │   ├── dataset
    │   │   └── foundation-stats
    │   │   │   ├── structured_project_analysis.csv
    │   │   │   └── structured_project_cleaned.json
    │   ├── structured_project_analysis.csv
    │   └── structured_project_cleaned.json
    ├── license-changes
    │   ├── README.md
    │   ├── dataset_notes.md
    │   ├── fork-case-study
    │   │   ├── README.md
    │   │   ├── commits_people.py
    │   │   ├── data-files
    │   │   │   ├── OpenSearch2021-04-12T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch2021-04-12T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch2023-08-01T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch2023-09-16T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch2024-09-16T00:00:00.000+00:002025-03-16T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch_people_2021-04-12T00:00:00.000+00:002022-04-12T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch_people_2021-04-12T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch_people_2021-04-12T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch_people_2023-08-01T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch_people_2023-09-16T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
    │   │   │   ├── OpenSearch_people_2024-09-16T00:00:00.000+00:002025-03-16T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch2019-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch2020-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch2023-08-29T00:00:00.000+00:002024-08-29T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch2024-08-29T00:00:00.000+00:002025-02-29T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch_people_2019-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch_people_2020-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch_people_2021-02-03T00:00:00.000+00:002022-02-03T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch_people_2023-08-29T00:00:00.000+00:002024-08-29T00:00:00.000+00:00.pkl
    │   │   │   ├── elasticsearch_people_2024-08-29T00:00:00.000+00:002025-02-29T00:00:00.000+00:00.pkl
    │   │   │   ├── opentofu2023-09-05T00:00:00.000+00:002024-09-05T00:00:00.000+00:00.pkl
    │   │   │   ├── opentofu_people_2023-09-05T00:00:00.000+00:002024-09-05T00:00:00.000+00:00.pkl
    │   │   │   ├── redis2022-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis2023-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis2024-02-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl
    │   │   │   ├── redis2024-03-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl
    │   │   │   ├── redis2024-03-20T00:00:00.000+00:002024-09-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis2024-03-20T00:00:00.000+00:002025-03-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis_people_2022-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis_people_2023-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis_people_2024-03-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl
    │   │   │   ├── redis_people_2024-03-20T00:00:00.000+00:002024-09-20T00:00:00.000+00:00.pkl
    │   │   │   ├── redis_people_2024-03-20T00:00:00.000+00:002025-03-20T00:00:00.000+00:00.pkl
    │   │   │   ├── stars-forks
    │   │   │   │   ├── OpenSearch_forks.csv
    │   │   │   │   ├── elasticsearch_forks.csv
    │   │   │   │   ├── elasticsearch_stars.csv
    │   │   │   │   ├── opensearch-stars.csv
    │   │   │   │   ├── opentofu-forks.csv
    │   │   │   │   ├── opentofu-stars.csv
    │   │   │   │   ├── redis-forks.csv
    │   │   │   │   ├── redis-stars.csv
    │   │   │   │   ├── terraform-forks.csv
    │   │   │   │   ├── terraform-stars.csv
    │   │   │   │   ├── valkey-forks.csv
    │   │   │   │   └── valkey-stars.csv
    │   │   │   ├── terraform2021-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
    │   │   │   ├── terraform2022-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
    │   │   │   ├── terraform2023-08-10T00:00:00.000+00:002024-08-10T00:00:00.000+00:00.pkl
    │   │   │   ├── terraform_people_2021-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
    │   │   │   ├── terraform_people_2022-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
    │   │   │   ├── terraform_people_2023-08-10T00:00:00.000+00:002024-08-10T00:00:00.000+00:00.pkl
    │   │   │   ├── valkey2024-03-28T00:00:00.000+00:002024-08-20T00:00:00.000+00:00.pkl
    │   │   │   ├── valkey2024-03-28T00:00:00.000+00:002024-09-28T00:00:00.000+00:00.pkl
    │   │   │   ├── valkey2024-03-28T00:00:00.000+00:002025-03-28T00:00:00.000+00:00.pkl
    │   │   │   ├── valkey_people_2024-03-28T00:00:00.000+00:002024-08-20T00:00:00.000+00:00.pkl
    │   │   │   ├── valkey_people_2024-03-28T00:00:00.000+00:002024-09-28T00:00:00.000+00:00.pkl
    │   │   │   └── valkey_people_2024-03-28T00:00:00.000+00:002025-03-28T00:00:00.000+00:00.pkl
    │   │   └── notebooks
    │   │   │   ├── OpenSearch.ipynb
    │   │   │   ├── elasticsearch.ipynb
    │   │   │   ├── opentofu.ipynb
    │   │   │   ├── redis.ipynb
    │   │   │   ├── stars-forks.ipynb
    │   │   │   ├── terraform.ipynb
    │   │   │   └── valkey.ipynb
    │   ├── forks.csv
    │   ├── generate-license-data.py
    │   ├── inputdata.csv
    │   ├── license_changes.csv
    │   ├── more_forks.csv
    │   ├── output.json
    │   └── wikipedia_list.csv
    ├── releases
    │   └── fork-relicense-jais-2025-04.tar.gz
    └── taxonomies
    │   ├── FOSDEM_ Do we need another open source software taxonomy_ (1).pdf
    │   ├── README.md
    │   └── Taxonomies - repostatus.csv
├── events
    └── hackathon-june-2025.md
├── practitioner-guides
    ├── README.md
    ├── contributor-sustainability.md
    ├── diverse-leadership.md
    ├── images
    │   ├── active-contrib-over-time-bar-trend.png
    │   ├── active-organizations-over-time-by-data-source.png
    │   ├── bus-factor-bar-balanced.png
    │   ├── bus-factor-pie-one-person.png
    │   ├── change_requests_abandoned.png
    │   ├── closure-ratio-falling-behind.png
    │   ├── closure-ratio-summer-gap.png
    │   ├── commit-activity-by-domain-unclean.png
    │   ├── commit-activity-by-domain-vmw.png
    │   ├── contrib-by-data-source.png
    │   ├── contributor-growth-by-engagement-bar.png
    │   ├── issues_abandoned.png
    │   ├── lead-time.png
    │   ├── leadership-positions-istio-before-cncf-april-2022.png
    │   ├── leadership-positions-istio-graduating-june-2023.png
    │   ├── libyears.png
    │   ├── ossf-badge-categories.png
    │   ├── ossf-badge-criteria-example.png
    │   ├── ossf-badge-curl.png
    │   ├── releases.png
    │   ├── time-to-close.png
    │   └── time-to-first-response.png
    ├── introduction.md
    ├── organizational-participation.md
    ├── responsiveness.md
    ├── security.md
    ├── sunset.md
    └── website-landing.md
└── publications
    ├── Foster-OFA-New-Dynamics-Open-Source-Relicensing-Forks-Community-Impact-2024.pdf
    ├── README.md
    └── publication-guidelines.md
/.github/ISSUE_TEMPLATE/01-insight-guide-proposal.yaml:
--------------------------------------------------------------------------------
1 | name: Practitioner Guide Proposal
2 | title: "[Practitioner Guide]: TOPIC"
3 | description: Use this template to propose new CHAOSS Practitioner Guides
4 | labels: ['practitioner guide', 'proposal']
5 | assignees:
6 |   - geekygirldawn
7 |
8 | body:
9 |   - type: markdown
10 |     attributes:
11 |       value: |
12 |         To avoid duplication and re-work, we ask you to use this template to propose new CHAOSS Practitioner Guides. While metrics models are designed with collections of metrics that can be implemented together, these Practitioner Guides are different from metrics models. Practitioner guides are designed to help us humans understand how to interpret metrics within a narrow topic and make improvements based on what is learned from that interpretation. Each Practitioner Guide should focus on 2-4 metrics, but can include a list of additional metrics in Step 3 - Gather Additional Data if needed.
13 |
14 |   - type: markdown
15 |     attributes:
16 |       value: |
17 |         ## Practitioner Guide Information
18 |
19 |   - type: input
20 |     attributes:
21 |       label: Practitioner Guide Topic (1 - 3 words)
22 |     validations:
23 |       required: true
24 |
25 |   - type: dropdown
26 |     attributes:
27 |       label: Is this a getting started guide or a more advanced guide for experienced users?
28 |       options:
29 |         - Getting Started
30 |         - Advanced / Expert
31 |         - Not sure
32 |     validations:
33 |       required: true
34 |
35 |   - type: textarea
36 |     attributes:
37 |       label: Primary Metrics (2 - 4 metrics for Getting Started Guides)
38 |     validations:
39 |       required: true
40 |
41 |   - type: textarea
42 |     attributes:
43 |       label: Why is this topic important? How will this help people improve their open source project and / or community? Who will benefit from this guide?
44 |     validations:
45 |       required: true
46 |
47 |   - type: dropdown
48 |     attributes:
49 |       label: How would you like to see this guide developed?
50 |       options:
51 |         - I am interested in using this guide, but I do not want to write it myself
52 |         - I have the experience and time available to write the first draft
53 |         - I would like to help write the guide, but I need someone with more experience in the topic to help me
54 |         - Other (please specify in the “Additional Notes” at the end of this form)
55 |     validations:
56 |       required: true
57 |
58 |   - type: textarea
59 |     attributes:
60 |       label: Additional Notes
61 |     validations:
62 |       required: false
63 |
64 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/02-project-proposal.yaml:
--------------------------------------------------------------------------------
1 | name: Data Science Project Proposal
2 | title: "[Project]: NAME"
3 | description: Use this template to propose new CHAOSS data science projects
4 | labels: ['project', 'proposal']
5 |
6 | body:
7 |   - type: markdown
8 |     attributes:
9 |       value: |
10 |         The CHAOSS Data Science Community is interested in collaboratively working together on projects. The goal is to build a community of active data scientists within the CHAOSS project who are using CHAOSS data to help answer questions. To get the most value out of a CHAOSS Data Science Initiative, we need to scale our data science efforts and share best practices for using data science-based approaches to understanding project health. This requires building a diverse and inclusive community of people with different ideas, perspectives, and skills who can help define and promote our data science efforts. This will allow us to provide value as a project by using CHAOSS data and data science approaches to answer questions from the other WGs and elsewhere within the CHAOSS community while also allowing people to get more real world data science experience.
11 |
12 |   - type: markdown
13 |     attributes:
14 |       value: |
15 |         ## Project Information
16 |
17 |   - type: input
18 |     attributes:
19 |       label: Project Name (1 - 3 words)
20 |     validations:
21 |       required: true
22 |
23 |   - type: textarea
24 |     attributes:
25 |       label: Description
26 |       description: Provide a summary of this project including the question you hope to answer and how the question is important for the CHAOSS community
27 |     validations:
28 |       required: true
29 |
30 |   - type: textarea
31 |     attributes:
32 |       label: Related Links
33 |       description: If you have already created additional documents or have other information about this project, please include those links here.
34 |     validations:
35 |       required: false
36 |
37 |   - type: markdown
38 |     attributes:
39 |       value: |
40 |         Note that we also have a [Project Scope Template doc](https://docs.google.com/document/d/13iLNDfqJ8nuwBGEyJuFutcT7KRNT6JwFrSlJN_5f4o4/edit) that you can use to think about the project details if you find it useful (not required).
41 |
42 |   - type: dropdown
43 |     attributes:
44 |       label: How would you like to be involved in this project?
45 |       options:
46 |         - I am interested in this project, but do not plan to work on it myself
47 |         - I have the experience and time available to lead this project
48 |         - I would like to help with this project, but would prefer for someone else to lead it
49 |         - Other (please specify in the “Additional Notes” at the end of this form)
50 |     validations:
51 |       required: true
52 |
53 |   - type: textarea
54 |     attributes:
55 |       label: Additional Notes
56 |     validations:
57 |       required: false
58 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .ipynb_checkpoints
3 | .vscode
4 | __pycache__
5 | ~lock
6 |
--------------------------------------------------------------------------------
/CHAOSS-Data-Science-Prospectus.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/CHAOSS-Data-Science-Prospectus.pdf
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | ## Where can I contribute?
4 |
5 | Anyone can contribute to CHAOSS on any of our communication channels. See .
6 |
7 | * If you think something should be done (including a contribution by yourself), please open an issue in this repository. That lets others see that you think some work should be done and comment on it. If you intend to do the work yourself, please say so.
8 |
9 | * Everyone with an opinion on the matter should comment on the issue, explaining whether they support the idea, propose some change to it, or think it is not worth doing or that now is not the right time for it.
10 |
11 | * If comments are positive, and a certain consensus is achieved, propose a pull request with the changes to the repository (new document, changes to existing documents).
12 |
13 | * Everyone with an opinion on the pull request should comment on it, and detailed reviews should be done, maybe asking for new versions of the pull request. Once comments and reviews are positive, the change will be merged in the repository.
14 |
15 | * If consensus is not reached at any of these points, or the process stalls, it can be raised during one of the Common Working Group meetings, or in the mailing list, to try to unblock it.
16 |
17 | ## Which channel should I use?
18 | 1. Slack channel #data-science for general discussions and questions
19 | 2. Issue submission for discussions about future work or bugs / issues with existing work
20 | 3. Pull requests to contribute directly to this repository (after discussing the work in meetings or an issue)
21 |
22 | ### Conversations and high-level contributions
23 |
24 | Strategic directions, clarifications of scope, and ideas in an early stage are best discussed on the #data-science Slack channel or in meetings. See .
25 |
26 | ### Bug report and feature request contributions (issue)
27 |
28 | Bug reports and specific feature requests are best discussed in an issue on the repository they pertain to.
29 |
30 | ### Code or document change contributions (pull request)
31 |
32 | For this process, make sure your [GitHub account][ssh] is set up, then [fork][fork] and locally [clone][clone] the repo:
33 |
34 | git clone git@github.com:/.git
35 |
36 | Create a [feature branch][fb] in your local repository:
37 |
38 | git checkout -b
39 |
40 | Make your change and commit the change:
41 |
42 | git add
43 | git commit -m ""
44 |
45 | Push to your fork on GitHub:
46 |
47 | git push origin
48 |
49 | Then, [submit a pull request][pr] on GitHub to the CHAOSS repository.
50 |
51 | [ssh]: https://help.github.com/articles/connecting-to-github-with-ssh/
52 | [fork]: https://help.github.com/articles/fork-a-repo/
53 | [fb]: https://www.atlassian.com/git/tutorials/comparing-workflows/feature-branch-workflow
54 | [pr]: https://github.com/thoughtbot/factory_girl_rails/compare/
55 | [clone]: https://help.github.com/articles/cloning-a-repository/
56 |
57 | At this point you are waiting on the CHAOSS repository maintainers. They will comment on your pull request
58 | within three business days (and, typically, one business day).
59 |
60 | The CHAOSS repository maintainers will report on open issues and pull requests on the [calls and via the mail list][participate] to elicit feedback from the community.
61 |
62 | [participate]: https://chaoss.community/participate/
63 |
64 | ## Committing back to the repository
65 | ## DCO and Sign-Off for contributions
66 |
67 | The [CHAOSS Charter](https://github.com/chaoss/governance/blob/master/project-charter.md) requires that contributions
68 | are accompanied by a [Developer Certificate of Origin](http://developercertificate.org) sign-off.
69 | To enforce this, a bot checks all incoming commits.
70 |
71 | For users of the git command line interface, a sign-off is accomplished by passing the `-s` flag to the commit command:
72 |
73 | ```
74 | git commit -s -m 'This is a commit message'
75 | ```
76 |
77 | For users of the GitHub interface (using the "edit" button on any file, and producing a commit from it),
78 | a sign-off is accomplished by writing
79 |
80 | ```
81 | Signed-off-by: Your Name
82 | ```
83 |
84 | in a single line, into the commit comment field. This can be automated by using a browser plugin like
85 | [DCO GitHub UI](https://github.com/scottrigby/dco-gh-ui).
86 |
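As a rough illustration of what a sign-off check looks for, the sketch below tests whether a commit message contains a `Signed-off-by:` trailer in the form that `git commit -s` produces (a name followed by an email address in angle brackets). The function name `has_sign_off` and the regular expression are assumptions made for this example; this is not the logic of the actual DCO bot.

```python
import re

# Assumed trailer shape: "Signed-off-by: Name <email>", one per line.
# This pattern is an illustration, not the DCO bot's real rule set.
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """Return True if the message contains a DCO-style sign-off line."""
    return SIGN_OFF.search(commit_message) is not None
```

Running a check like this locally on a commit message before pushing can catch a missing sign-off early, although the bot's own result remains the authoritative one.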
87 | #### Steps to use the DCO browser plugin
88 | The [DCO browser plugin](https://github.com/scottrigby/dco-gh-ui) is a handy tool to automatically sign commits created using GitHub.
89 | To enable this plugin:
90 |
91 | - Go to the plugin page on the [chrome web store](https://chrome.google.com/webstore/detail/dco-github-ui/onhgmjhnaeipfgacbglaphlmllkpoijo).
92 | - Alternatively, you could go to the [firefox addon page](https://addons.mozilla.org/en-US/firefox/addon/scott-rigby/) to add the extension to your browser.
93 | - Once you add the extension, right click on the extension in the toolbar of your browser and select `Options`.
94 | - A dialog box will open up as shown below. Fill in your GitHub name (not the handle) and email-id.
95 |
96 | 
97 |
98 | - Then, whenever you perform a commit on GitHub, the line `Signed-off-by: Your Name ` will automatically appear in the commit description while making changes to a file as shown in the example below. A commit message can be added to the lines above the auto-generated sign-off.
99 |
100 | 
101 |
102 | - Once you perform the commit and send a pull request, the commit will be verified and approved by the DCO bot.
103 |
104 | 
105 |
106 |
107 |
108 | ## Who is a CHAOSS repository maintainer?
109 |
110 | The README.md of the repository contains a list of chairs, maintainers, and other roles.
111 |
112 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 CHAOSS
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CHAOSS Data Science Working Group
2 |
3 | ## Table of Contents
4 |
5 | - [Introduction](#introduction)
6 | - [Participate](#participate)
7 | - [Practitioner Guides](#practitioner-guides)
8 | - [Projects](#projects)
9 | - [Contributing](#contributing)
10 | - [Contributors](#contributors)
11 | - [License](#license)
12 |
13 | ## Introduction
14 |
15 | ### Goal
16 |
17 | Build a community of data scientists to collaborate on the [CHAOSS Data Science Initiative](https://chaoss.community/inside-the-chaoss-data-science-working-group/)
18 |
19 | ### Purpose
20 |
21 | We will collaborate with data scientists and researchers to shape how we understand open source community health and make it easier for people to use CHAOSS tools, metrics, and metrics models to draw meaningful insights that they can use to improve open source project health using data science-based approaches.
22 |
23 | ### Who should join this working group?
24 |
25 | Anyone interested in data science and data analysis can join. You don't need to be an expert or know how to perform advanced techniques, like machine learning or artificial intelligence. We welcome data scientists, data analysts, researchers and others with an interest in data.
26 |
27 | ### Background
28 |
29 | This is a working group within the CHAOSS project to support our Data Science Initiative. If you work for a company that is interested in sponsoring some of our work, we have a [sponsorship prospectus](CHAOSS-Data-Science-Prospectus.pdf) with more details.
30 |
31 | ## Participate
32 |
33 | ### How to Join Us?
34 |
35 | Want to join the working group? Here is a simple step-by-step guide on how to join:
36 |
37 | - [Getting started as a new/first time contributor](https://chaoss.community/kb-getting-started/)
38 | - [Agenda/Meeting-Minutes](https://docs.google.com/document/d/1jkAfGt97OGRwcdEn8hh5YyHQwoXRnOW96ikc_Aluo6M/edit)
39 | - Join us in the #wg-data-science channel within the CHAOSS Slack Workspace.
40 | - Learn more on the [Participate](https://chaoss.community/participate/) page on the website
41 |
42 | We follow the [CHAOSS Code of Conduct](https://github.com/chaoss/governance/blob/master/code-of-conduct.md)
43 |
44 | ## Practitioner Guides
45 |
46 | The CHAOSS Data Science Working Group develops a set of [Practitioner Guides](https://chaoss.community/about-chaoss-practitioner-guides/) to help individuals understand how to interpret data about an open source project, enabling them to develop insights that can improve the project's health. They are designed for Open Source Program Offices (OSPOs), project leads, community managers, maintainers, and anyone who wants to understand project health better and take action on what they learn from their metrics.
47 |
48 |
49 | If you are interested in [contributing to the practitioner guides](https://github.com/chaoss/wg-data-science/tree/main/practitioner-guides), you can find more details in the practitioner-guides folder here in the repo.
50 |
51 | ## Projects
52 |
53 | We are also working on several projects using CHAOSS metrics and tools to help answer people's questions about open source projects and their unique dynamics. You can find details about these projects in the WG's [GitHub Issues](https://github.com/chaoss/wg-data-science/issues?q=is%3Aissue+is%3Aopen+label%3Aproject).
54 |
55 | ## Contributing
56 |
57 | See the [CONTRIBUTING.md](CONTRIBUTING.md) for more info.
58 |
59 | ## Contributors
60 |
61 | ### Chairs
62 |
63 | - [Dawn Foster](https://github.com/geekygirldawn)
64 | - [Chan Voong](https://github.com/voongc)
65 |
66 | ### Amazing CHAOSS Project Contributors
67 |
68 | Link to the [contributors](https://chaoss.community/metrics/#user-content-chaoss-contributors-include) listed on the website.
69 |
70 | ## License
71 |
72 | See [LICENSE](LICENSE) file.
73 |
74 | Copyright © CHAOSS, a Linux Foundation Project
75 |
--------------------------------------------------------------------------------
/challenges_survey/Challenges_Survey_Results_2023.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/challenges_survey/Challenges_Survey_Results_2023.pdf
--------------------------------------------------------------------------------
/challenges_survey/README.md:
--------------------------------------------------------------------------------
1 | # Understanding Challenges Survey
2 |
3 | In August and September 2023, the CHAOSS project ran a survey of existing and past users of CHAOSS tools and metrics. The survey was designed to help us better understand the barriers and challenges that make it difficult for people to gain meaningful, empirically-driven community health insights using CHAOSS tools and metrics. The [blog post](https://chaoss.community/survey-help-the-chaoss-project-improve-our-tools-and-metrics) has more details about the survey.
4 |
5 | The results of this survey can be found in the [Challenges_Survey_Results_2023.pdf](Challenges_Survey_Results_2023.pdf) file, and there is also a file called [challenge_comment_categories.md](challenge_comment_categories.md) with the comments about the challenges coded into several different categories.
6 |
7 | The key takeaways include:
8 |
9 | * Installing our software continues to be the biggest challenge
10 | * Finding data and drawing insights from the data are also top challenges
11 | * OSPOs continue to be important users of CHAOSS tools with many using both tools
12 |
13 | In the spirit of open source, the raw data is also available to the CHAOSS data science community to encourage people to explore the data in more detail. The raw data, with personally identifiable data redacted, can be found in the raw_data directory in this repo. The CSV file contains everything except the free form responses. Because there were very few responses from certain groups (e.g., universities, government, nonprofit), attaching the free form responses to the rest of the data would have made it possible to identify the person / group responsible for certain comments. To prevent identification, the free form comments in the text documents have been put in random order and some text has been redacted. Here are the filenames with the raw data:
14 | * [Clean_CHAOSS_Understanding_Challenges_Survey.csv](raw_data/Clean_CHAOSS_Understanding_Challenges_Survey.csv) - responses to the quantitative parts of the survey
15 | * [other_challenges_freeform_redacted.txt](raw_data/other_challenges_freeform_redacted.txt) - free form text responses to the question, "What other challenges have you faced that weren’t in the above list or what else would you like to see us improve?"
16 | * [strengths_freeform_redacted.txt](raw_data/strengths_freeform_redacted.txt) - free form text responses to the question, "What do you see as CHAOSS project strengths (what do you love about CHAOSS)?"
17 | * [understanding_challenges_graphs.pdf](raw_data/understanding_challenges_graphs.pdf) - simple graphs produced by Google forms displaying the responses for each question
18 |
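For anyone who wants to start exploring the quantitative responses, here is a minimal sketch using only the Python standard library. The helper names `load_rows` and `summarize_column` are hypothetical, and the column headers of the CSV are not documented here, so it is worth inspecting the real header row before choosing a column to tally.

```python
import csv
from collections import Counter
from typing import Iterable


def load_rows(path: str) -> list[dict]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize_column(rows: Iterable[dict], column: str) -> Counter:
    """Count how often each non-empty answer appears in one column."""
    answers = ((row.get(column) or "").strip() for row in rows)
    return Counter(answer for answer in answers if answer)


# Example usage (run from the challenges_survey directory):
#   rows = load_rows("raw_data/Clean_CHAOSS_Understanding_Challenges_Survey.csv")
#   print(list(rows[0].keys()))   # inspect the real column names first
#   print(summarize_column(rows, "<a column name from the header>"))
```

Blank answers are skipped so that unanswered questions do not show up as their own category in the tally.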
19 |
--------------------------------------------------------------------------------
/challenges_survey/challenge_comment_categories.md:
--------------------------------------------------------------------------------
1 | This document has responses grouped into categories from the question, "What other challenges have you faced that weren’t in the above list or what else would you like to see us improve?"
2 |
3 | Many comments fall into multiple categories, so you’ll see the same content multiple times. The full text for the responses can be found in the [repo](https://github.com/chaoss/wg-data-science/tree/main/challenges_survey/raw_data).
4 |
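Because the same comment can appear under several categories, counting bullets directly overstates the number of distinct comments. The sketch below parses the `**Category: ... (N Quotations)**` headers used in this document and collects the bullets under each one, so that distinct comments can be counted separately. The function names and the header pattern are assumptions made for this example, based only on the layout of this file.

```python
import re

# Assumed header layout, taken from this file: "**Category: NAME (N Quotations)**"
HEADER = re.compile(r"^\*\*Category: (?P<name>.+) \((?P<count>\d+) Quotations?\)\*\*$")


def parse_categories(markdown: str) -> dict[str, list[str]]:
    """Map each category name to the list of bullet comments beneath it."""
    categories: dict[str, list[str]] = {}
    current = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            current = match.group("name")
            categories.setdefault(current, [])
        elif line.startswith("* ") and current is not None:
            categories[current].append(line[2:].strip())
    return categories


def unique_comments(categories: dict[str, list[str]]) -> set[str]:
    """Return the distinct comments across all categories."""
    return {comment for comments in categories.values() for comment in comments}
```

Comparing `len(unique_comments(...))` with the sum of the per-category list lengths gives a quick sense of how much overlap there is between categories.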
5 | **Category: Taking action on data / generating insights from the data (8 Quotations)**
6 | * ability to see and understand what other people are doing in real situation so I can leverage that work, and compare with my own
7 | * Any new metric should have use cases associated where the usefulness of the metric becomes evident
8 | * Overwhelming data.
9 | * No CHAOSS tools make it easy to compare a large number of repos.
10 | * Many CHAOSS metrics aren't quantitative, so evaluating them requires manual examination. CHAOSS metrics don't cover marketing metrics like social media mentions.
11 | * The most challenging part is to communicate them to C level.
12 | * Measuring the health and success of community (marketing) activities
13 | * In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide.
14 |
15 | **Category: CHAOSS Processes (4 Quotations)**
16 | * It's hard to understand what is "official" CHAOSS software (like what is the relationship to compass?) or how things move from the metrics/models into the software.
17 | * There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.
18 | * The focus on OSPOs was a surprise to me as I got more involved in the project, it's understandable but makes it more difficult for people not in an OSPO to be seen as an audience for metrics/software, which is unfortunate because I think there is still a lot of value in CHAOSS metrics for people who are not part of an official OSPO.
19 | * Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described
20 |
21 | **Category: Metric Definitions (3 Quotations)**
22 | * Any new metric should have use cases associated where the usefulness of the metric becomes evident
23 | * Many CHAOSS metrics aren't quantitative, so evaluating them requires manual examination. CHAOSS metrics don't cover marketing metrics like social media mentions.
24 | * Issues and PR backlog
25 |
26 | **Category: Software: Augur (3 Quotations)**
27 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
28 | * I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good. A simplified Augur install that is just the bare minimum for working with a data snapshot would be nice. Multiple instances of documentation also make working with Augur hard. There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.
29 | * We find Augur easy to install and use.
30 |
31 | **Category: Software: Contributing (2 Quotations)**
32 | * Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
33 | * There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.
34 |
35 | **Category: Software: GrimoireLab (9 Quotations)**
36 | * More clarity/documentation in data model (for comparison against other data sources); OpenSearch backend is incompatible with other internal tooling so our implementation is a bit of an island :/
37 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
38 | * Integrating grimorelab with custom dashboards and frontend/backend software seemed tricky as far as ive heard from the team that was responsible for that. I dont know the details though, sounded like there was some architectural tech debt (not sure what that means) that made it hard to for the developers to integrate.
39 | * This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
40 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
41 | * I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there. As a [REDACTED] I don't really have a budget to pay for metrics as a service just to experiment or to try and get other people involved, so I greatly appreciated the cauldron service.
42 | * grimoire sigils should support opensearch / pivot away from kibiter asap
43 | * Bitergia tools are quite complicated and take a lot of time to set up properly.
44 | * Grimoirelab did not work well for the size of our OSPO.
45 |
46 | **Category: Software: Software / Metrics Relationship (1 Quotation)**
47 | * Understanding the relationship between the software and the metrics can be difficult, things aren't always named the same and the methods of calculation are not always transparent in the software so you can't tell if it is really doing what you think it is.
48 |
49 | **Category: Technical: Compatibility / Tech Stack (7 Quotations)**
50 | * More clarity/documentation in data model (for comparison against other data sources); OpenSearch backend is incompatible with other internal tooling so our implementation is a bit of an island :/
51 | * From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value, or even integration with such tools, to get more adoption in the industry. On the other hand, the diversity of tools and setups related to open source development and contribution makes it so hard to find "the right" tools to
52 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
53 | * Integrating grimorelab with custom dashboards and frontend/backend software seemed tricky as far as ive heard from the team that was responsible for that. I dont know the details though, sounded like there was some architectural tech debt (not sure what that means) that made it hard to for the developers to integrate.
54 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
55 | * grimoire sigils should support opensearch / pivot away from kibiter asap
56 | * I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good.
57 |
58 | **Category: Technical: Documentation (3 Quotations)**
59 | * Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
60 | * Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
61 | * Multiple instances of documentation also make working with Augur hard.
62 |
63 | **Category: Technical: Ease of Use / UX (9 Quotations)**
64 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
65 | * Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
66 | * Estimating how much time and effort using tools would take, compared to ad hoc methods of assessing similar questions.
67 | * Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described. They all seem quite complex and difficult to get started or require very special data analysis expertise. Maybe adding pre-requirements to usem them?
68 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
69 | * I would really like some lighter weight ways to play with the metrics. I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good. A simplified Augur install that is just the bare minimum for working with a data snapshot would be nice.
70 | * From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value,
71 | * Overwhelming data.
72 | * No CHAOSS tools make it easy to compare a large number of repos.
73 |
74 | **Category: Technical: Installation (10 Quotations)**
75 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
76 | * Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
77 | * Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described. They all seem quite complex and difficult to get started or require very special data analysis expertise. Maybe adding pre-requirements to usem them?
78 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
79 | * Getting either piece of software running locally proved impossible for me :)
80 | * I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there.
81 | * From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value,
82 | * Bitergia tools are quite complicated and take a lot of time to set up properly.
83 | * self hosting is becoming really difficult, esp for remote only orgs
84 | * Docker compose
85 |
86 | **Category: Technical: Reliability (1 Quotation)**
87 | * This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
88 |
89 | **Category: Technical: Relating to SaaS or need for SaaS solutions (3 Quotations)**
90 | * This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
91 | * I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there. As a [REDACTED] I don't really have a budget to pay for metrics as a service just to experiment or to try and get other people involved, so I greatly appreciated the cauldron service.
92 | * self hosting is becoming really difficult, esp for remote only orgs
93 |
94 |
--------------------------------------------------------------------------------
/challenges_survey/draft_interpretations/README.md:
--------------------------------------------------------------------------------
1 | # Draft Interpretations
2 |
3 | If you would like to share your interpretation of the survey data, please create a PR and place your files in this directory. If you have multiple files, please create a subdirectory.
4 |
--------------------------------------------------------------------------------
/challenges_survey/draft_interpretations/dawn_analysis/Challenges_Survey_2023.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/challenges_survey/draft_interpretations/dawn_analysis/Challenges_Survey_2023.pdf
--------------------------------------------------------------------------------
/challenges_survey/draft_interpretations/dawn_analysis/challenge_comment_categories.md:
--------------------------------------------------------------------------------
1 | This document groups into categories the responses to the question, "What other challenges have you faced that weren’t in the above list or what else would you like to see us improve?"
2 |
3 | Many comments fall into multiple categories, so you’ll see the same content multiple times. The full text of the responses can be found in the [repo](https://github.com/chaoss/wg-data-science/tree/main/challenges_survey/raw_data).
4 |
5 | **Category: Taking action on data / generating insights from the data (8 Quotations)**
6 | * ability to see and understand what other people are doing in real situation so I can leverage that work, and compare with my own
7 | * Any new metric should have use cases associated where the usefulness of the metric becomes evident
8 | * Overwhelming data.
9 | * No CHAOSS tools make it easy to compare a large number of repos.
10 | * Many CHAOSS metrics aren't quantitative, so evaluating them requires manual examination. CHAOSS metrics don't cover marketing metrics like social media mentions.
11 | * The most challenging part is to communicate them to C level.
12 | * Measuring the health and success of community (marketing) activities
13 | * In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide.
14 |
15 | **Category: CHAOSS Processes (4 Quotations)**
16 | * It's hard to understand what is "official" CHAOSS software (like what is the relationship to compass?) or how things move from the metrics/models into the software.
17 | * There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.
18 | * The focus on OSPOs was a surprise to me as I got more involved in the project, it's understandable but makes it more difficult for people not in an OSPO to be seen as an audience for metrics/software, which is unfortunate because I think there is still a lot of value in CHAOSS metrics for people who are not part of an official OSPO.
19 | * Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described
20 |
21 | **Category: Metric Definitions (3 Quotations)**
22 | * Any new metric should have use cases associated where the usefulness of the metric becomes evident
23 | * Many CHAOSS metrics aren't quantitative, so evaluating them requires manual examination. CHAOSS metrics don't cover marketing metrics like social media mentions.
24 | * Issues and PR backlog
25 |
26 | **Category: Software: Augur (3 Quotations)**
27 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
28 | * I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good. A simplified Augur install that is just the bare minimum for working with a data snapshot would be nice. Multiple instances of documentation also make working with Augur hard. There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.
29 | * We find Augur easy to install and use.
30 |
31 | **Category: Software: Contributing (2 Quotations)**
32 | * Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
33 | * There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.
34 |
35 | **Category: Software: GrimoireLab (9 Quotations)**
36 | * More clarity/documentation in data model (for comparison against other data sources); OpenSearch backend is incompatible with other internal tooling so our implementation is a bit of an island :/
37 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
38 | * Integrating grimorelab with custom dashboards and frontend/backend software seemed tricky as far as ive heard from the team that was responsible for that. I dont know the details though, sounded like there was some architectural tech debt (not sure what that means) that made it hard to for the developers to integrate.
39 | * This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
40 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
41 | * I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there. As a [REDACTED] I don't really have a budget to pay for metrics as a service just to experiment or to try and get other people involved, so I greatly appreciated the cauldron service.
42 | * grimoire sigils should support opensearch / pivot away from kibiter asap
43 | * Bitergia tools are quite complicated and take a lot of time to set up properly.
44 | * Grimoirelab did not work well for the size of our OSPO.
45 |
46 | **Category: Software: Software / Metrics Relationship (1 Quotation)**
47 | * Understanding the relationship between the software and the metrics can be difficult, things aren't always named the same and the methods of calculation are not always transparent in the software so you can't tell if it is really doing what you think it is.
48 |
49 | **Category: Technical: Compatibility / Tech Stack (7 Quotations)**
50 | * More clarity/documentation in data model (for comparison against other data sources); OpenSearch backend is incompatible with other internal tooling so our implementation is a bit of an island :/
51 | * From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value, or even integration with such tools, to get more adoption in the industry. On the other hand, the diversity of tools and setups related to open source development and contribution makes it so hard to find "the right" tools to
52 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
53 | * Integrating grimorelab with custom dashboards and frontend/backend software seemed tricky as far as ive heard from the team that was responsible for that. I dont know the details though, sounded like there was some architectural tech debt (not sure what that means) that made it hard to for the developers to integrate.
54 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
55 | * grimoire sigils should support opensearch / pivot away from kibiter asap
56 | * I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good.
57 |
58 | **Category: Technical: Documentation (3 Quotations)**
59 | * Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
60 | * Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
61 | * Multiple instances of documentation also make working with Augur hard.
62 |
63 | **Category: Technical: Ease of Use / UX (9 Quotations)**
64 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
65 | * Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
66 | * Estimating how much time and effort using tools would take, compared to ad hoc methods of assessing similar questions.
67 | * Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described. They all seem quite complex and difficult to get started or require very special data analysis expertise. Maybe adding pre-requirements to usem them?
68 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
69 | * I would really like some lighter weight ways to play with the metrics. I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good. A simplified Augur install that is just the bare minimum for working with a data snapshot would be nice.
70 | * From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value,
71 | * Overwhelming data.
72 | * No CHAOSS tools make it easy to compare a large number of repos.
73 |
74 | **Category: Technical: Installation (10 Quotations)**
75 | * Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
76 | * Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
77 | * Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described. They all seem quite complex and difficult to get started or require very special data analysis expertise. Maybe adding pre-requirements to usem them?
78 | * Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
79 | * Getting either piece of software running locally proved impossible for me :)
80 | * I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there.
81 | * From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value,
82 | * Bitergia tools are quite complicated and take a lot of time to set up properly.
83 | * self hosting is becoming really difficult, esp for remote only orgs
84 | * Docker compose
85 |
86 | **Category: Technical: Reliability (1 Quotation)**
87 | * This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
88 |
89 | **Category: Technical: Relating to SaaS or need for SaaS solutions (3 Quotations)**
90 | * This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
91 | * I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there. As a [REDACTED] I don't really have a budget to pay for metrics as a service just to experiment or to try and get other people involved, so I greatly appreciated the cauldron service.
92 | * self hosting is becoming really difficult, esp for remote only orgs
93 |
94 |
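Because the same comment can be listed under several categories, it can help to see which comments are shared between categories before comparing the quotation counts. The following is a minimal Python sketch, assuming the `**Category: ... (N Quotations)**` heading format and `* ` bullets used in this document. The function names and the abbreviated `SAMPLE` text are illustrative and not part of the repository. Matching is by exact text, so a comment that is excerpted differently in two categories will not be detected.

```python
import re
from collections import defaultdict

# Matches headings such as "**Category: Technical: Documentation (3 Quotations)**".
HEADER = re.compile(
    r"^\*\*Category: (?P<name>.+) \((?P<count>\d+) Quotations?\)\*\*$"
)


def parse_categories(markdown_text):
    """Return {category name: (declared count, [quotes])} in document order."""
    categories = {}
    current = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            current = match.group("name")
            categories[current] = (int(match.group("count")), [])
        elif line.startswith("* ") and current is not None:
            categories[current][1].append(line[2:].strip())
    return categories


def quotes_in_multiple_categories(categories):
    """Return {quote: [category names]} for quotes listed more than once."""
    seen = defaultdict(list)
    for name, (_declared, quotes) in categories.items():
        for quote in quotes:
            seen[quote].append(name)
    return {quote: names for quote, names in seen.items() if len(names) > 1}


# Abbreviated sample: two categories copied from this document.
SAMPLE = """\
**Category: Software: Contributing (2 Quotations)**
* Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
* There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users.

**Category: Technical: Documentation (3 Quotations)**
* Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
* Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
* Multiple instances of documentation also make working with Augur hard.
"""

if __name__ == "__main__":
    parsed = parse_categories(SAMPLE)
    for quote, names in quotes_in_multiple_categories(parsed).items():
        print(f"{len(names)} categories: {quote[:60]}...")
```

To run it against the real file, read the Markdown file's contents into a string and pass that string to `parse_categories` in place of `SAMPLE`.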
--------------------------------------------------------------------------------
/challenges_survey/raw_data/Clean_CHAOSS_Understanding_Challenges_Survey.csv:
--------------------------------------------------------------------------------
1 | What type of organization do you work for?,Do you work in an Open Source Program Office (OSPO) or similar open source team?,Which of these best describes your role or position?,"Have you contributed to the CHAOSS project (including non-code contributions, e.g., metrics definitions, documentation, presentations, blog posts, meeting attendance)? ",Which of these have you used or tried to use?,How long have you been using CHAOSS tools or custom code that implements CHAOSS metrics?,"Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
2 | to you, select the option labeled ‘NA’ (Not Applicable).
3 | 1 is most challenging and 7 is least challenging. [Installing / configuring software]","Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
4 | to you, select the option labeled ‘NA’ (Not Applicable).
5 | 1 is most challenging and 7 is least challenging. [Maintaining software over time]","Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
6 | to you, select the option labeled ‘NA’ (Not Applicable).
7 | 1 is most challenging and 7 is least challenging. [Cleaning up the data (e.g., merge duplicate contributors, company affiliation)]","Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
8 | to you, select the option labeled ‘NA’ (Not Applicable).
9 | 1 is most challenging and 7 is least challenging. [Finding the data / metrics you want to use]","Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
10 | to you, select the option labeled ‘NA’ (Not Applicable).
11 | 1 is most challenging and 7 is least challenging. [Drawing meaningful insights out of the data]","Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
12 | to you, select the option labeled ‘NA’ (Not Applicable).
13 | 1 is most challenging and 7 is least challenging. [Communicating meaningful insights to others, including executives]","Rank order the challenges you have faced using CHAOSS tools. If any don’t apply
14 | to you, select the option labeled ‘NA’ (Not Applicable).
15 | 1 is most challenging and 7 is least challenging. [Getting others within your company / community to use the software]"
16 | University or other academic institution,Yes,"Development or operations focused (e.g., developer, sys admin)",Currently contributing to the CHAOSS project,Augur,Less than 1 year,3,2,4,1,NA,NA,NA
17 | For-profit company,Yes,"Leadership (e.g., primarily manage other people)","Past contributor, but no longer contributing",GrimoireLab (including users of Bitergia’s platform and Cauldron),More than 3 years,1,2,3,7,4,6,2
18 | None of the above,No,consultancy,"Past contributor, but no longer contributing",GrimoireLab (including users of Bitergia’s platform and Cauldron),More than 3 years,5,NA,7,3,2,1,4
19 | University or other academic institution,Yes,"Leadership (e.g., primarily manage other people)",Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",2 - 3 years,1,2,3,5,4,NA,NA
20 | For-profit company,Yes,"Development or operations focused (e.g., developer, sys admin)",Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",1 - 2 years,6,5,1,4,3,7,NA
21 | For-profit company,Yes,"Leadership (e.g., primarily manage other people)",Currently contributing to the CHAOSS project,Augur,1 - 2 years,2,1,1,1,1,3,4
22 | University or other academic institution,No,"Community focused (e.g., community manager)",Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",Less than 1 year,1,NA,4,2,7,7,NA
23 | None of the above,No,"Development or operations focused (e.g., developer, sys admin)",Never contributed to CHAOSS,GrimoireLab (including users of Bitergia’s platform and Cauldron),I haven’t used CHAOSS tools,NA,NA,4,4,5,NA,NA
24 | For-profit company,Yes,"Leadership (e.g., primarily manage other people)",Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",More than 3 years,2,6,5,7,7,7,5
25 | For-profit company,Yes,"Community focused (e.g., community manager)","Past contributor, but no longer contributing",GrimoireLab (including users of Bitergia’s platform and Cauldron),2 - 3 years,1,NA,2,1,2,2,2
26 | None of the above,No,Consultant,"Past contributor, but no longer contributing","Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",More than 3 years,1,7,6,5,3,2,4
27 | For-profit company,Yes,"OSPO Lead, direct activities and liaison with CISO/Eng/Legal",Never contributed to CHAOSS,Commercial tools with project Health metrics like Snyk,I haven’t used CHAOSS tools,3,2,NA,NA,NA,NA,2
28 | For-profit company,No,"Community focused (e.g., community manager)",Currently contributing to the CHAOSS project,GrimoireLab (including users of Bitergia’s platform and Cauldron),2 - 3 years,NA,NA,1,3,1,3,4
29 | For-profit company,Yes,"Development or operations focused (e.g., developer, sys admin)","Past contributor, but no longer contributing","Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",1 - 2 years,5,NA,NA,2,1,1,1
30 | For-profit company,Yes,Program Manager,Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron), DEI.md",1 - 2 years,3,NA,NA,3,3,3,NA
31 | Nonprofit,Yes,"Community focused (e.g., community manager)",Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",Less than 1 year,NA,NA,NA,6,6,6,2
32 | None of the above,Yes,"Development or operations focused (e.g., developer, sys admin)",Never contributed to CHAOSS,GrimoireLab (including users of Bitergia’s platform and Cauldron),More than 3 years,6,3,1,4,2,1,7
33 | For-profit company,Yes,"Community focused (e.g., community manager)",Currently contributing to the CHAOSS project,"MergeStat, CNCF DevStats",I haven’t used CHAOSS tools,4,6,5,1,2,3,7
34 | For-profit company,Yes,"Data focused (e.g., data analysis, data science)",Currently contributing to the CHAOSS project,GrimoireLab (including users of Bitergia’s platform and Cauldron),2 - 3 years,7,7,7,7,7,7,3
35 | Government,Yes,"Community focused (e.g., community manager)",Never contributed to CHAOSS,"Don’t know, not sure, or haven’t used any tools",I haven’t used CHAOSS tools,NA,NA,NA,NA,NA,NA,NA
36 | Nonprofit,Yes,"Community focused (e.g., community manager)",Never contributed to CHAOSS,Augur,Less than 1 year,NA,NA,NA,3,3,NA,NA
37 | For-profit company,Yes,the first three,Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",More than 3 years,7,7,7,7,3,3,7
38 | For-profit company,Yes,"Data focused (e.g., data analysis, data science)",Currently contributing to the CHAOSS project,GrimoireLab (including users of Bitergia’s platform and Cauldron),2 - 3 years,NA,NA,NA,2,2,2,1
39 | For-profit company,No,"Data focused (e.g., data analysis, data science)",Currently contributing to the CHAOSS project,GrimoireLab (including users of Bitergia’s platform and Cauldron),2 - 3 years,3,6,2,5,6,2,1
40 | For-profit company,Yes,"Community focused (e.g., community manager)",Never contributed to CHAOSS,"Don’t know, not sure, or haven’t used any tools",I haven’t used CHAOSS tools,2,3,1,1,1,1,2
41 | For-profit company,No,"Community focused (e.g., community manager)",Never contributed to CHAOSS,GrimoireLab (including users of Bitergia’s platform and Cauldron),More than 3 years,1,4,3,3,6,6,2
42 | For-profit company,Yes,"Community focused (e.g., community manager)",Currently contributing to the CHAOSS project,"Don’t know, not sure, or haven’t used any tools",I haven’t used CHAOSS tools,7,NA,NA,NA,NA,NA,7
43 | For-profit company,Yes,"Leadership (e.g., primarily manage other people)",Currently contributing to the CHAOSS project,Augur,1 - 2 years,7,7,5,7,6,7,6
44 | For-profit company,Yes,"Leadership (e.g., primarily manage other people)",Never contributed to CHAOSS,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",Less than 1 year,2,3,6,4,4,6,4
45 | For-profit company,Yes,"Leadership (e.g., primarily manage other people)",Never contributed to CHAOSS,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",1 - 2 years,5,5,5,5,5,5,5
46 | None of the above,No,"Data focused (e.g., data analysis, data science)",Currently contributing to the CHAOSS project,"Augur, GrimoireLab (including users of Bitergia’s platform and Cauldron)",More than 3 years,1,2,3,6,5,7,4
47 |
--------------------------------------------------------------------------------
/challenges_survey/raw_data/other_challenges_freeform_redacted.txt:
--------------------------------------------------------------------------------
1 | What other challenges have you faced that weren’t in the above list or what else would you like to see us improve?
2 |
3 | Getting either piece of software running locally proved impossible for me :) It's hard to understand what is "official" CHAOSS software (like what is the relationship to compass?) or how things move from the metrics/models into the software. Understanding the relationship between the software and the metrics can be difficult, things aren't always named the same and the methods of calculation are not always transparent in the software so you can't tell if it is really doing what you think it is. I would really like some lighter weight ways to play with the metrics. I was excited by the prospect of Jupyter notebooks but then it turned out those also rely on Augur which could be fine but then some better instruction on getting just the Augur db running would be good. A simplified Augur install that is just the bare minimum for working with a data snapshot would be nice. Multiple instances of documentation also make working with Augur hard. There also Augur meetings on the calendar but I'm not sure if they are actually happening, and I'm not sure if they are just for people doing development work on Augur, as opposed to users. I worked with Cauldron.io and thought it was a great tool but could not get GL running locally so my experimenting was limited to what was offered there. As a [REDACTED] I don't really have a budget to pay for metrics as a service just to experiment or to try and get other people involved, so I greatly appreciated the cauldron service. The focus on OSPOs was a surprise to me as I got more involved in the project, it's understandable but makes it more difficult for people not in an OSPO to be seen as an audience for metrics/software, which is unfortunate because I think there is still a lot of value in CHAOSS metrics for people who are not part of an official OSPO.
4 |
5 | More clarity/documentation in data model (for comparison against other data sources); OpenSearch backend is incompatible with other internal tooling so our implementation is a bit of an island :/
6 |
7 | Contributing has itself been a challenge, since documentation is woefully incomplete and (at times) inaccurate
8 |
9 | From an OSPO perspective, I feel that OpenSSF Scorecard and Open Source Review Toolkit are quite valuable tools and resources ("easy" to set up and get a report). It would be great to see CHAOSS tools at the same level of maturity and value, or even integration with such tools, to get more adoption in the industry. On the other hand, the diversity of tools and setups related to open source development and contribution makes it so hard to find "the right" tools to get valuable information about people, activity, and "performance".
10 |
11 | Any new metric should have use cases associated where the usefulness of the metric becomes evident
12 |
13 | Both GrimoireLab and Augur don't really provide a UX that is fit for a product. Rather they are fairly complicated back-ends that require a significant amount of configuration to set up. For a well funded OSPO that has a clear list of the data sources they want to track, this may be useful, but for something like [REDACTED], this means that many of our needs are not met. In our case as [REDACTED], we need to provide an experience for [REDACTED] that encourages them to track the community health of projects they use and participate in. This requires a certain kind of UX as well as better integration with other data sources and services that we provide. Particularly with GrimoireLab we found that building on top of the current architecture was fairly difficult.
14 |
15 | grimoire sigils should support opensearch / pivot away from kibiter asap
16 |
17 | Overwhelming data.
18 |
19 | Integrating grimorelab with custom dashboards and frontend/backend software seemed tricky as far as ive heard from the team that was responsible for that. I dont know the details though, sounded like there was some architectural tech debt (not sure what that means) that made it hard to for the developers to integrate.
20 |
21 | Bitergia tools are quite complicated and take a lot of time to set up properly.
22 |
23 | self hosting is becoming really difficult, esp for remote only orgs
24 |
25 | This was more of a product issue (when I used Bitergia), but a few times we had infrastructure issues so that an up-to-date data wasn't available for an extended period. For corporate users (especially in technology companies), the main reason why they go with an outside vendor is so that they don't need to manage infrastructure/software.
26 |
27 | No CHAOSS tools make it easy to compare a large number of repos. Many CHAOSS metrics aren't quantitative, so evaluating them requires manual examination. CHAOSS metrics don't cover marketing metrics like social media mentions.
28 |
29 | Documentation seems really good but there are a LOT of steps - I have such a limited amount of time that by the time I read through the documents to remember where I left off last time, I've already run out of time to do any experimentation.
30 |
31 | The most challenging part is to communicate them to C level.
32 |
33 | Estimating how much time and effort using tools would take, compared to ad hoc methods of assessing similar questions.
34 |
35 | ability to see and understand what other people are doing in real situation so I can leverage that work, and compare with my own
36 |
37 | Issues and PR backlog
38 |
39 | Measuring the health and success of community (marketing) activities
40 |
41 | Overall I find it complex to understand what CHAOSS is actually about and how the tools relate to what is described. They all seem quite complex and difficult to get started or require very special data analysis expertise. Maybe adding pre-requirements to usem them?
42 |
43 | Docker compose
44 |
45 | Creating production environments (security, set up backups, durability), updating, migrating to newer components (OpenSearch).
46 |
47 | Grimoirelab did not work well for the size of our OSPO. We find Augur easy to install and use.
48 |
--------------------------------------------------------------------------------
/challenges_survey/raw_data/strengths_freeform_redacted.txt:
--------------------------------------------------------------------------------
1 | What do you see as CHAOSS project strengths (what do you love about CHAOSS)?
2 |
3 | The depth of expertise in the community
4 |
5 | CHAOSS is exceptional at community health. The community generating community health metrics is unusually healthy.
6 | It is a model for good practices.
7 |
8 | Being committed to open-source is becoming frustratingly less common, so I'm glad to see an organization insisting that their work will belong to the people, and not to any hypothetical corporate backers.
9 |
10 | The passion and commitment of the CHAOSS community to provide a set of valuable metrics to describe an open source projects.
11 |
12 | Agreements over definitions
13 |
14 | CHAOSS has an active and welcoming community. The projects are interesting and useful.
15 |
16 | ease of community involvement and insight into metrics
17 |
18 | Standard metrics, and good tools
19 |
20 | The project is open and welcoming and is doing really important work! I really appreciate the extent to which newbies are able to participate.
21 |
22 | Seems interesting. Havent interacted with them much though, but it seems like they have a good mission/goal. If CHAOSS is at all related to OSI, maybe this thing a coworker sent me recently could be applicable? https://yakshav.es/non-thoughts-on-the-osi/
23 |
24 | - Creating a welcoming space for open source project leaders to connect and discuss community health indicators. - Educating community members on the existing CHAOSS metrics and metrics models and empowering them to develop new ones to address their needs.
25 |
26 | The existence of a unified, standardized set of metrics is invaluable for determining community health and value.
27 |
28 | Great mission, great educational outreach, great conference presentations, genuinely nice people.
29 |
30 | open source, community based, non-commercial
31 |
32 | Well respected/regarded in open source communities. Good community culture.
33 |
34 | The promise of getting metadata about the health of my open source.
35 |
36 | Great community, solid ideas and great tooling available
37 |
38 | The community!
39 |
40 | News Ideas, and family like.
41 |
42 | Only place I know of where these OS community metrics exist.
43 |
44 | great community, lots of smart people, diverse, energetic, welcoming
45 |
46 | Community, peer discussions, implementation stories
47 |
48 | Such a welcoming and loving community
49 |
50 | Project health reporting.
51 |
52 | Community-driven with real-world examples
53 |
54 | It is always improving and focusing on finding meaningful metrics to drive communities' health forward for a more equitable open source ecosystem (this is how I see CHAOSS <3)
55 |
56 | Augur really makes metrics highly visible on a large scale like I need
57 |
58 | Once set up, grimoirelab is a veeery impressive tool. I'm glad it's hosted under CHAOSS.
59 |
60 | Standards
61 |
62 | Such an amazing community of lovely people
63 |
--------------------------------------------------------------------------------
/challenges_survey/raw_data/understanding_challenges_graphs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/challenges_survey/raw_data/understanding_challenges_graphs.pdf
--------------------------------------------------------------------------------
/data-ethics-statement.md:
--------------------------------------------------------------------------------
1 | _The usage and dissemination of health metrics may lead to privacy violations. Organizations may be exposed to risks. These risks may flow from compliance with the GDPR in the EU, with state law in the US, or with other laws. There may also be contractual risks flowing from terms of service for data providers such as GitHub and GitLab. The usage of metrics must be examined for risk and potential data ethics problems. Please see [CHAOSS Data Ethics document](https://github.com/chaoss/community/blob/main/data-use-statement.md) for additional guidance._
2 |
--------------------------------------------------------------------------------
/dataset/README.md:
--------------------------------------------------------------------------------
1 | # CHAOSS Data Science WG Datasets
2 |
3 | This is where we can store datasets for [projects](https://github.com/chaoss/wg-data-science/issues?q=is%3Aissue+is%3Aopen+label%3Aproject) being worked on within the Data Science Working Group.
4 |
5 | If you would like to contribute a dataset, please create a new subfolder containing your data, and please follow our [Contribution guidelines](https://github.com/chaoss/wg-data-science/blob/main/CONTRIBUTING.md) including DCO sign-off for all commits.
6 |
7 | Right now, we have one dataset: an analysis of projects making [license changes](license-changes) to their open source projects.
8 |
9 |
--------------------------------------------------------------------------------
/dataset/archive/README.md:
--------------------------------------------------------------------------------
1 | # Archival of Open Source Projects (aka Sudden Archival)
2 |
3 | Archival is often used as an indicator that a project is no longer being maintained and will not be updated (including security updates). We know that a lot of projects are archived when they are abandoned and people stop working on them, but what about projects that are archived for other reasons?
4 |
5 | More Details about this project:
6 | * Tracked in [Issue #45](https://github.com/chaoss/wg-data-science/issues/45)
7 | * [Project Planning document](https://docs.google.com/document/d/18audPynKQg_n7ZdspeUGtPSF7cMzc9O7CVcrEdJAhtk/edit?usp=sharing)
8 | * [WIP datasets](https://github.com/chaoss/wg-data-science/tree/main/dataset/archive)
9 | * data-files/archive_repos.csv: contains 733 repos - all GH repos with > 1000 stars and > 100 forks with an open source license.
10 | * archived-projects.py: the script containing the GraphQL API query that generates data-files/archive_repos.csv
11 |
12 | Current status:
13 | * Initial [WIP datasets](https://github.com/chaoss/wg-data-science/tree/main/dataset/archive) being gathered.
14 | * We'll start this project in June during the [Data Science Hackathon](https://chaoss.community/chaoss-data-science-hackathon-2025/), so the current work will be in getting the initial dataset organized and into a format that will be useful in a hackathon setting.
15 |
16 |
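As a starting point for hackathon analysis, here is a minimal sketch of how the `createdAt` and `archivedAt` columns written by archived-projects.py could be used to compute how long each repository existed before it was archived. The sample rows and the helper names (`parse_timestamp`, `days_until_archived`) are hypothetical and only illustrate the column layout. The sketch also assumes the timestamps are ISO 8601 strings ending in `Z`. To use the real data, replace the in-memory sample with `open("data-files/archive_repos.csv", newline="")`.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows that use a subset of the columns in archive_repos.csv.
SAMPLE_CSV = """url,createdAt,archivedAt,stargazerCount,forkCount,licenseInfo
https://github.com/example/alpha,2015-03-01T12:00:00Z,2020-03-01T12:00:00Z,1500,200,MIT License
https://github.com/example/beta,2018-06-15T08:30:00Z,2019-06-15T08:30:00Z,4200,310,Apache License 2.0
"""


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; 'Z' is rewritten so older Pythons accept it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_until_archived(rows):
    """Map each repo URL to the number of days between creation and archival."""
    lifespans = {}
    for row in rows:
        # Skip rows where either timestamp is missing.
        if not row["createdAt"] or not row["archivedAt"]:
            continue
        delta = parse_timestamp(row["archivedAt"]) - parse_timestamp(row["createdAt"])
        lifespans[row["url"]] = delta.days
    return lifespans


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
lifespans = days_until_archived(rows)

# Print the shortest-lived repositories first.
for url, days in sorted(lifespans.items(), key=lambda item: item[1]):
    print(f"{days:>6} days  {url}")
```

Repositories with an unusually short lifespan, or with a recent `latestRelease` relative to `archivedAt`, might be reasonable first candidates when looking for archival that was not caused by abandonment.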
--------------------------------------------------------------------------------
/dataset/archive/archived-projects.py:
--------------------------------------------------------------------------------
1 | # Copyright Dawn M. Foster
2 | # SPDX-License-Identifier: MIT
3 |
4 | """This script collects data about the archived repositories on GitHub with the most
5 | stars / forks using the GitHub GraphQL Search API. Results with no license or
6 | license = Other are discarded to store only results for open source repositories.
7 |
8 | Inputs via command line arguments - see below or use -h for a list:
9 | * file containing a GitHub access token
10 | * threshold for stars (any project with more than that number of stars) - default to 1000
11 | * threshold for forks (any project with more than that number of forks) - default to 100
12 |
13 | As of May 22, 2025, using the default thresholds collects data on 1001 repositories,
14 | and stores 733 repositories after filtering by license as described above.
15 |
16 | Outputs:
17 | * GitHub API response code (success is <Response [200]>) - printed to the screen
18 | * csv file with the data stored as data-files/archive_repos.csv
19 | """
20 |
21 | import sys
22 | import csv
23 | import argparse
24 | import requests
25 | import json
26 |
27 | # Read arguments and store from command line
28 | parser = argparse.ArgumentParser()
29 |
30 | parser.add_argument("-t", "--token", dest="api_token_file", help="Filename of a file containing a GitHub Personal Access Token")
31 | parser.add_argument("-s", "--stars", dest="num_stars", help="Collect data on projects with more than this number of stars. Default is 1000",
32 |                     default=1000)
33 | parser.add_argument("-f", "--forks", dest="num_forks", help="Collect data on projects with more than this number of forks. Default is 100",
34 |                     default=100)
35 |
36 | args = parser.parse_args()
37 |
38 | api_token_file = args.api_token_file
39 | num_stars = args.num_stars
40 | num_forks = args.num_forks
41 |
42 | def make_query(after_cursor=None):
43 |     """ This function contains the GraphQL query
44 |     """
45 |     return """
46 | query archived ($search_string: String!) {
47 |   search(
48 |     type:REPOSITORY,
49 |     query:$search_string,
50 |     first: 50 after:AFTER) {
51 |     pageInfo {
52 |       hasNextPage
53 |       endCursor
54 |     }
55 |     repos: edges{
56 |       repo:node{
57 |         ... on Repository {
58 |           url
59 |           homepageUrl
60 |           shortDescriptionHTML
61 |           isFork
62 |           isInOrganization
63 |           createdAt
64 |           updatedAt
65 |           archivedAt
66 |           stargazerCount
67 |           forkCount
68 |           primaryLanguage{
69 |             name
70 |           }
71 |           latestRelease{
72 |             publishedAt
73 |           }
74 |           licenseInfo{
75 |             name
76 |           }
77 |         }
78 |       }
79 |     }
80 |   }
81 | }""".replace(
82 |         "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null"
83 |     )
84 |
85 | # Read GitHub key from file
86 | try:
87 |     with open(api_token_file, 'r') as kf:
88 |         api_token = kf.readline().rstrip() # remove newline & trailing whitespace
89 |
90 | except (OSError, TypeError):  # unreadable file, or no --token argument was given
91 |     print("Error reading GH Key. This script depends on the existence of a file containing your GitHub API token. Exiting")
92 |     sys.exit(1)  # non-zero exit status signals the failure to the caller
93 |
94 | # Set up the variables needed for the GraphQL query
95 | url = 'https://api.github.com/graphql'
96 | headers = {'Authorization': 'token %s' % api_token}
97 | search_string = "archived:True stars:>" + str(num_stars) + " forks:>" + str(num_forks)
98 |
99 | # Variable initialization
100 | results = []
101 | has_next_page = True
102 | after_cursor = None
103 |
104 | # Run the GraphQL query for each page of results from the API
105 | while has_next_page:
106 |
107 |     query = make_query(after_cursor)
108 |
109 |     variables = {"search_string": search_string}
110 |
111 |     r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)
112 |     print(r) # This prints the response so that you can see if the query fails with an error
113 |
114 |     json_data = json.loads(r.text)
115 |
116 |     results.append(json_data)
117 |
118 |     has_next_page = json_data['data']['search']["pageInfo"]["hasNextPage"]
119 |
120 |     after_cursor = json_data['data']['search']["pageInfo"]["endCursor"]
121 |
122 | # Create csv output file
123 | with open("data-files/archive_repos.csv", "w", newline="") as f:
124 |
125 |     w = csv.DictWriter(f, results[0]['data']['search']['repos'][0]['repo'].keys())
126 |     w.writeheader()
127 |
128 |     # Loops through the results list and writes the csv file by row.
129 |     # This also unpacks the nested JSON structures to allow them to
130 |     # be more readable in the csv file.
131 |     for element in results:
132 |
133 |         for repo_dict in element['data']['search']['repos']:
134 |             try:
135 |                 repo_dict['repo']['latestRelease'] = repo_dict['repo']['latestRelease']['publishedAt']
136 |             except TypeError:  # the API returned null for this field
137 |                 repo_dict['repo']['latestRelease'] = None
138 |             try:
139 |                 repo_dict['repo']['primaryLanguage'] = repo_dict['repo']['primaryLanguage']['name']
140 |             except TypeError:  # the API returned null for this field
141 |                 repo_dict['repo']['primaryLanguage'] = None
142 |             try:
143 |                 repo_dict['repo']['licenseInfo'] = repo_dict['repo']['licenseInfo']['name']
144 |             except TypeError:  # the API returned null for this field
145 |                 repo_dict['repo']['licenseInfo'] = None
146 |
147 |             if repo_dict['repo']['licenseInfo'] is not None and repo_dict['repo']['licenseInfo'] != "Other":
148 |                 # this only writes repos with a specified license into the csv file
149 |                 w.writerow(repo_dict['repo'])
--------------------------------------------------------------------------------
/dataset/foundation-stats/apacheURLtoTable.md:
--------------------------------------------------------------------------------
1 | # README: Apache URL to Table Data Processor
2 |
3 | ## **Overview**
4 | The `apache_url_to_table.py` script fetches, processes, and normalizes
5 | Apache project data from multiple sources. The goal is to create a
6 | structured dataset that allows researchers to analyze the transition of
7 | corporate projects into open-source foundations.
8 |
9 | ## **Features**
10 | - Pulls project data from **six different Apache Foundation sources**.
11 | - Normalizes data for consistency across all projects.
12 | - Saves structured data in **CSV and JSON** formats.
13 | - Includes an optional **force update** mode to fetch the latest data.
14 | - Prevents redundant downloads by using existing files when possible.
15 |
16 | ## **Data Sources**
17 | This script retrieves data from the following sources:
18 | - [Apache Projects Overview](https://projects.apache.org/)
19 | - [Apache Projects
20 | JSON](https://projects.apache.org/json/foundation/projects.json)
21 | - [Podlings
22 | JSON](https://projects.apache.org/json/foundation/podlings.json)
23 | - [Podlings History
24 | JSON](https://projects.apache.org/json/foundation/podlings-history.json)
25 | - [Retired Committees
26 | JSON](https://projects.apache.org/json/foundation/committees-retired.json)
27 | - [Repositories
28 | JSON](https://projects.apache.org/json/foundation/repositories.json)
29 |
30 | ## **Installation**
31 | Ensure Python is installed and install required dependencies:
32 | ```sh
33 | pip install pandas requests
34 | ```
35 |
36 | ## **How to Run the Script**
37 | ### **Using Existing Data (if Available)**
38 | If the script has been run before, it will use previously downloaded
39 | files:
40 | ```sh
41 | python apache_url_to_table.py
42 | ```
43 |
44 | ### **Forcing an Update (Fetching the Latest Data)**
45 | To ensure you get the most up-to-date project data, use:
46 | ```sh
47 | python apache_url_to_table.py --force-update
48 | ```
49 | - This will **overwrite existing files**.
50 | - The script will **prompt you for confirmation** before replacing data.
51 | - Type **'yes'** and press **Enter** to proceed.
52 | - Any other input, including pressing **Enter** without typing anything, cancels the update.
53 |
54 | ## **Output Files**
55 | - **`structured_project_analysis.csv`** → CSV file containing the
56 | processed project data.
57 | - **`structured_project_cleaned.json`** → JSON file containing the
58 | structured project data.
59 |
60 | ## **Confirming the Data for Research**
61 | This script provides **a foundational dataset** for analyzing how
62 | corporate-owned open-source projects evolve once moved into foundations.
63 | **You can confirm the script's success by:**
64 | 1. Checking if **`structured_project_analysis.csv`** contains structured
65 | data.
66 | 2. Opening **`structured_project_cleaned.json`** to see if all project
67 | details were captured.
68 | 3. Manually inspecting **Apache source URLs** in a browser to ensure they
69 | are still available.
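
The first two checks above can be sketched in a few lines of Python. This is a minimal illustration, not part of `apache_url_to_table.py`: the helper name `count_records` is hypothetical, and the demo writes its own temporary files so that it runs on its own. A result of `(0, 0)` means the fetch produced no records, which is what a blank CSV and a JSON file containing only `[]` would look like.

```python
import csv
import json
import os
import tempfile


def count_records(csv_path, json_path):
    """Return (data rows in the CSV, records in the JSON list)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        csv_rows = sum(1 for _ in csv.DictReader(f))
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    json_records = len(data) if isinstance(data, list) else 0
    return csv_rows, json_records


# Demo with temporary files named like the script's outputs.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "structured_project_analysis.csv")
    json_path = os.path.join(tmp, "structured_project_cleaned.json")

    # Case 1: what the outputs look like when no records were fetched.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write("\n")
    with open(json_path, "w", encoding="utf-8") as f:
        f.write("[]")
    empty_counts = count_records(csv_path, json_path)

    # Case 2: one illustrative record using field names from normalize_data().
    record = {"Project ID": "demo", "Project Name": "Demo", "Status": "current"}
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(record))
        writer.writeheader()
        writer.writerow(record)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([record], f)
    filled_counts = count_records(csv_path, json_path)

print("empty run:", empty_counts)
print("successful run:", filled_counts)
```

To check real output, call `count_records` with the paths of the two files next to the script and treat a zero in either position as a failed run.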
70 |
71 | ## **What’s Missing & Next Steps**
72 | This dataset **still requires human research** to be fully complete.
73 | Recommended steps include:
74 |
75 | ### **1. Verifying Company Contributions**
76 | - Manually research which corporations originally contributed each
77 | project.
78 | - Compare corporate vs. community contributions over time.
79 |
80 | ### **2. Analyzing Governance Changes**
81 | - Investigate if project governance structures changed post-transition.
82 |
83 | ### **3. Improving Data Quality**
84 | - Identify missing or inconsistent project data.
85 | - Cross-reference project status with Apache’s live data.
86 |
87 | ---
88 | This script is a starting point for **a structured and repeatable method** of
89 | collecting project data across foundations, and should be revised with that
90 | intent. Have fun: this is an initial script, and your edits could make you an
91 | author of an invaluable script for open source!
92 |
--------------------------------------------------------------------------------
/dataset/foundation-stats/apache_url_to_table.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import pandas as pd
4 | import requests
5 |
6 | # Step 1: Define script directory and output file names
7 | # This ensures all files are stored in the same location as the script.
8 | SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
9 | OUTPUT_CSV = os.path.join(SCRIPT_DIR, "structured_project_analysis.csv") # Clearer purpose
10 | OUTPUT_JSON = os.path.join(SCRIPT_DIR, "structured_project_cleaned.json") # More descriptive
11 |
12 | # Step 2: Define web scraping source URL
13 | # This is the source URL where we will fetch the latest project data.
14 | SOURCE_URL = "https://incubator.apache.org/projects.json" # Example data source
15 |
16 | # Step 3: Function to fetch data from the web
17 | def fetch_data_from_web(url):
18 |     """
19 |     Fetch JSON data from a specified web URL.
20 |     Handles errors related to network failures and invalid JSON responses.
21 |     """
22 |     try:
23 |         response = requests.get(url, timeout=10) # 10-second timeout
24 |         response.raise_for_status() # Raise error for HTTP issues
25 |         return response.json()
26 |     except requests.exceptions.RequestException as e:
27 |         print(f"Error fetching data from {url}: {e}")
28 |         return []
29 |     except json.JSONDecodeError:
30 |         print("Error: Failed to decode JSON from web response.")
31 |         return []
32 |
33 | # Step 4: Function to normalize and clean data
34 | def normalize_data(data):
35 |     """
36 |     Standardizes JSON structure to ensure consistent fields across all records.
37 |     """
38 |     structured_list = []
39 |     for item in data:
40 |         structured_list.append({
41 |             "Project ID": item.get("id", "Unknown"),
42 |             "Project Name": item.get("name", "Unknown"),
43 |             "Category": item.get("category", "Unknown"),
44 |             "Description": item.get("description", "Unknown"),
45 |             "Homepage URL": item.get("homepage", "Unknown"),
46 |             "PMC": item.get("pmc", "Unknown"),
47 |             "Podling": item.get("podling", False),
48 |             "Start Date": item.get("started", "Unknown"),
49 |             "Status": item.get("status", "Unknown")
50 |         })
51 |     return structured_list
52 |
53 | # Step 5: Add user option to force update for most up-to-date data
54 | import argparse
55 | parser = argparse.ArgumentParser()
56 | parser.add_argument("--force-update", action="store_true", help="Force update by fetching the latest data from the web")
57 | args = parser.parse_args()
58 |
59 | # Step 6: Check if the output files already exist
60 | # If the files exist and --force-update is not set, we avoid re-downloading the data.
61 | if os.path.exists(OUTPUT_CSV) and os.path.exists(OUTPUT_JSON) and not args.force_update:
62 |     print(f"Using existing files: {OUTPUT_CSV} and {OUTPUT_JSON}. To force an update, run with --force-update.")
63 | else:
64 |     # Step 7: Confirm before overwriting existing files
65 |     if args.force_update and os.path.exists(OUTPUT_CSV) and os.path.exists(OUTPUT_JSON):
66 |         confirm = input("Warning: This will overwrite existing files. Do you want to proceed? (yes/no): ").strip().lower()
67 |         if confirm != "yes":
68 |             print("Update canceled. Using existing files.")
69 |             raise SystemExit(0)  # exit() comes from the site module and is not always available
70 |
71 |     print("Fetching the most up-to-date data from the web.")
72 |     # Fetch and process data
73 |     data = fetch_data_from_web(SOURCE_URL)
74 |     normalized_data = normalize_data(data)
75 |
76 |     # Convert to Pandas DataFrame
77 |     df = pd.DataFrame(normalized_data)
78 |
79 |     # Save outputs
80 |     df.to_csv(OUTPUT_CSV, index=False)
81 |     with open(OUTPUT_JSON, "w", encoding="utf-8") as f:
82 |         json.dump(normalized_data, f, indent=4)
83 |
84 |     # Step 8: Notify user of successful completion
85 |     print(f"Processing complete.\nSaved structured data to: {OUTPUT_CSV} and {OUTPUT_JSON}")
86 |
--------------------------------------------------------------------------------
/dataset/foundation-stats/dataset/foundation-stats/structured_project_analysis.csv:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/dataset/foundation-stats/dataset/foundation-stats/structured_project_cleaned.json:
--------------------------------------------------------------------------------
1 | []
--------------------------------------------------------------------------------
/dataset/license-changes/README.md:
--------------------------------------------------------------------------------
1 | # License Change Dataset
2 |
3 | The idea behind this dataset is to use it as a starting point to see if we can predict the likelihood of a license change for an open source project from an open source license to a non-open source or more restrictive license. The project itself is being tracked in this [Issue](https://github.com/chaoss/wg-data-science/issues/47). Note that we have a related dataset for forks (see below).
4 |
5 | Because we want to do some analysis on the repositories, any project without a GitHub repository has been excluded from the dataset (see section below for more details).
6 |
7 | ## The Dataset
8 |
9 | [license_changes.csv](license_changes.csv)
10 |
11 | The dataset can be found in the license_changes.csv file. It contains the following fields:
12 | * project: project name
13 | * relicense_date: relicense date
14 | * orig_license: original license name
15 | * new_license: new license name or description
16 | * org: GitHub organization where the project can be found
17 | * repo: GitHub repository where the project can be found
18 | * license_file: license filename in the GitHub repo
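
As a minimal sketch of how these fields might be consumed, the snippet below reads a CSV with the columns listed above and builds a GitHub URL per row. It assumes the file has a header row using exactly these field names; the sample row, its date format, and the helper names `read_license_changes` and `repo_urls` are hypothetical and are not taken from the real dataset.

```python
import csv
import io

# Field names as listed in this README.
EXPECTED_FIELDS = [
    "project", "relicense_date", "orig_license",
    "new_license", "org", "repo", "license_file",
]


def read_license_changes(handle):
    """Read rows from an open CSV handle and check the documented columns exist."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in present]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return list(reader)


def repo_urls(rows):
    """Build a GitHub URL for each row from its org and repo fields."""
    return [f"https://github.com/{row['org']}/{row['repo']}" for row in rows]


# Placeholder row for illustration only; it is NOT real data from license_changes.csv.
SAMPLE = (
    "project,relicense_date,orig_license,new_license,org,repo,license_file\n"
    "Example Project,2024-01-01,Apache-2.0,Example License,example-org,example-repo,LICENSE\n"
)

rows = read_license_changes(io.StringIO(SAMPLE))
print(repo_urls(rows))
```

To try this against the real file, pass `open("license_changes.csv", newline="", encoding="utf-8")` instead of the in-memory sample.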
19 |
20 | The starting point for this dataset came from this [Wikipedia List of formerly open source software](https://en.wikipedia.org/wiki/List_of_formerly_open-source_or_free_software) page. You can find this list converted to a csv file in this folder called [wikipedia_list.csv](wikipedia_list.csv). Because we want to analyze what happens in a repo both before and after a license change, projects where the repo couldn't be found or where there were other issues that made the data suspect were excluded from license_changes.csv. Those issues are documented in a file in this folder called [dataset_notes.md](dataset_notes.md). If you want to help improve the dataset in license_changes.csv, the dataset_notes.md file would be an excellent place to start.
21 |
22 |
23 | [more_forks.csv](more_forks.csv)
24 |
25 | This dataset includes some additional examples of forks that were not caused by relicensing. Most often, either a) the original was bought by an entity the community found suspicious, so the community decided to fork, or b) the original became unmaintained and someone new picked it up. The fields of the CSV match the structure of the license_changes dataset above.
26 |
27 | ## Contributions
28 |
29 | This dataset is still a bit rough and incomplete, so contributions are welcome to help us improve it. See details above for ideas about where to contribute, and please follow our [Contribution guidelines](https://github.com/chaoss/wg-data-science/blob/main/CONTRIBUTING.md) including DCO sign-off for all commits.
30 |
31 | If you learn of new projects that have been relicensed or older ones that we've missed, please feel free to submit a PR against license_changes.csv with the data. If you don't have time to create the PR, please file an issue to let us know, and we can add it to the dataset.
32 |
33 | ## Next Steps
34 |
35 | This is a very basic dataset that is meant to be the starting point to create more robust data sets. By keeping license_changes.csv simple and limited to the basic information required to know where / when a license change took place, we can easily build on it in so many ways. In particular, it would be interesting to put this data into Augur and GrimoireLab for further study. We can also use the GitHub API to gather additional data as in the example below in the Additional Data section.
36 |
37 | ## Additional Data
38 |
39 | There are some additional files in this folder.
40 |
41 | * [generate-license-data.py](generate-license-data.py) is a script that was used to make it easier to find the date of the commit where the license change occurred. It also serves as an example of how you might use this dataset as a starting point to gather more data that can be used to learn about a license change.
42 | * [output.json](output.json) is an autogenerated file created by generate-license-data.py and should not be edited. This is the output that was used to learn more about each license change in license_changes.csv.
43 | * wikipedia_list.csv is a convenience file that stores the data from the Wikipedia page used as a starting point. It isn't used anywhere else and should not be updated, since it just contains historical data. Any updates should be made in license_changes.csv.
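
One possible way to look for the commit where a license file changed is the GitHub REST "list commits" endpoint filtered by `path`. The sketch below is an illustration under that assumption and is not a description of what generate-license-data.py actually does. The helper names `license_commits_url` and `license_commit_dates` and the sample payload are hypothetical; the payload only mimics the `sha` and `commit.committer.date` fields of a real response.

```python
from urllib.parse import quote, urlencode


def license_commits_url(org, repo, license_file, per_page=100):
    """Build the REST URL that lists commits touching one file path."""
    query = urlencode({"path": license_file, "per_page": per_page})
    return f"https://api.github.com/repos/{quote(org)}/{quote(repo)}/commits?{query}"


def license_commit_dates(payload):
    """Return (date, sha) pairs, oldest first, from a decoded commits response."""
    pairs = [(item["commit"]["committer"]["date"], item["sha"]) for item in payload]
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    return sorted(pairs)


# Hypothetical payload shaped like the API response; values are placeholders.
sample_payload = [
    {"sha": "bbb222", "commit": {"committer": {"date": "2024-03-20T10:00:00Z"}}},
    {"sha": "aaa111", "commit": {"committer": {"date": "2019-01-05T09:30:00Z"}}},
]

print(license_commits_url("example-org", "example-repo", "LICENSE"))
print(license_commit_dates(sample_payload))
```

The candidate commits still need manual review, since edits to a license file are not always a license change.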
44 |
45 | # Forks Dataset
46 |
47 | [forks.csv](forks.csv)
48 |
49 | We are just beginning work on a dataset containing forks of open source projects, so right now it is very incomplete, but contributions are welcome! Please follow our [Contribution guidelines](https://github.com/chaoss/wg-data-science/blob/main/CONTRIBUTING.md) including DCO sign-off for all commits. If you see a mistake but don't plan to make the change yourself, you can also file an issue or let us know via other channels.
50 |
51 | Many recent forks are a result of license changes, so we are keeping the dataset here to make it easy for people to find.
52 |
53 | Category column definitions:
54 | * acquisition: primarily the result of one company acquiring another
55 | * relicense: relicensing of a project, usually to a more restrictive license
56 | * feature: this generally refers to issues with getting contributions (features) included in the original project and could be a result of disagreements, governance issues, or other community dynamics.
57 |
58 | We know that open source projects are complex, and many forks don't fit into a single category, so we've attempted to pick the primary category and add any additional details in the notes.
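
A quick sanity check on the category values could look like the sketch below. It assumes forks.csv has a header row with a column named `category` (an assumption based on the definitions above); the sample rows and the helper name `count_categories` are hypothetical placeholders.

```python
import csv
import io
from collections import Counter

# Categories defined in this README.
KNOWN_CATEGORIES = {"acquisition", "relicense", "feature"}


def count_categories(handle, column="category"):
    """Count rows per category and collect values outside the documented set."""
    counts = Counter()
    unknown = []
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip().lower()
        counts[value] += 1
        if value not in KNOWN_CATEGORIES:
            unknown.append(value)
    return counts, unknown


# Placeholder rows for illustration only; they are NOT real data from forks.csv.
SAMPLE = (
    "project,category\n"
    "example-a,relicense\n"
    "example-b,Relicense\n"
    "example-c,acquisition\n"
    "example-d,other\n"
)

counts, unknown = count_categories(io.StringIO(SAMPLE))
print(dict(counts), unknown)
```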
59 |
--------------------------------------------------------------------------------
/dataset/license-changes/dataset_notes.md:
--------------------------------------------------------------------------------
1 | # Data from Wikipedia page
2 | Source: https://en.wikipedia.org/wiki/List_of_formerly_open-source_or_free_software
3 |
4 | Excluded:
5 | * Couchbase Server,2010,2021,Apache-2.0,Business Source License - only found Apache license in https://github.com/couchbase/manifest
6 | * Couchbase Mobile,,2022,Apache-2.0,Business Source License - not sure where this project is - maybe https://github.com/couchbase/couchbase-lite-ios but that's Apache
7 | * Emby,2014,2018,GPL-2.0,"Source code closed on December 8, 2018" - source code no longer available
8 | * FBReader,2013,2015,GPL-2.0-or-later,"Apparently the number of devs was limited, and they all agreed to relicense it" - couldn't find license file - source code archived at https://github.com/geometer/FBReader
9 | * LiveJournal,1999,2014,GPL-2.0-or-later,The source code was made private in 2014 - couldn't find source code repo
10 | * Nexuiz,2005,2012,GPL-2.0-or-later,"Game abandoned in favour of a commercial video game of the same name, which licensed the Nexuiz title but is not based on its engine." - couldn't find source code repo
11 | * OctoberCMS,2014,2021,MIT,Cited the sustainability of its open source model as a factor. - couldn't find source code repo
12 | * Paint.NET,2004,2007,MIT,freeware license that prohibits modification or resale - couldn't find source code repo
13 | * PyMOL,2010,MIT-CMU,Custom,schrodinger,pymol-open-source,LICENSE - I couldn't find evidence that this was ever under an OSI license
14 | * Reddit,2008,2017,CPAL-1.0,"Source code was made private in 2017, as the internal codebase had already diverged significantly from the public one." - couldn't find source code repo
15 | * Sourcegraph,2013,2023,Apache-2.0,proprietary - couldn't find source code repo
16 | * Tux Racer,2000,2002,GPL-2.0-or-later,"Commercial expansion by original authors, also called Tux Racer."
17 |
18 | # Other data
19 |
20 | Excluded:
21 | * MariaDB MaxScale is under BSL1, but I couldn't find the repo or info about the license change: https://mariadb.com/projects-using-bsl-11/
22 |
23 |
24 | # Important Notes
25 |
26 | For the projects where the repo could not be found, you might be able to find a fork from someone else's account to analyze. I didn't attempt to find these. Also, I didn't spend much time looking - someone else should confirm these because I could have easily just missed them.
27 |
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/README.md:
--------------------------------------------------------------------------------
1 | This folder is where we can collaborate on a research report that contains several case studies of open source projects that resulted in hard forks after the project relicensed.
2 |
3 | * Elasticsearch -> OpenSearch
4 | * Redis -> Valkey
5 | * Terraform -> OpenTofu
6 |
7 | If you just want an overview of the results so far, here are some summaries:
8 | * [The New Stack: What Happens to Relicensed Open Source Projects and Their Forks?](https://thenewstack.io/what-happens-to-relicensed-open-source-projects-and-their-forks/)
9 | * [State of Open Con 7 minute keynote video](https://www.youtube.com/watch?v=rphZFv9QbV0&list=PL0U2cL1JGPZdJTUooEjFMb_djIzreUxGM&index=4)
10 | * Additional presentations: [FOSDEM Panel](https://fosdem.org/2025/schedule/event/fosdem-2025-5258-forked-communities-project-re-licensing-and-community-impact/), [State of Open Con Panel](https://www.youtube.com/watch?v=DSTiQil10GQ&list=PL0U2cL1JGPZdfn4ODuMVouXMh9lDsiPGh&index=4), [OpenUK Meetup](https://www.youtube.com/watch?v=wliDVF3FpI0)
11 | * [Academic paper](https://github.com/chaoss/wg-data-science/tree/main/publications) with the results that were presented at the OpenForum Academy Symposium in November 2024.
12 |
13 | The current WIP draft of the research report can be found in this [Google doc for the Report](https://docs.google.com/document/d/1sYlUn9UsY7ynmzc3MVJTtktNgaLFQDOZ8W9fhYarWNo/edit). At this point, it's mostly an outline that still needs a lot of work.
14 |
15 | The [notebooks](notebooks) folder contains basic analysis of the organizational affiliation data for the contributors per open source project.
16 |
17 | The [data-files](data-files) folder contains data files (pickled) for each project with commits for a specific time period being studied (e.g., 1 year before the relicense).
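
As a rough sketch of how a people pickle might be summarized, the snippet below totals commits per company. It assumes the dictionary layout written by commits_people.py (one entry per GitHub login with `company`, `name`, `commits`, `additions`, `deletions`, and `email` keys). The sample entries, the normalization rule, and the helper name `commits_by_company` are hypothetical, and the notebooks may handle affiliation differently.

```python
import io
import pickle
from collections import Counter


def commits_by_company(person_dict):
    """Sum commit counts per company, grouping missing companies as 'unknown'."""
    totals = Counter()
    for person in person_dict.values():
        # Light normalization only: trim, drop a leading '@', lowercase.
        company = (person.get("company") or "unknown").strip().lstrip("@").lower()
        totals[company] += person.get("commits", 0)
    return totals


# Placeholder people for illustration only; they are NOT taken from the data files.
sample_people = {
    "alice-example": {"company": "@Example Corp", "name": "Alice", "commits": 3,
                      "additions": 10, "deletions": 2, "email": ["alice@example.com"]},
    "bob-example": {"company": "example corp ", "name": "Bob", "commits": 2,
                    "additions": 5, "deletions": 1, "email": ["bob@example.com"]},
    "carol-example": {"company": None, "name": "Carol", "commits": 2,
                      "additions": 1, "deletions": 0, "email": ["carol@example.com"]},
}

# Round-trip through pickle to mirror loading a *_people_*.pkl file.
buffer = io.BytesIO(pickle.dumps(sample_people))
loaded = pickle.load(buffer)
totals = commits_by_company(loaded)
print(totals.most_common())
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.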
18 |
19 | We still have a lot of work to do. Here are a few next steps that people can begin working on:
20 | * **Writing**: Write more of the Introduction and Context sections for the report (see doc above) - No data science experience required, and the links in the "Helpful articles" section at the top of the doc should help someone get started with this work. Much of this can probably be taken from the [OFA paper](https://docs.google.com/document/d/1hdLqLhQjPGwOpwMgH5dpTFMioSTRRZEGdQ5-lEZ9o_Q/edit?usp=sharing) as a start, but it will need to be heavily edited so that it isn't in an academic style, since the final output will be a report that is more in the style of an LF report (see the report Google doc above for more on style).
21 | * **Collect Data & Metrics Analysis**: Select several project health metrics from the CHAOSS project (ideally based on research) that might be used to answer some of the research questions listed in the doc. Ideally, these metrics should be implemented in Augur / 8Knot and / or GrimoireLab so that we can use CHAOSS projects for the visualizations. Several people could work on this at the same time. We have a start on this that can be found in the Appendix / Notes section of the report doc, but it still needs quite a bit of work.
22 | * **Validation**: Validate the data for the 6 projects by talking to people who are directly involved in those projects. @geekygirldawn has started this work and contacted people from the projects.
23 |
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/commits_people.py:
--------------------------------------------------------------------------------
1 | # Copyright Dawn M. Foster
2 | # SPDX-License-Identifier: MIT
3 |
4 | """Gets Commit Data
5 | This is aggregated per person for a repo between two specified dates.
6 | I'm currently using this to better understand who contributes to a project
7 | before and after a key time in the project (relicense / fork) with a focus on
8 | understanding organizational diversity.
9 |
10 | Output (files are stored in the data-files directory)
11 | * GitHub API response code (should be "<Response [200]>")
12 | * Commit data pickle file containing a dataframe
13 | * Person pickle file containing a dictionary
14 | """
15 |
16 | import sys
17 | import pandas as pd
18 | import argparse
19 | import requests
20 | import json
21 |
22 | # Read arguments from command line
23 | parser = argparse.ArgumentParser()
24 |
25 | parser.add_argument("-t", "--token", dest = "gh_key", help="Path to a file containing your GitHub Personal Access Token")
26 | parser.add_argument("-u", "--url", dest = "gh_url", help="URL for a GitHub repository")
27 | parser.add_argument("-b", "--begin_date", dest = "begin_date", help="Date in the format YYYY-MM-DD - gather commits after this begin date")
28 | parser.add_argument("-e", "--end_date", dest = "end_date", help="Date in the format YYYY-MM-DD - gather commits up until this end date")
29 |
30 | args = parser.parse_args()
31 |
32 | gh_url = args.gh_url
33 | gh_key = args.gh_key
34 | since_date = args.begin_date + "T00:00:00.000+00:00"
35 | until_date = args.end_date + "T00:00:00.000+00:00"
36 |
37 | url_parts = gh_url.strip('/').split('/')
38 | org_name = url_parts[3]
39 | repo_name = url_parts[4]
40 |
41 | # Read GitHub key from file
42 | try:
43 |     with open(gh_key, 'r') as kf:
44 |         api_token = kf.readline().rstrip() # remove newline & trailing whitespace
45 |
46 | except (OSError, TypeError):
47 |     print("Error reading GH Key. This script depends on a file, passed with -t/--token, containing your GitHub API token. Exiting")
48 |     sys.exit()
49 |
50 | pickle_file = 'data-files/' + repo_name + str(since_date) + str(until_date) + '.pkl'
51 |
52 | def make_query(after_cursor = None):
53 |     return """query repo_commits($org_name: String!, $repo_name: String!, $since_date: GitTimestamp!, $until_date: GitTimestamp!){
54 |       repository(owner: $org_name, name: $repo_name) {
55 |         ... on Repository{
56 |           defaultBranchRef{
57 |             target{
58 |               ... on Commit{
59 |                 history(since: $since_date, until: $until_date, first: 100 after: AFTER){
60 |                   pageInfo {
61 |                     hasNextPage
62 |                     endCursor
63 |                   }
64 |                   edges{
65 |                     node{
66 |                       ... on Commit{
67 |                         committedDate
68 |                         deletions
69 |                         additions
70 |                         oid
71 |                         authors(first:100) {
72 |                           nodes {
73 |                             date
74 |                             email
75 |                             user {
76 |                               login
77 |                               company
78 |                               email
79 |                               name
80 |                             }
81 |                           }
82 |                         }
83 |                       }
84 |                     }
85 |                   }
86 |                 }
87 |               }
88 |             }
89 |           }
90 |         }
91 |       }
92 |     }""".replace(
93 |         "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null"
94 |     )
95 |
96 | def get_data(api_token, org_name, repo_name, since_date, until_date):
97 |     """Executes the GraphQL query to get data from one GitHub repo.
98 |
99 |     Returns
100 |     -------
101 |     repo_info_df : pandas.core.frame.DataFrame
102 |     """
103 |
104 |     url = 'https://api.github.com/graphql'
105 |     headers = {'Authorization': 'token %s' % api_token}
106 |
107 |     repo_info_df = pd.DataFrame()
108 |
109 |     has_next_page = True
110 |     after_cursor = None
111 |
112 |     while has_next_page:
113 |
114 |         query = make_query(after_cursor)
115 |
116 |         variables = {"org_name": org_name, "repo_name": repo_name, "since_date": since_date, "until_date": until_date}
117 |         r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)
118 |         print(r)
119 |         json_data = json.loads(r.text)
120 |
121 |         df_temp = pd.DataFrame(json_data['data']['repository']['defaultBranchRef']['target']['history']['edges'])
122 |
123 |         repo_info_df = pd.concat([repo_info_df, df_temp], ignore_index=True) # DataFrame.append was removed in pandas 2.0
124 |
125 |         has_next_page = json_data['data']['repository']['defaultBranchRef']['target']['history']["pageInfo"]["hasNextPage"]
126 |
127 |         after_cursor = json_data['data']['repository']['defaultBranchRef']['target']['history']["pageInfo"]["endCursor"]
128 |
129 |     return repo_info_df
130 |
131 | repo_info_df = get_data(api_token, org_name, repo_name, since_date, until_date)
132 |
133 | def expand_commits(commits):
134 |     if pd.isnull(commits):
135 |         commits_list = [None, None, None, None, None]
136 |     else:
137 |         node = commits
138 |         try:
139 |             commit_date = node['committedDate']
140 |         except:
141 |             commit_date = None
142 |         try:
143 |             dels = node['deletions']
144 |         except:
145 |             dels = None
146 |         try:
147 |             adds = node['additions']
148 |         except:
149 |             adds = None
150 |         try:
151 |             oid = node['oid']
152 |         except:
153 |             oid = None
154 |         try:
155 |             author = node['authors']['nodes']
156 |         except:
157 |             author = None
158 |         commits_list = [commit_date, dels, adds, oid, author]
159 |     return commits_list
160 |
161 | repo_info_df['commits_list'] = repo_info_df['node'].apply(expand_commits)
162 | repo_info_df[['commit_date','deletions', 'additions','oid','author']] = pd.DataFrame(repo_info_df.commits_list.tolist(), index= repo_info_df.index)
163 | #repo_info_df = repo_info_df.drop(columns=['commits_list'])
164 | repo_info_df
165 | repo_info_df.to_pickle(pickle_file)
166 |
167 | def create_person_dict(pickle_file, repo_name, since_date, until_date):
168 |     import collections
169 |     import pickle
170 |
171 |     repo_info_df = pd.read_pickle(pickle_file)
172 |
173 |     output_pickle = 'data-files/' + repo_name + '_people_' + str(since_date) + str(until_date) + '.pkl'
174 |
175 |     # Create a dictionary for each person with the key being the gh login
176 |     # Create a dict for commits that aren't tied to a gh login (gh user = None)
177 |     person_dict=collections.defaultdict(dict)
178 |     fail_person_dict=collections.defaultdict(dict)
179 |
180 |     for x in repo_info_df.iterrows():
181 |         data = x[1]
182 |
183 |         for y in data['author']:
184 |             try:
185 |                 commit_email = y['email'] # read first so the fallback below uses this author's email
186 |                 login = y['user']['login']
187 |                 company = y['user']['company']
188 |                 login_email = y['user']['email']
189 |                 name = y['user']['name']
190 |
191 |                 if person_dict[login]:
192 |                     person_dict[login]['commits'] = person_dict[login]['commits'] + 1
193 |                     person_dict[login]['additions'] = person_dict[login]['additions'] + data['additions']
194 |                     person_dict[login]['deletions'] = person_dict[login]['deletions'] + data['deletions']
195 |                     if commit_email not in person_dict[login]['email']:
196 |                         person_dict[login]['email'].append(commit_email)
197 |                 else:
198 |                     person_dict[login]['company'] = company
199 |                     person_dict[login]['name'] = name
200 |                     person_dict[login]['commits'] = 1
201 |                     person_dict[login]['additions'] = data['additions']
202 |                     person_dict[login]['deletions'] = data['deletions']
203 |                     if len(login_email) == 0:
204 |                         person_dict[login]['email'] = [commit_email]
205 |                     elif commit_email == login_email:
206 |                         person_dict[login]['email'] = [commit_email]
207 |                     else:
208 |                         person_dict[login]['email'] = [commit_email,login_email]
209 |             except:
210 |                 try:
211 |                     if fail_person_dict[commit_email]:
212 |                         fail_person_dict[commit_email]['commits'] = fail_person_dict[commit_email]['commits'] + 1
213 |                         fail_person_dict[commit_email]['additions'] = fail_person_dict[commit_email]['additions'] + data['additions']
214 |                         fail_person_dict[commit_email]['deletions'] = fail_person_dict[commit_email]['deletions'] + data['deletions']
215 |                     else:
216 |                         fail_person_dict[commit_email]['commits'] = 1
217 |                         fail_person_dict[commit_email]['additions'] = data['additions']
218 |                         fail_person_dict[commit_email]['deletions'] = data['deletions']
219 |                 except:
220 |                     print("Unknown Exception on", y)
221 |
222 |     # For every email that didn't have a GH login / user, search for that email in the
223 |     # person_dict and if found, add the commits, additions, and deletions to the proper user
224 |     # Print error message if not found (above items for testing of that case)
225 |     for f_key, f_value in fail_person_dict.items():
226 |         found = False
227 |         for key, value in person_dict.items():
228 |             if f_key in value['email']:
229 |                 person_dict[key]['commits'] = person_dict[key]['commits'] + f_value['commits']
230 |                 person_dict[key]['additions'] = person_dict[key]['additions'] + f_value['additions']
231 |                 person_dict[key]['deletions'] = person_dict[key]['deletions'] + f_value['deletions']
232 |                 found = True
233 |         if not found:
234 |             print('Not found - no person with this email',f_key,f_value)
235 |
236 |     with open(output_pickle, 'wb') as f:
237 |         pickle.dump(person_dict, f)
238 |
239 |     print('Commit data stored in', pickle_file)
240 |     print('People Dictionary stored in', output_pickle)
241 |
242 | create_person_dict(pickle_file, repo_name, since_date, until_date)
243 |
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch2021-04-12T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch2021-04-12T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch2021-04-12T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch2021-04-12T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch2023-08-01T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch2023-08-01T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch2023-09-16T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch2023-09-16T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch2024-09-16T00:00:00.000+00:002025-03-16T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch2024-09-16T00:00:00.000+00:002025-03-16T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2021-04-12T00:00:00.000+00:002022-04-12T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2021-04-12T00:00:00.000+00:002022-04-12T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2021-04-12T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2021-04-12T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2021-04-12T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2021-04-12T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2023-08-01T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2023-08-01T00:00:00.000+00:002024-08-01T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2023-09-16T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2023-09-16T00:00:00.000+00:002024-09-16T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2024-09-16T00:00:00.000+00:002025-03-16T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/OpenSearch_people_2024-09-16T00:00:00.000+00:002025-03-16T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch2019-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch2019-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch2020-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch2020-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch2023-08-29T00:00:00.000+00:002024-08-29T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch2023-08-29T00:00:00.000+00:002024-08-29T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch2024-08-29T00:00:00.000+00:002025-02-29T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch2024-08-29T00:00:00.000+00:002025-02-29T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2019-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2019-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2020-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2020-02-03T00:00:00.000+00:002021-02-03T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2021-02-03T00:00:00.000+00:002022-02-03T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2021-02-03T00:00:00.000+00:002022-02-03T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2023-08-29T00:00:00.000+00:002024-08-29T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2023-08-29T00:00:00.000+00:002024-08-29T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2024-08-29T00:00:00.000+00:002025-02-29T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/elasticsearch_people_2024-08-29T00:00:00.000+00:002025-02-29T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/opentofu2023-09-05T00:00:00.000+00:002024-09-05T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/opentofu2023-09-05T00:00:00.000+00:002024-09-05T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/opentofu_people_2023-09-05T00:00:00.000+00:002024-09-05T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/opentofu_people_2023-09-05T00:00:00.000+00:002024-09-05T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis2022-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis2022-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis2023-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis2023-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis2024-02-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis2024-02-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis2024-03-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis2024-03-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis2024-03-20T00:00:00.000+00:002024-09-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis2024-03-20T00:00:00.000+00:002024-09-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis2024-03-20T00:00:00.000+00:002025-03-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis2024-03-20T00:00:00.000+00:002025-03-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis_people_2022-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis_people_2022-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis_people_2023-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis_people_2023-03-20T00:00:00.000+00:002024-03-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis_people_2024-03-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis_people_2024-03-20T00:00:00.000+00:002024-08-21T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis_people_2024-03-20T00:00:00.000+00:002024-09-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis_people_2024-03-20T00:00:00.000+00:002024-09-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/redis_people_2024-03-20T00:00:00.000+00:002025-03-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/redis_people_2024-03-20T00:00:00.000+00:002025-03-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/stars-forks/opentofu-forks.csv:
--------------------------------------------------------------------------------
1 | yr,mon,d,forks
2 | 2023,9,20,46
3 | 2023,9,21,64
4 | 2023,9,22,56
5 | 2023,9,23,30
6 | 2023,9,24,14
7 | 2023,9,25,12
8 | 2023,9,26,9
9 | 2023,9,27,9
10 | 2023,9,28,5
11 | 2023,9,29,3
12 | 2023,9,30,4
13 | 2023,10,1,3
14 | 2023,10,2,3
15 | 2023,10,3,1
16 | 2023,10,4,12
17 | 2023,10,5,4
18 | 2023,10,6,7
19 | 2023,10,7,1
20 | 2023,10,8,2
21 | 2023,10,9,3
22 | 2023,10,10,4
23 | 2023,10,11,4
24 | 2023,10,12,7
25 | 2023,10,13,6
26 | 2023,10,14,1
27 | 2023,10,15,3
28 | 2023,10,16,2
29 | 2023,10,17,2
30 | 2023,10,18,5
31 | 2023,10,19,2
32 | 2023,10,20,4
33 | 2023,10,21,3
34 | 2023,10,23,2
35 | 2023,10,24,1
36 | 2023,10,25,6
37 | 2023,10,26,3
38 | 2023,10,27,3
39 | 2023,10,28,1
40 | 2023,10,29,2
41 | 2023,10,31,4
42 | 2023,11,1,1
43 | 2023,11,3,1
44 | 2023,11,4,1
45 | 2023,11,5,1
46 | 2023,11,6,2
47 | 2023,11,7,2
48 | 2023,11,8,1
49 | 2023,11,9,3
50 | 2023,11,10,4
51 | 2023,11,12,1
52 | 2023,11,13,3
53 | 2023,11,14,3
54 | 2023,11,16,2
55 | 2023,11,17,2
56 | 2023,11,20,2
57 | 2023,11,24,1
58 | 2023,11,25,1
59 | 2023,11,27,2
60 | 2023,11,28,3
61 | 2023,11,29,2
62 | 2023,11,30,1
63 | 2023,12,1,1
64 | 2023,12,7,1
65 | 2023,12,8,1
66 | 2023,12,9,1
67 | 2023,12,10,1
68 | 2023,12,13,2
69 | 2023,12,14,1
70 | 2023,12,15,3
71 | 2023,12,17,1
72 | 2023,12,18,3
73 | 2023,12,19,1
74 | 2023,12,20,1
75 | 2023,12,21,1
76 | 2023,12,22,1
77 | 2023,12,23,1
78 | 2023,12,24,1
79 | 2023,12,25,1
80 | 2023,12,27,2
81 | 2023,12,28,5
82 | 2023,12,31,1
83 | 2024,1,1,1
84 | 2024,1,2,2
85 | 2024,1,3,1
86 | 2024,1,4,2
87 | 2024,1,6,1
88 | 2024,1,7,1
89 | 2024,1,9,1
90 | 2024,1,10,2
91 | 2024,1,11,3
92 | 2024,1,12,4
93 | 2024,1,13,2
94 | 2024,1,14,4
95 | 2024,1,15,3
96 | 2024,1,16,2
97 | 2024,1,17,1
98 | 2024,1,19,2
99 | 2024,1,20,2
100 | 2024,1,21,1
101 | 2024,1,22,2
102 | 2024,1,23,2
103 | 2024,1,24,6
104 | 2024,1,25,2
105 | 2024,1,26,2
106 | 2024,1,27,1
107 | 2024,1,28,2
108 | 2024,1,29,4
109 | 2024,1,30,2
110 | 2024,1,31,1
111 | 2024,2,1,2
112 | 2024,2,2,2
113 | 2024,2,4,2
114 | 2024,2,5,3
115 | 2024,2,6,3
116 | 2024,2,8,3
117 | 2024,2,9,3
118 | 2024,2,10,1
119 | 2024,2,11,1
120 | 2024,2,12,1
121 | 2024,2,13,2
122 | 2024,2,14,1
123 | 2024,2,15,1
124 | 2024,2,16,1
125 | 2024,2,19,2
126 | 2024,2,20,3
127 | 2024,2,22,1
128 | 2024,2,24,2
129 | 2024,2,25,2
130 | 2024,2,26,3
131 | 2024,2,27,1
132 | 2024,2,29,1
133 | 2024,3,2,2
134 | 2024,3,3,1
135 | 2024,3,4,2
136 | 2024,3,5,1
137 | 2024,3,6,1
138 | 2024,3,7,1
139 | 2024,3,10,1
140 | 2024,3,12,2
141 | 2024,3,13,1
142 | 2024,3,14,1
143 | 2024,3,15,2
144 | 2024,3,16,2
145 | 2024,3,17,1
146 | 2024,3,18,1
147 | 2024,3,19,2
148 | 2024,3,22,1
149 | 2024,3,23,1
150 | 2024,3,24,1
151 | 2024,3,26,2
152 | 2024,3,27,2
153 | 2024,3,28,2
154 | 2024,3,29,1
155 | 2024,4,1,1
156 | 2024,4,2,1
157 | 2024,4,3,2
158 | 2024,4,4,3
159 | 2024,4,5,1
160 | 2024,4,6,3
161 | 2024,4,8,1
162 | 2024,4,10,1
163 | 2024,4,11,3
164 | 2024,4,12,1
165 | 2024,4,13,4
166 | 2024,4,14,3
167 | 2024,4,15,2
168 | 2024,4,16,2
169 | 2024,4,17,1
170 | 2024,4,18,2
171 | 2024,4,19,1
172 | 2024,4,20,2
173 | 2024,4,21,1
174 | 2024,4,22,2
175 | 2024,4,23,2
176 | 2024,4,24,2
177 | 2024,4,25,3
178 | 2024,4,26,9
179 | 2024,4,27,3
180 | 2024,4,28,1
181 | 2024,4,29,2
182 | 2024,5,1,3
183 | 2024,5,3,5
184 | 2024,5,5,2
185 | 2024,5,7,2
186 | 2024,5,8,1
187 | 2024,5,9,3
188 | 2024,5,10,2
189 | 2024,5,11,2
190 | 2024,5,12,1
191 | 2024,5,14,1
192 | 2024,5,15,1
193 | 2024,5,16,2
194 | 2024,5,17,1
195 | 2024,5,18,5
196 | 2024,5,20,2
197 | 2024,5,21,1
198 | 2024,5,22,2
199 | 2024,5,24,1
200 | 2024,5,25,1
201 | 2024,5,26,2
202 | 2024,5,27,1
203 | 2024,5,28,1
204 | 2024,5,29,2
205 | 2024,5,31,1
206 | 2024,6,1,2
207 | 2024,6,2,2
208 | 2024,6,3,2
209 | 2024,6,4,2
210 | 2024,6,5,1
211 | 2024,6,6,2
212 | 2024,6,10,2
213 | 2024,6,12,2
214 | 2024,6,14,3
215 | 2024,6,17,1
216 | 2024,6,18,3
217 | 2024,6,20,2
218 | 2024,6,23,2
219 | 2024,6,24,2
220 | 2024,6,26,1
221 | 2024,6,27,1
222 | 2024,6,28,1
223 | 2024,6,30,1
224 | 2024,7,1,1
225 | 2024,7,3,1
226 | 2024,7,4,1
227 | 2024,7,5,1
228 | 2024,7,12,2
229 | 2024,7,15,1
230 | 2024,7,16,1
231 | 2024,7,17,2
232 | 2024,7,18,1
233 | 2024,7,19,1
234 | 2024,7,22,2
235 | 2024,7,23,3
236 | 2024,7,24,2
237 | 2024,7,25,1
238 | 2024,7,27,3
239 | 2024,7,29,2
240 | 2024,7,30,2
241 | 2024,7,31,2
242 | 2024,8,1,1
243 | 2024,8,3,1
244 | 2024,8,4,2
245 | 2024,8,5,3
246 | 2024,8,8,1
247 | 2024,8,12,2
248 | 2024,8,13,2
249 | 2024,8,14,1
250 | 2024,8,16,2
251 | 2024,8,20,2
252 | 2024,8,22,1
253 | 2024,8,23,1
254 | 2024,8,24,1
255 | 2024,8,26,1
256 | 2024,8,27,3
257 | 2024,8,28,2
258 | 2024,8,29,2
259 | 2024,8,30,2
260 | 2024,9,1,1
261 | 2024,9,2,1
262 | 2024,9,3,2
263 | 2024,9,4,1
264 | 2024,9,5,1
265 | 2024,9,7,2
266 | 2024,9,9,2
267 | 2024,9,10,1
268 | 2024,9,12,1
269 | 2024,9,13,1
270 | 2024,9,14,1
271 | 2024,9,18,3
272 | 2024,9,19,2
273 | 2024,9,20,1
274 | 2024,9,21,2
275 | 2024,9,22,2
276 | 2024,9,24,2
277 | 2024,9,26,2
278 | 2024,9,27,3
279 | 2024,9,28,1
280 | 2024,9,29,1
281 | 2024,9,30,2
282 | 2024,10,1,1
283 | 2024,10,2,2
284 | 2024,10,3,1
285 | 2024,10,4,1
286 | 2024,10,5,1
287 | 2024,10,6,1
288 | 2024,10,10,3
289 | 2024,10,12,2
290 | 2024,10,16,3
291 | 2024,10,18,1
292 | 2024,10,21,4
293 | 2024,10,23,1
294 | 2024,10,25,1
295 | 2024,10,26,2
296 | 2024,10,28,3
297 | 2024,10,29,2
298 | 2024,10,30,1
299 | 2024,10,31,2
300 | 2024,11,3,1
301 | 2024,11,4,3
302 | 2024,11,5,4
303 | 2024,11,7,1
304 | 2024,11,9,1
305 | 2024,11,12,1
306 | 2024,11,14,1
307 | 2024,11,15,1
308 | 2024,11,18,1
309 | 2024,11,20,1
310 | 2024,11,24,1
311 | 2024,11,25,1
312 | 2024,11,26,2
313 | 2024,11,27,1
314 | 2024,11,29,3
315 | 2024,12,1,1
316 | 2024,12,2,2
317 | 2024,12,3,1
318 | 2024,12,4,2
319 | 2024,12,6,3
320 | 2024,12,11,1
321 | 2024,12,12,2
322 | 2024,12,16,1
323 | 2024,12,17,3
324 | 2024,12,18,2
325 | 2024,12,21,1
326 | 2024,12,23,1
327 | 2024,12,27,1
328 | 2024,12,28,1
329 | 2025,1,3,1
330 | 2025,1,4,2
331 | 2025,1,8,1
332 | 2025,1,9,1
333 | 2025,1,14,1
334 | 2025,1,16,1
335 | 2025,1,17,1
336 | 2025,1,21,1
337 | 2025,1,22,2
338 | 2025,1,24,1
339 | 2025,1,25,1
340 | 2025,1,26,1
341 | 2025,1,27,1
342 | 2025,1,29,1
343 | 2025,1,30,1
344 | 2025,1,31,1
345 | 2025,2,1,1
346 | 2025,2,5,1
347 | 2025,2,8,1
348 | 2025,2,9,1
349 | 2025,2,12,1
350 | 2025,2,13,1
351 | 2025,2,15,1
352 | 2025,2,25,1
353 | 2025,2,26,1
354 | 2025,3,1,1
355 | 2025,3,2,1
356 | 2025,3,4,1
357 | 2025,3,6,2
358 | 2025,3,9,1
359 | 2025,3,10,1
360 | 2025,3,13,1
361 | 2025,3,14,1
362 | 2025,3,16,1
363 | 2025,3,17,1
364 | 2025,3,18,1
365 | 2025,3,20,1
366 | 2025,3,22,1
367 | 2025,3,24,1
368 | 2025,3,25,3
369 | 2025,3,26,4
370 | 2025,3,29,1
371 | 2025,3,30,2
372 | 2025,4,1,2
373 | 2025,4,2,1
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/stars-forks/opentofu-stars.csv:
--------------------------------------------------------------------------------
1 | yr,mon,d,stars
2 | 2023,9,20,1134
3 | 2023,9,21,1558
4 | 2023,9,22,1770
5 | 2023,9,23,872
6 | 2023,9,24,378
7 | 2023,9,25,344
8 | 2023,9,26,258
9 | 2023,9,27,160
10 | 2023,9,28,153
11 | 2023,9,29,128
12 | 2023,9,30,71
13 | 2023,10,1,53
14 | 2023,10,2,92
15 | 2023,10,3,85
16 | 2023,10,4,101
17 | 2023,10,5,105
18 | 2023,10,6,100
19 | 2023,10,7,51
20 | 2023,10,8,68
21 | 2023,10,9,85
22 | 2023,10,10,67
23 | 2023,10,11,73
24 | 2023,10,12,65
25 | 2023,10,13,57
26 | 2023,10,14,28
27 | 2023,10,15,43
28 | 2023,10,16,83
29 | 2023,10,17,73
30 | 2023,10,18,65
31 | 2023,10,19,68
32 | 2023,10,20,44
33 | 2023,10,21,20
34 | 2023,10,22,32
35 | 2023,10,23,48
36 | 2023,10,24,43
37 | 2023,10,25,53
38 | 2023,10,26,64
39 | 2023,10,27,74
40 | 2023,10,28,38
41 | 2023,10,29,20
42 | 2023,10,30,36
43 | 2023,10,31,27
44 | 2023,11,1,44
45 | 2023,11,2,45
46 | 2023,11,3,33
47 | 2023,11,4,15
48 | 2023,11,5,27
49 | 2023,11,6,33
50 | 2023,11,7,38
51 | 2023,11,8,33
52 | 2023,11,9,42
53 | 2023,11,10,13
54 | 2023,11,11,18
55 | 2023,11,12,21
56 | 2023,11,13,32
57 | 2023,11,14,27
58 | 2023,11,15,29
59 | 2023,11,16,21
60 | 2023,11,17,28
61 | 2023,11,18,13
62 | 2023,11,19,24
63 | 2023,11,20,25
64 | 2023,11,21,19
65 | 2023,11,22,25
66 | 2023,11,23,22
67 | 2023,11,24,19
68 | 2023,11,25,8
69 | 2023,11,26,17
70 | 2023,11,27,15
71 | 2023,11,28,18
72 | 2023,11,29,32
73 | 2023,11,30,15
74 | 2023,12,1,11
75 | 2023,12,2,11
76 | 2023,12,3,12
77 | 2023,12,4,10
78 | 2023,12,5,21
79 | 2023,12,6,12
80 | 2023,12,7,24
81 | 2023,12,8,19
82 | 2023,12,9,37
83 | 2023,12,10,23
84 | 2023,12,11,24
85 | 2023,12,12,27
86 | 2023,12,13,39
87 | 2023,12,14,21
88 | 2023,12,15,23
89 | 2023,12,16,25
90 | 2023,12,17,21
91 | 2023,12,18,35
92 | 2023,12,19,37
93 | 2023,12,20,25
94 | 2023,12,21,30
95 | 2023,12,22,21
96 | 2023,12,23,15
97 | 2023,12,24,6
98 | 2023,12,25,9
99 | 2023,12,26,24
100 | 2023,12,27,16
101 | 2023,12,28,14
102 | 2023,12,29,9
103 | 2023,12,30,8
104 | 2023,12,31,10
105 | 2024,1,1,9
106 | 2024,1,2,20
107 | 2024,1,3,16
108 | 2024,1,4,18
109 | 2024,1,5,23
110 | 2024,1,6,13
111 | 2024,1,7,7
112 | 2024,1,8,20
113 | 2024,1,9,31
114 | 2024,1,10,100
115 | 2024,1,11,141
116 | 2024,1,12,130
117 | 2024,1,13,63
118 | 2024,1,14,37
119 | 2024,1,15,45
120 | 2024,1,16,38
121 | 2024,1,17,57
122 | 2024,1,18,48
123 | 2024,1,19,34
124 | 2024,1,20,20
125 | 2024,1,21,31
126 | 2024,1,22,65
127 | 2024,1,23,40
128 | 2024,1,24,41
129 | 2024,1,25,30
130 | 2024,1,26,28
131 | 2024,1,27,14
132 | 2024,1,28,21
133 | 2024,1,29,25
134 | 2024,1,30,28
135 | 2024,1,31,32
136 | 2024,2,1,22
137 | 2024,2,2,24
138 | 2024,2,3,17
139 | 2024,2,4,10
140 | 2024,2,5,19
141 | 2024,2,6,29
142 | 2024,2,7,25
143 | 2024,2,8,18
144 | 2024,2,9,28
145 | 2024,2,10,20
146 | 2024,2,11,18
147 | 2024,2,12,20
148 | 2024,2,13,23
149 | 2024,2,14,21
150 | 2024,2,15,26
151 | 2024,2,16,23
152 | 2024,2,17,14
153 | 2024,2,18,12
154 | 2024,2,19,15
155 | 2024,2,20,23
156 | 2024,2,21,21
157 | 2024,2,22,22
158 | 2024,2,23,13
159 | 2024,2,24,12
160 | 2024,2,25,4
161 | 2024,2,26,18
162 | 2024,2,27,19
163 | 2024,2,28,14
164 | 2024,2,29,14
165 | 2024,3,1,15
166 | 2024,3,2,10
167 | 2024,3,3,15
168 | 2024,3,4,21
169 | 2024,3,5,17
170 | 2024,3,6,19
171 | 2024,3,7,11
172 | 2024,3,8,12
173 | 2024,3,9,11
174 | 2024,3,10,4
175 | 2024,3,11,16
176 | 2024,3,12,21
177 | 2024,3,13,12
178 | 2024,3,14,27
179 | 2024,3,15,32
180 | 2024,3,16,23
181 | 2024,3,17,23
182 | 2024,3,18,25
183 | 2024,3,19,22
184 | 2024,3,20,23
185 | 2024,3,21,23
186 | 2024,3,22,33
187 | 2024,3,23,19
188 | 2024,3,24,13
189 | 2024,3,25,18
190 | 2024,3,26,15
191 | 2024,3,27,26
192 | 2024,3,28,20
193 | 2024,3,29,24
194 | 2024,3,30,17
195 | 2024,3,31,12
196 | 2024,4,1,30
197 | 2024,4,2,23
198 | 2024,4,3,25
199 | 2024,4,4,56
200 | 2024,4,5,39
201 | 2024,4,6,40
202 | 2024,4,7,30
203 | 2024,4,8,23
204 | 2024,4,9,25
205 | 2024,4,10,16
206 | 2024,4,11,54
207 | 2024,4,12,54
208 | 2024,4,13,25
209 | 2024,4,14,21
210 | 2024,4,15,41
211 | 2024,4,16,32
212 | 2024,4,17,29
213 | 2024,4,18,31
214 | 2024,4,19,21
215 | 2024,4,20,11
216 | 2024,4,21,10
217 | 2024,4,22,17
218 | 2024,4,23,22
219 | 2024,4,24,50
220 | 2024,4,25,115
221 | 2024,4,26,151
222 | 2024,4,27,53
223 | 2024,4,28,46
224 | 2024,4,29,47
225 | 2024,4,30,64
226 | 2024,5,1,69
227 | 2024,5,2,57
228 | 2024,5,3,38
229 | 2024,5,4,20
230 | 2024,5,5,19
231 | 2024,5,6,22
232 | 2024,5,7,24
233 | 2024,5,8,30
234 | 2024,5,9,24
235 | 2024,5,10,22
236 | 2024,5,11,14
237 | 2024,5,12,13
238 | 2024,5,13,28
239 | 2024,5,14,27
240 | 2024,5,15,57
241 | 2024,5,16,35
242 | 2024,5,17,34
243 | 2024,5,18,18
244 | 2024,5,19,13
245 | 2024,5,20,21
246 | 2024,5,21,30
247 | 2024,5,22,20
248 | 2024,5,23,19
249 | 2024,5,24,21
250 | 2024,5,25,11
251 | 2024,5,26,6
252 | 2024,5,27,18
253 | 2024,5,28,16
254 | 2024,5,29,20
255 | 2024,5,30,24
256 | 2024,5,31,20
257 | 2024,6,1,13
258 | 2024,6,2,14
259 | 2024,6,3,17
260 | 2024,6,4,16
261 | 2024,6,5,19
262 | 2024,6,6,14
263 | 2024,6,7,15
264 | 2024,6,8,5
265 | 2024,6,9,14
266 | 2024,6,10,17
267 | 2024,6,11,16
268 | 2024,6,12,12
269 | 2024,6,13,18
270 | 2024,6,14,9
271 | 2024,6,15,4
272 | 2024,6,16,6
273 | 2024,6,17,17
274 | 2024,6,18,19
275 | 2024,6,19,18
276 | 2024,6,20,13
277 | 2024,6,21,24
278 | 2024,6,22,6
279 | 2024,6,23,6
280 | 2024,6,24,13
281 | 2024,6,25,24
282 | 2024,6,26,21
283 | 2024,6,27,19
284 | 2024,6,28,9
285 | 2024,6,29,9
286 | 2024,6,30,3
287 | 2024,7,1,13
288 | 2024,7,2,13
289 | 2024,7,3,10
290 | 2024,7,4,8
291 | 2024,7,5,7
292 | 2024,7,6,11
293 | 2024,7,7,10
294 | 2024,7,8,7
295 | 2024,7,9,15
296 | 2024,7,10,12
297 | 2024,7,11,12
298 | 2024,7,12,11
299 | 2024,7,13,8
300 | 2024,7,14,11
301 | 2024,7,15,18
302 | 2024,7,16,17
303 | 2024,7,17,9
304 | 2024,7,18,11
305 | 2024,7,19,10
306 | 2024,7,20,4
307 | 2024,7,21,8
308 | 2024,7,22,17
309 | 2024,7,23,17
310 | 2024,7,24,17
311 | 2024,7,25,16
312 | 2024,7,26,16
313 | 2024,7,27,3
314 | 2024,7,28,8
315 | 2024,7,29,22
316 | 2024,7,30,23
317 | 2024,7,31,20
318 | 2024,8,1,17
319 | 2024,8,2,18
320 | 2024,8,3,9
321 | 2024,8,4,10
322 | 2024,8,5,11
323 | 2024,8,6,14
324 | 2024,8,7,19
325 | 2024,8,8,11
326 | 2024,8,9,8
327 | 2024,8,10,8
328 | 2024,8,11,6
329 | 2024,8,12,15
330 | 2024,8,13,12
331 | 2024,8,14,8
332 | 2024,8,15,17
333 | 2024,8,16,9
334 | 2024,8,17,11
335 | 2024,8,18,10
336 | 2024,8,19,8
337 | 2024,8,20,13
338 | 2024,8,21,9
339 | 2024,8,22,10
340 | 2024,8,23,16
341 | 2024,8,24,14
342 | 2024,8,25,5
343 | 2024,8,26,3
344 | 2024,8,27,7
345 | 2024,8,28,14
346 | 2024,8,29,16
347 | 2024,8,30,18
348 | 2024,8,31,7
349 | 2024,9,1,11
350 | 2024,9,2,12
351 | 2024,9,3,10
352 | 2024,9,4,18
353 | 2024,9,5,19
354 | 2024,9,6,21
355 | 2024,9,7,18
356 | 2024,9,8,8
357 | 2024,9,9,17
358 | 2024,9,10,9
359 | 2024,9,11,12
360 | 2024,9,12,12
361 | 2024,9,13,7
362 | 2024,9,14,2
363 | 2024,9,15,12
364 | 2024,9,16,8
365 | 2024,9,17,10
366 | 2024,9,18,19
367 | 2024,9,19,13
368 | 2024,9,20,20
369 | 2024,9,21,16
370 | 2024,9,22,6
371 | 2024,9,23,12
372 | 2024,9,24,16
373 | 2024,9,25,15
374 | 2024,9,26,14
375 | 2024,9,27,16
376 | 2024,9,28,13
377 | 2024,9,29,6
378 | 2024,9,30,10
379 | 2024,10,1,15
380 | 2024,10,2,13
381 | 2024,10,3,21
382 | 2024,10,4,15
383 | 2024,10,5,13
384 | 2024,10,6,19
385 | 2024,10,7,24
386 | 2024,10,8,12
387 | 2024,10,9,20
388 | 2024,10,10,14
389 | 2024,10,11,6
390 | 2024,10,12,8
391 | 2024,10,13,3
392 | 2024,10,14,18
393 | 2024,10,15,13
394 | 2024,10,16,9
395 | 2024,10,17,8
396 | 2024,10,18,8
397 | 2024,10,19,9
398 | 2024,10,20,9
399 | 2024,10,21,15
400 | 2024,10,22,8
401 | 2024,10,23,15
402 | 2024,10,24,15
403 | 2024,10,25,12
404 | 2024,10,26,6
405 | 2024,10,27,3
406 | 2024,10,28,35
407 | 2024,10,29,26
408 | 2024,10,30,24
409 | 2024,10,31,16
410 | 2024,11,1,15
411 | 2024,11,2,7
412 | 2024,11,3,3
413 | 2024,11,4,19
414 | 2024,11,5,12
415 | 2024,11,6,11
416 | 2024,11,7,17
417 | 2024,11,8,8
418 | 2024,11,9,9
419 | 2024,11,10,4
420 | 2024,11,11,10
421 | 2024,11,12,20
422 | 2024,11,13,13
423 | 2024,11,14,12
424 | 2024,11,15,8
425 | 2024,11,16,11
426 | 2024,11,17,11
427 | 2024,11,18,11
428 | 2024,11,19,17
429 | 2024,11,20,12
430 | 2024,11,21,10
431 | 2024,11,22,8
432 | 2024,11,23,10
433 | 2024,11,24,7
434 | 2024,11,25,9
435 | 2024,11,26,11
436 | 2024,11,27,12
437 | 2024,11,28,12
438 | 2024,11,29,11
439 | 2024,11,30,8
440 | 2024,12,1,6
441 | 2024,12,2,16
442 | 2024,12,3,8
443 | 2024,12,4,11
444 | 2024,12,5,9
445 | 2024,12,6,8
446 | 2024,12,7,3
447 | 2024,12,8,6
448 | 2024,12,9,7
449 | 2024,12,10,11
450 | 2024,12,11,14
451 | 2024,12,12,10
452 | 2024,12,13,8
453 | 2024,12,14,9
454 | 2024,12,15,9
455 | 2024,12,16,9
456 | 2024,12,17,13
457 | 2024,12,18,7
458 | 2024,12,19,8
459 | 2024,12,20,12
460 | 2024,12,21,6
461 | 2024,12,22,5
462 | 2024,12,23,6
463 | 2024,12,24,3
464 | 2024,12,25,8
465 | 2024,12,26,9
466 | 2024,12,27,11
467 | 2024,12,28,5
468 | 2024,12,29,7
469 | 2024,12,30,9
470 | 2024,12,31,9
471 | 2025,1,1,3
472 | 2025,1,2,9
473 | 2025,1,3,15
474 | 2025,1,4,10
475 | 2025,1,5,11
476 | 2025,1,6,6
477 | 2025,1,7,12
478 | 2025,1,8,7
479 | 2025,1,9,10
480 | 2025,1,10,13
481 | 2025,1,11,7
482 | 2025,1,12,10
483 | 2025,1,13,9
484 | 2025,1,14,10
485 | 2025,1,15,13
486 | 2025,1,16,6
487 | 2025,1,17,5
488 | 2025,1,18,9
489 | 2025,1,19,5
490 | 2025,1,20,15
491 | 2025,1,21,7
492 | 2025,1,22,18
493 | 2025,1,23,21
494 | 2025,1,24,13
495 | 2025,1,25,18
496 | 2025,1,26,12
497 | 2025,1,27,15
498 | 2025,1,28,34
499 | 2025,1,29,26
500 | 2025,1,30,20
501 | 2025,1,31,18
502 | 2025,2,1,8
503 | 2025,2,2,12
504 | 2025,2,3,21
505 | 2025,2,4,12
506 | 2025,2,5,20
507 | 2025,2,6,14
508 | 2025,2,7,18
509 | 2025,2,8,10
510 | 2025,2,9,14
511 | 2025,2,10,15
512 | 2025,2,11,7
513 | 2025,2,12,10
514 | 2025,2,13,15
515 | 2025,2,14,14
516 | 2025,2,15,5
517 | 2025,2,16,2
518 | 2025,2,17,10
519 | 2025,2,18,8
520 | 2025,2,19,6
521 | 2025,2,20,9
522 | 2025,2,21,6
523 | 2025,2,22,8
524 | 2025,2,23,8
525 | 2025,2,24,8
526 | 2025,2,25,12
527 | 2025,2,26,8
528 | 2025,2,27,23
529 | 2025,2,28,20
530 | 2025,3,1,17
531 | 2025,3,2,7
532 | 2025,3,3,31
533 | 2025,3,4,28
534 | 2025,3,5,17
535 | 2025,3,6,14
536 | 2025,3,7,10
537 | 2025,3,8,12
538 | 2025,3,9,8
539 | 2025,3,10,11
540 | 2025,3,11,9
541 | 2025,3,12,10
542 | 2025,3,13,13
543 | 2025,3,14,16
544 | 2025,3,15,3
545 | 2025,3,16,12
546 | 2025,3,17,11
547 | 2025,3,18,17
548 | 2025,3,19,14
549 | 2025,3,20,9
550 | 2025,3,21,7
551 | 2025,3,22,5
552 | 2025,3,23,5
553 | 2025,3,24,19
554 | 2025,3,25,16
555 | 2025,3,26,9
556 | 2025,3,27,13
557 | 2025,3,28,10
558 | 2025,3,29,7
559 | 2025,3,30,8
560 | 2025,3,31,12
561 | 2025,4,1,11
562 | 2025,4,2,19
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/stars-forks/valkey-forks.csv:
--------------------------------------------------------------------------------
1 | yr,mon,d,forks
2 | 2024,3,28,14
3 | 2024,3,29,36
4 | 2024,3,30,37
5 | 2024,3,31,76
6 | 2024,4,1,46
7 | 2024,4,2,32
8 | 2024,4,3,12
9 | 2024,4,4,15
10 | 2024,4,5,8
11 | 2024,4,6,4
12 | 2024,4,7,3
13 | 2024,4,8,7
14 | 2024,4,9,3
15 | 2024,4,10,1
16 | 2024,4,11,2
17 | 2024,4,12,3
18 | 2024,4,13,4
19 | 2024,4,14,5
20 | 2024,4,15,1
21 | 2024,4,16,5
22 | 2024,4,17,11
23 | 2024,4,18,25
24 | 2024,4,19,38
25 | 2024,4,20,8
26 | 2024,4,21,1
27 | 2024,4,22,5
28 | 2024,4,23,4
29 | 2024,4,24,4
30 | 2024,4,25,8
31 | 2024,4,26,6
32 | 2024,4,27,1
33 | 2024,4,28,5
34 | 2024,4,29,2
35 | 2024,4,30,1
36 | 2024,5,2,2
37 | 2024,5,3,2
38 | 2024,5,4,1
39 | 2024,5,5,6
40 | 2024,5,6,3
41 | 2024,5,7,1
42 | 2024,5,8,3
43 | 2024,5,9,7
44 | 2024,5,10,2
45 | 2024,5,11,1
46 | 2024,5,12,1
47 | 2024,5,13,2
48 | 2024,5,14,1
49 | 2024,5,15,1
50 | 2024,5,16,2
51 | 2024,5,17,1
52 | 2024,5,18,2
53 | 2024,5,20,2
54 | 2024,5,21,2
55 | 2024,5,24,1
56 | 2024,5,25,2
57 | 2024,5,26,2
58 | 2024,5,28,2
59 | 2024,6,1,1
60 | 2024,6,2,1
61 | 2024,6,3,1
62 | 2024,6,4,1
63 | 2024,6,5,1
64 | 2024,6,6,3
65 | 2024,6,8,1
66 | 2024,6,9,3
67 | 2024,6,12,2
68 | 2024,6,13,1
69 | 2024,6,14,2
70 | 2024,6,15,1
71 | 2024,6,19,2
72 | 2024,6,20,2
73 | 2024,6,21,1
74 | 2024,6,22,1
75 | 2024,6,23,2
76 | 2024,6,24,4
77 | 2024,6,25,1
78 | 2024,6,26,1
79 | 2024,6,27,1
80 | 2024,6,28,2
81 | 2024,7,1,1
82 | 2024,7,2,1
83 | 2024,7,3,1
84 | 2024,7,11,4
85 | 2024,7,15,1
86 | 2024,7,16,1
87 | 2024,7,18,2
88 | 2024,7,20,1
89 | 2024,7,22,1
90 | 2024,7,23,1
91 | 2024,7,25,1
92 | 2024,7,29,2
93 | 2024,7,30,2
94 | 2024,7,31,1
95 | 2024,8,1,2
96 | 2024,8,2,2
97 | 2024,8,4,2
98 | 2024,8,6,1
99 | 2024,8,7,2
100 | 2024,8,8,1
101 | 2024,8,9,1
102 | 2024,8,10,1
103 | 2024,8,13,1
104 | 2024,8,14,3
105 | 2024,8,15,2
106 | 2024,8,16,1
107 | 2024,8,17,1
108 | 2024,8,20,1
109 | 2024,8,21,1
110 | 2024,8,25,3
111 | 2024,8,26,2
112 | 2024,8,27,1
113 | 2024,8,28,1
114 | 2024,8,30,2
115 | 2024,8,31,1
116 | 2024,9,2,2
117 | 2024,9,3,1
118 | 2024,9,4,2
119 | 2024,9,5,1
120 | 2024,9,6,1
121 | 2024,9,7,1
122 | 2024,9,8,1
123 | 2024,9,9,1
124 | 2024,9,11,2
125 | 2024,9,12,2
126 | 2024,9,13,2
127 | 2024,9,14,2
128 | 2024,9,16,3
129 | 2024,9,17,6
130 | 2024,9,18,6
131 | 2024,9,19,2
132 | 2024,9,20,6
133 | 2024,9,21,1
134 | 2024,9,23,2
135 | 2024,9,26,3
136 | 2024,9,27,2
137 | 2024,10,1,1
138 | 2024,10,2,1
139 | 2024,10,5,1
140 | 2024,10,7,1
141 | 2024,10,8,3
142 | 2024,10,10,1
143 | 2024,10,11,2
144 | 2024,10,14,2
145 | 2024,10,15,2
146 | 2024,10,16,2
147 | 2024,10,17,1
148 | 2024,10,18,1
149 | 2024,10,21,1
150 | 2024,10,22,1
151 | 2024,10,23,3
152 | 2024,10,24,1
153 | 2024,10,25,2
154 | 2024,10,27,1
155 | 2024,10,28,2
156 | 2024,10,30,1
157 | 2024,11,1,1
158 | 2024,11,2,1
159 | 2024,11,4,2
160 | 2024,11,8,2
161 | 2024,11,10,1
162 | 2024,11,11,2
163 | 2024,11,12,2
164 | 2024,11,13,1
165 | 2024,11,15,1
166 | 2024,11,18,1
167 | 2024,11,21,2
168 | 2024,11,22,1
169 | 2024,11,26,1
170 | 2024,11,27,4
171 | 2024,11,28,1
172 | 2024,11,29,1
173 | 2024,11,30,2
174 | 2024,12,1,2
175 | 2024,12,3,2
176 | 2024,12,4,4
177 | 2024,12,5,1
178 | 2024,12,6,2
179 | 2024,12,8,1
180 | 2024,12,9,1
181 | 2024,12,10,2
182 | 2024,12,11,1
183 | 2024,12,12,2
184 | 2024,12,16,1
185 | 2024,12,17,3
186 | 2024,12,19,1
187 | 2024,12,25,1
188 | 2024,12,27,1
189 | 2024,12,30,1
190 | 2025,1,6,2
191 | 2025,1,9,1
192 | 2025,1,10,1
193 | 2025,1,12,1
194 | 2025,1,13,1
195 | 2025,1,15,2
196 | 2025,1,16,1
197 | 2025,1,17,1
198 | 2025,1,24,1
199 | 2025,1,25,1
200 | 2025,1,29,1
201 | 2025,2,3,1
202 | 2025,2,5,3
203 | 2025,2,6,1
204 | 2025,2,7,3
205 | 2025,2,8,2
206 | 2025,2,11,2
207 | 2025,2,14,3
208 | 2025,2,15,1
209 | 2025,2,17,2
210 | 2025,2,18,1
211 | 2025,2,20,1
212 | 2025,2,21,3
213 | 2025,2,23,1
214 | 2025,2,26,1
215 | 2025,2,27,2
216 | 2025,2,28,1
217 | 2025,3,1,1
218 | 2025,3,4,4
219 | 2025,3,5,1
220 | 2025,3,7,1
221 | 2025,3,9,1
222 | 2025,3,10,1
223 | 2025,3,12,2
224 | 2025,3,14,1
225 | 2025,3,17,1
226 | 2025,3,21,4
227 | 2025,3,22,1
228 | 2025,3,23,1
229 | 2025,3,24,1
230 | 2025,3,25,1
231 | 2025,3,26,1
232 | 2025,3,28,2
233 | 2025,3,29,2
234 | 2025,3,31,1
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/stars-forks/valkey-stars.csv:
--------------------------------------------------------------------------------
1 | yr,mon,d,stars
2 | 2024,3,28,518
3 | 2024,3,29,1482
4 | 2024,3,30,1090
5 | 2024,3,31,1083
6 | 2024,4,1,850
7 | 2024,4,2,383
8 | 2024,4,3,245
9 | 2024,4,4,897
10 | 2024,4,5,464
11 | 2024,4,6,306
12 | 2024,4,7,176
13 | 2024,4,8,186
14 | 2024,4,9,417
15 | 2024,4,10,199
16 | 2024,4,11,238
17 | 2024,4,12,277
18 | 2024,4,13,116
19 | 2024,4,14,68
20 | 2024,4,15,78
21 | 2024,4,16,148
22 | 2024,4,17,328
23 | 2024,4,18,593
24 | 2024,4,19,549
25 | 2024,4,20,214
26 | 2024,4,21,95
27 | 2024,4,22,128
28 | 2024,4,23,117
29 | 2024,4,24,105
30 | 2024,4,25,84
31 | 2024,4,26,67
32 | 2024,4,27,67
33 | 2024,4,28,66
34 | 2024,4,29,66
35 | 2024,4,30,68
36 | 2024,5,1,46
37 | 2024,5,2,66
38 | 2024,5,3,62
39 | 2024,5,4,36
40 | 2024,5,5,33
41 | 2024,5,6,35
42 | 2024,5,7,42
43 | 2024,5,8,32
44 | 2024,5,9,38
45 | 2024,5,10,35
46 | 2024,5,11,25
47 | 2024,5,12,23
48 | 2024,5,13,38
49 | 2024,5,14,27
50 | 2024,5,15,32
51 | 2024,5,16,16
52 | 2024,5,17,19
53 | 2024,5,18,14
54 | 2024,5,19,14
55 | 2024,5,20,35
56 | 2024,5,21,19
57 | 2024,5,22,23
58 | 2024,5,23,18
59 | 2024,5,24,25
60 | 2024,5,25,22
61 | 2024,5,26,17
62 | 2024,5,27,35
63 | 2024,5,28,43
64 | 2024,5,29,19
65 | 2024,5,30,23
66 | 2024,5,31,15
67 | 2024,6,1,19
68 | 2024,6,2,11
69 | 2024,6,3,16
70 | 2024,6,4,23
71 | 2024,6,5,29
72 | 2024,6,6,25
73 | 2024,6,7,15
74 | 2024,6,8,11
75 | 2024,6,9,7
76 | 2024,6,10,15
77 | 2024,6,11,21
78 | 2024,6,12,19
79 | 2024,6,13,17
80 | 2024,6,14,12
81 | 2024,6,15,13
82 | 2024,6,16,6
83 | 2024,6,17,16
84 | 2024,6,18,25
85 | 2024,6,19,22
86 | 2024,6,20,26
87 | 2024,6,21,18
88 | 2024,6,22,16
89 | 2024,6,23,9
90 | 2024,6,24,20
91 | 2024,6,25,13
92 | 2024,6,26,11
93 | 2024,6,27,18
94 | 2024,6,28,16
95 | 2024,6,29,15
96 | 2024,6,30,5
97 | 2024,7,1,15
98 | 2024,7,2,23
99 | 2024,7,3,13
100 | 2024,7,4,17
101 | 2024,7,5,10
102 | 2024,7,6,4
103 | 2024,7,7,21
104 | 2024,7,8,30
105 | 2024,7,9,16
106 | 2024,7,10,14
107 | 2024,7,11,28
108 | 2024,7,12,17
109 | 2024,7,13,9
110 | 2024,7,14,13
111 | 2024,7,15,13
112 | 2024,7,16,12
113 | 2024,7,17,16
114 | 2024,7,18,15
115 | 2024,7,19,8
116 | 2024,7,20,6
117 | 2024,7,21,11
118 | 2024,7,22,18
119 | 2024,7,23,14
120 | 2024,7,24,14
121 | 2024,7,25,7
122 | 2024,7,26,13
123 | 2024,7,27,4
124 | 2024,7,28,9
125 | 2024,7,29,20
126 | 2024,7,30,14
127 | 2024,7,31,17
128 | 2024,8,1,14
129 | 2024,8,2,5
130 | 2024,8,3,14
131 | 2024,8,4,12
132 | 2024,8,5,11
133 | 2024,8,6,9
134 | 2024,8,7,16
135 | 2024,8,8,16
136 | 2024,8,9,10
137 | 2024,8,10,9
138 | 2024,8,11,5
139 | 2024,8,12,9
140 | 2024,8,13,12
141 | 2024,8,14,16
142 | 2024,8,15,14
143 | 2024,8,16,9
144 | 2024,8,17,21
145 | 2024,8,18,18
146 | 2024,8,19,11
147 | 2024,8,20,11
148 | 2024,8,21,9
149 | 2024,8,22,9
150 | 2024,8,23,13
151 | 2024,8,24,5
152 | 2024,8,25,7
153 | 2024,8,26,9
154 | 2024,8,27,29
155 | 2024,8,28,26
156 | 2024,8,29,20
157 | 2024,8,30,18
158 | 2024,8,31,15
159 | 2024,9,1,12
160 | 2024,9,2,28
161 | 2024,9,3,30
162 | 2024,9,4,33
163 | 2024,9,5,26
164 | 2024,9,6,25
165 | 2024,9,7,8
166 | 2024,9,8,11
167 | 2024,9,9,24
168 | 2024,9,10,10
169 | 2024,9,11,14
170 | 2024,9,12,14
171 | 2024,9,13,19
172 | 2024,9,14,38
173 | 2024,9,15,14
174 | 2024,9,16,67
175 | 2024,9,17,83
176 | 2024,9,18,150
177 | 2024,9,19,62
178 | 2024,9,20,44
179 | 2024,9,21,36
180 | 2024,9,22,38
181 | 2024,9,23,44
182 | 2024,9,24,46
183 | 2024,9,25,15
184 | 2024,9,26,33
185 | 2024,9,27,26
186 | 2024,9,28,14
187 | 2024,9,29,11
188 | 2024,9,30,16
189 | 2024,10,1,10
190 | 2024,10,2,20
191 | 2024,10,3,15
192 | 2024,10,4,21
193 | 2024,10,5,18
194 | 2024,10,6,8
195 | 2024,10,7,16
196 | 2024,10,8,21
197 | 2024,10,9,74
198 | 2024,10,10,37
199 | 2024,10,11,60
200 | 2024,10,12,28
201 | 2024,10,13,8
202 | 2024,10,14,47
203 | 2024,10,15,33
204 | 2024,10,16,29
205 | 2024,10,17,24
206 | 2024,10,18,22
207 | 2024,10,19,16
208 | 2024,10,20,7
209 | 2024,10,21,16
210 | 2024,10,22,27
211 | 2024,10,23,19
212 | 2024,10,24,13
213 | 2024,10,25,22
214 | 2024,10,26,15
215 | 2024,10,27,15
216 | 2024,10,28,15
217 | 2024,10,29,16
218 | 2024,10,30,20
219 | 2024,10,31,18
220 | 2024,11,1,8
221 | 2024,11,2,11
222 | 2024,11,3,8
223 | 2024,11,4,13
224 | 2024,11,5,20
225 | 2024,11,6,18
226 | 2024,11,7,12
227 | 2024,11,8,12
228 | 2024,11,9,13
229 | 2024,11,10,10
230 | 2024,11,11,14
231 | 2024,11,12,15
232 | 2024,11,13,17
233 | 2024,11,14,20
234 | 2024,11,15,13
235 | 2024,11,16,8
236 | 2024,11,17,11
237 | 2024,11,18,7
238 | 2024,11,19,9
239 | 2024,11,20,21
240 | 2024,11,21,27
241 | 2024,11,22,9
242 | 2024,11,23,5
243 | 2024,11,24,9
244 | 2024,11,25,14
245 | 2024,11,26,45
246 | 2024,11,27,66
247 | 2024,11,28,49
248 | 2024,11,29,29
249 | 2024,11,30,27
250 | 2024,12,1,21
251 | 2024,12,2,32
252 | 2024,12,3,46
253 | 2024,12,4,41
254 | 2024,12,5,34
255 | 2024,12,6,26
256 | 2024,12,7,13
257 | 2024,12,8,6
258 | 2024,12,9,17
259 | 2024,12,10,19
260 | 2024,12,11,36
261 | 2024,12,12,30
262 | 2024,12,13,23
263 | 2024,12,14,10
264 | 2024,12,15,18
265 | 2024,12,16,18
266 | 2024,12,17,27
267 | 2024,12,18,11
268 | 2024,12,19,11
269 | 2024,12,20,18
270 | 2024,12,21,7
271 | 2024,12,22,9
272 | 2024,12,23,17
273 | 2024,12,24,10
274 | 2024,12,25,18
275 | 2024,12,26,19
276 | 2024,12,27,12
277 | 2024,12,28,10
278 | 2024,12,29,13
279 | 2024,12,30,11
280 | 2024,12,31,16
281 | 2025,1,1,23
282 | 2025,1,2,21
283 | 2025,1,3,14
284 | 2025,1,4,16
285 | 2025,1,5,10
286 | 2025,1,6,12
287 | 2025,1,7,9
288 | 2025,1,8,8
289 | 2025,1,9,15
290 | 2025,1,10,8
291 | 2025,1,11,6
292 | 2025,1,12,7
293 | 2025,1,13,20
294 | 2025,1,14,19
295 | 2025,1,15,9
296 | 2025,1,16,15
297 | 2025,1,17,9
298 | 2025,1,18,4
299 | 2025,1,19,7
300 | 2025,1,20,14
301 | 2025,1,21,10
302 | 2025,1,22,12
303 | 2025,1,23,13
304 | 2025,1,24,10
305 | 2025,1,25,16
306 | 2025,1,26,10
307 | 2025,1,27,11
308 | 2025,1,28,19
309 | 2025,1,29,22
310 | 2025,1,30,10
311 | 2025,1,31,9
312 | 2025,2,1,11
313 | 2025,2,2,5
314 | 2025,2,3,10
315 | 2025,2,4,12
316 | 2025,2,5,17
317 | 2025,2,6,13
318 | 2025,2,7,16
319 | 2025,2,8,16
320 | 2025,2,9,9
321 | 2025,2,10,15
322 | 2025,2,11,10
323 | 2025,2,12,12
324 | 2025,2,13,15
325 | 2025,2,14,20
326 | 2025,2,15,15
327 | 2025,2,16,8
328 | 2025,2,17,33
329 | 2025,2,18,15
330 | 2025,2,19,30
331 | 2025,2,20,16
332 | 2025,2,21,43
333 | 2025,2,22,36
334 | 2025,2,23,28
335 | 2025,2,24,23
336 | 2025,2,25,26
337 | 2025,2,26,26
338 | 2025,2,27,22
339 | 2025,2,28,17
340 | 2025,3,1,22
341 | 2025,3,2,15
342 | 2025,3,3,17
343 | 2025,3,4,23
344 | 2025,3,5,19
345 | 2025,3,6,23
346 | 2025,3,7,20
347 | 2025,3,8,12
348 | 2025,3,9,15
349 | 2025,3,10,14
350 | 2025,3,11,14
351 | 2025,3,12,12
352 | 2025,3,13,14
353 | 2025,3,14,19
354 | 2025,3,15,9
355 | 2025,3,16,21
356 | 2025,3,17,31
357 | 2025,3,18,13
358 | 2025,3,19,17
359 | 2025,3,20,17
360 | 2025,3,21,9
361 | 2025,3,22,6
362 | 2025,3,23,14
363 | 2025,3,24,12
364 | 2025,3,25,24
365 | 2025,3,26,32
366 | 2025,3,27,17
367 | 2025,3,28,18
368 | 2025,3,29,15
369 | 2025,3,30,10
370 | 2025,3,31,11
371 | 2025,4,1,29
372 | 2025,4,2,26
373 |
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/terraform2021-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/terraform2021-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/terraform2022-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/terraform2022-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/terraform2023-08-10T00:00:00.000+00:002024-08-10T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/terraform2023-08-10T00:00:00.000+00:002024-08-10T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/terraform_people_2021-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/terraform_people_2021-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/terraform_people_2022-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/terraform_people_2022-08-10T00:00:00.000+00:002023-08-10T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/terraform_people_2023-08-10T00:00:00.000+00:002024-08-10T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/terraform_people_2023-08-10T00:00:00.000+00:002024-08-10T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/valkey2024-03-28T00:00:00.000+00:002024-08-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/valkey2024-03-28T00:00:00.000+00:002024-08-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/valkey2024-03-28T00:00:00.000+00:002024-09-28T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/valkey2024-03-28T00:00:00.000+00:002024-09-28T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/valkey2024-03-28T00:00:00.000+00:002025-03-28T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/valkey2024-03-28T00:00:00.000+00:002025-03-28T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/valkey_people_2024-03-28T00:00:00.000+00:002024-08-20T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/valkey_people_2024-03-28T00:00:00.000+00:002024-08-20T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/valkey_people_2024-03-28T00:00:00.000+00:002024-09-28T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/valkey_people_2024-03-28T00:00:00.000+00:002024-09-28T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/fork-case-study/data-files/valkey_people_2024-03-28T00:00:00.000+00:002025-03-28T00:00:00.000+00:00.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/license-changes/fork-case-study/data-files/valkey_people_2024-03-28T00:00:00.000+00:002025-03-28T00:00:00.000+00:00.pkl
--------------------------------------------------------------------------------
/dataset/license-changes/forks.csv:
--------------------------------------------------------------------------------
1 | fork_name,fork_date,fork_repo,fork_lic,fork_owner,from_name,from_repo,from_lic,from_owner,category,notes
2 | MariaDB,2009-10-29,https://github.com/MariaDB/server,GPL-2.0,MariaDB Foundation,MySQL,https://github.com/mysql/mysql-server ,GPL-2.0,Oracle,acquisition,Fork was shortly after Oracle acquired Sun. Many of the original MySQL core developers wanted to ensure that the code base would remain open.
3 | LibreOffice,2010-09-28,https://github.com/LibreOffice/core,GPL-3.0,The Document Foundation,OpenOffice,https://svn.apache.org/repos/asf/openoffice,LGPL-2.1,Oracle,acquisition,Fork was shortly after Oracle acquired Sun when Oracle pulled most of the developers off of OpenOffice.
4 | OpenSearch,2021-04-12,https://github.com/opensearch-project/OpenSearch ,Apache-2.0,AWS,Elasticsearch,https://github.com/elastic/elasticsearch/,Apache-2.0,Elastic,relicense,Fork happened when Elastic relicensed to SSPL-1.0 + Elastic License
5 | OpenSearch Dashboard,2021-04-12,https://github.com/opensearch-project/OpenSearch-Dashboards,Apache-2.0,AWS,Kibana,https://github.com/elastic/kibana,Apache-2.0,Elastic,relicense,Fork happened when Elastic relicensed to SSPL-1.0 + Elastic License
6 | Mimir,2022-03-30,https://github.com/grafana/mimir,AGPL-3.0,Grafana,Cortex,https://github.com/cortexproject/cortex,Apache-2.0,CNCF,feature,"Grafana was driving most of the development on Cortex and had features they wanted to add, but not under Apache. They pitched it as a strategic decision, but it was also effectively a relicense. "
7 | OpenTofu,2023-09-05,https://github.com/opentofu/opentofu,MPL-2.0,Linux Foundation,Terraform,https://github.com/hashicorp/terraform,MPL-2.0,HashiCorp,relicense,Fork happened when HashiCorp switched from an open source license to the Business Source License (BSL)
8 | OpenBao,2023-12-08,https://github.com/openbao/openbao,MPL-2.0,Linux Foundation,Vault,https://github.com/hashicorp/vault,MPL-2.0,HashiCorp,relicense,Fork happened when HashiCorp switched from an open source license to the Business Source License (BSL)
9 | Valkey,2024-03-28,https://github.com/valkey-io/valkey,BSD-3-clause,Linux Foundation,Redis,https://github.com/redis/redis,BSD-3-clause,Redis,relicense,Forked when Redis moved to be dual-licensed under the RSALv2 and SSPLv1.
10 | egcs,1997-08-15,https://github.com/gcc-mirror/gcc ,GPL,Cygnus,gcc,https://github.com/gcc-mirror/gcc ,GPL-2.0,Free Software Foundation,feature,Community members became frustrated that their changes were not being merged into gcc and forked it into egcs. FSF made egcs the official version of gcc and in July 1999 the two projects were united again.
11 | Adempiere,2006-10-12,https://github.com/adempiere/adempiere,GPL-2.0,ADempiere Community,Compiere,https://www.compiere.com/svn/,GPL-2.0,Compiere,feature,Community members became frustrated that their changes were not being merged into Compiere. Compiere the company failed and Adempiere is still running
12 | Jenkins,2011-02-02,https://github.com/jenkinsci/jenkins,MIT,Linux Foundation CD Foundation,Hudson,,MIT,Oracle,acquisition,Fork was shortly after Oracle acquired Sun. Original creator forked it into Jenkins. Hudson was donated to Eclipse in 2012 and had its final release in 2016.
13 | io.js,2014-12-01,https://github.com/nodejs/node,MIT,Node Forward,Node.js,https://github.com/nodejs/node ,MIT,Joyent,feature,Forked to allow for more community input. This fork lasted only 12 months or so before the communities knitted the forks back together again under the newly formed Node.js Foundation
14 | NextCloud,2016-04-22,https://github.com/nextcloud/server ,AGPL-3.0,Nextcloud GmbH,ownCloud,https://github.com/owncloud/core,AGPL-3.0,ownCloud,feature,"NextCloud forked from ownCloud by its founder Frank Karlitschek due to concerns about management and priorities between growth, money, and sustainability."
15 |
--------------------------------------------------------------------------------
/dataset/license-changes/generate-license-data.py:
--------------------------------------------------------------------------------
1 | #SPDX-License-Identifier: MIT
2 |
3 | # Note: License file must be in the same org/repo
4 |
5 | def setup_validate():
6 |     import argparse
7 |     import os
8 |     import csv
9 |     import sys
10 |
11 |     # Gather options from command line arguments and store them in variables
12 |
13 |     parser = argparse.ArgumentParser()
14 |
15 |     parser.add_argument("-c", "--configfile", required=False, dest="input_data_file", default="inputdata.csv", help="The full path to a csv input file see inputdata.csv example in this repo (defaults to inputdata.csv)")
16 |     parser.add_argument("-t", "--tokenvar", required=False, dest="token_var", default="GITHUB_AUTH_TOKEN", help="The name of the environmental variable where your GitHub personal access token can be found (defaults to GITHUB_AUTH_TOKEN)")
17 |
18 |     args = parser.parse_args()
19 |     input_data_file = args.input_data_file
20 |     token_var = args.token_var
21 |
22 |     # Read GitHub personal access token from the environment variable named by --tokenvar
23 |
24 |     try:
25 |         gh_key = os.environ[token_var]
26 |
27 |     except KeyError:
28 |         print("You must have an environment variable with a GitHub personal access token to run this script")
29 |         print("For Linux / MacOS: export GITHUB_AUTH_TOKEN=")
30 |         print("For Windows: set GITHUB_AUTH_TOKEN=")
31 |         print("Exiting ...")
32 |         sys.exit(1)
33 |
34 |     # Read input file with some minimal data validation and store data in a list
35 |
36 |     try:
37 |         input_data = []
38 |
39 |         with open(input_data_file, 'r') as f:
40 |             data = csv.reader(f)
41 |             next(data, None) # skip header line
42 |             for row in data:
43 |                 # Skip blank lines
44 |                 if len(row) != 0:
45 |                     input_data.append(row)
46 |
47 |                     # Make sure the csv file has 7 values per row before continuing
48 |                     if len(row) != 7:
49 |                         print("Data errors detected in row containing:", row)
50 |                         print("Each line in the csv must contain 7 values.")
51 |                         sys.exit(1)
52 |
53 |         # Check for empty file and exit
54 |         if len(input_data) == 0:
55 |             print(input_data_file, "appears to be empty.")
56 |             sys.exit(1)
57 |
58 |         print("Reading input data from", input_data_file)
59 |
60 |         return input_data, gh_key
61 |
62 |     except (OSError, csv.Error):
63 |         print("Can't read file", input_data_file, "Exiting ...")
64 |         sys.exit(1)
65 |
66 |
67 |
68 | def make_query():
69 |     return """query licenseData($org: String!, $repo: String!, $lic_file: String!){
70 |         repository(owner: $org, name: $repo){
71 |             defaultBranchRef {
72 |                 target {
73 |                     ... on Commit {
74 |                         history(path: $lic_file, first: 100) {
75 |                             nodes {
76 |                                 committedDate
77 |                                 url
78 |                                 additions
79 |                                 deletions
80 |                                 message
81 |                             }
82 |                         }
83 |                     }
84 |                 }
85 |             }
86 |             url
87 |             nameWithOwner
88 |         }
89 |     }
90 |     """
91 |
92 | def get_license_data():
93 |
94 |     import requests
95 |     import json
96 |     import sys
97 |
98 |     input_data, api_token = setup_validate()
99 |     #print(input_data, api_token)
100 |
101 |     url = 'https://api.github.com/graphql'
102 |     headers = {'Authorization': 'token %s' % api_token}
103 |
104 |     json_list = []
105 |
106 |     for row in input_data:
107 |         project = row[0]
108 |         relicense_date = row[1]
109 |         orig_lic = row[2]
110 |         new_lic = row[3]
111 |         org = row[4]
112 |         repo = row[5]
113 |         lic_file = row[6]
114 |
115 |         try:
116 |             query = make_query()
117 |
118 |             variables = {"org": org, "repo": repo, "lic_file": lic_file}
119 |             #variables = {"org": org, "repo": repo}
120 |             r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)
121 |             json_data = json.loads(r.text)['data']
122 |
123 |             # Add data from csv file
124 |             json_data['repository']['project'] = project
125 |             json_data['repository']['relicense_date'] = relicense_date
126 |             json_data['repository']['orig_lic'] = orig_lic
127 |             json_data['repository']['new_lic'] = new_lic
128 |
129 |             json_list.append(json_data)
130 |
131 |         except Exception:
132 |             print("Unknown error: Could not get data from GitHub")
133 |             sys.exit(1)
134 |
135 |     return json_list
136 |
137 | import json
138 |
139 | json_list = get_license_data()
140 | print(json_list)
141 |
142 | json_obj = json.dumps(json_list, indent=4)
143 |
144 | with open('output.json', 'w') as output:
145 |     output.write(json_obj)
146 |
--------------------------------------------------------------------------------
/dataset/license-changes/inputdata.csv:
--------------------------------------------------------------------------------
1 | This file has moved to https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/license_changes.csv
2 |
--------------------------------------------------------------------------------
/dataset/license-changes/license_changes.csv:
--------------------------------------------------------------------------------
1 | project,relicense_date,orig_license,new_license,org,repo,license_file
2 | Airbyte,2021-09-27,MIT,Elastic License v2.0,airbytehq,airbyte,LICENSE
3 | Akka,2022-09-07,Apache-2.0,Business Source License,akka,akka,LICENSE
4 | ArangoDB,2024-02-23,Apache-2.0,Business Source License,arangodb,arangodb,LICENSE
5 | Aseprite,2016-08-29,GPL-2.0,EULA that permits personal use but forbids redistribution,aseprite,aseprite,EULA.txt
6 | CockroachDB,2019-06-04,Apache-2.0,Business Source License,cockroachdb,cockroach,LICENSE
7 | Confluent,2018-12-14,Apache-2.0,Confluent Community License Agreement,confluentinc,ksql,LICENSE
8 | Consul,2023-08-10,MPL-2.0,Business Source License,hashicorp,consul,LICENSE
9 | Elasticsearch,2021-02-03,Apache-2.0,"Elastic License and Server Side Public License",elastic,elasticsearch,LICENSE.txt
10 | Elasticsearch,2021-02-03,Apache-2.0,"Elastic License and Server Side Public License",elastic,kibana,LICENSE.txt
11 | LiveCode,2021-08-31,GPL-3.0-only,project archived,livecode,livecode,LICENSE
12 | MongoDB,2018-10-16,AGPL-3.0-only,Server Side Public License,mongodb,mongo,LICENSE-Community.txt
13 | OTRS,2021-01-27,GPL-3.0-or-later,project archived,OTRS,otrs,COPYING
14 | Redis,2024-03-20,BSD-3-Clause,dual: custom license and Server Side Public License,redis,redis,LICENSE.txt
15 | Sentry,2019-11-06,BSD-3-Clause,Business Source License,getsentry,sentry,LICENSE.md
16 | Sourcegraph,2018-10-27,Apache-2.0,Sourcegraph Enterprise license,sourcegraph,sourcegraph,LICENSE
17 | Terraform,2023-08-10,MPL-2.0,Business Source License,hashicorp,terraform,LICENSE
18 | Timescale,2018-12-29,Apache-2.0,Timescale License,timescale,timescaledb,LICENSE
19 | Vagrant,2023-08-10,MIT,Business Source License,hashicorp,vagrant,LICENSE
20 | Vault,2023-08-10,MPL-2.0,Business Source License,hashicorp,vault,LICENSE
21 |
--------------------------------------------------------------------------------
/dataset/license-changes/more_forks.csv:
--------------------------------------------------------------------------------
1 | fork_name,fork_date,fork_repo,fork_lic,fork_owner,from_name,from_repo,from_lic,from_owner,category,notes,Type,source_citation,additional_notes
2 | Flock,2024-10-27,https://github.com/join-the-flock/flock,BSD-3-clause,,Flutter,https://github.com/flutter/flutter,BSD-3-clause,Google,feature,Community fork for faster iteration speed. Probably friendly.,clone fork,https://getflocked.dev/blog/posts/we-are-forking-flutter-this-is-why/,friendly
3 | Fossify Calendar,2023-12-03,https://github.com/FossifyOrg/Calendar,GPL-3.0,,Simple-Calendar,https://github.com/SimpleMobileTools/Simple-Calendar,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
4 | Fossify File Manager,2023-12-03,https://github.com/FossifyOrg/File-Manager,GPL-3.0,,Simple-File-Manager,https://github.com/SimpleMobileTools/Simple-File-Manager,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
5 | Fossify Gallery,2023-12-03,https://github.com/FossifyOrg/Gallery,GPL-3.0,,Simple-Gallery,https://github.com/SimpleMobileTools/Simple-Gallery,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
6 | Fossify Music Player,2023-12-03,https://github.com/FossifyOrg/Music-Player,GPL-3.0,,Simple-Music-Player,https://github.com/SimpleMobileTools/Simple-Music-Player,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
7 | Fossify Contacts,2023-12-03,https://github.com/FossifyOrg/Contacts,GPL-3.0,,Simple-Contacts,https://github.com/SimpleMobileTools/Simple-Contacts,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
8 | Fossify Notes,2023-12-03,https://github.com/FossifyOrg/Notes,GPL-3.0,,Simple-Notes,https://github.com/SimpleMobileTools/Simple-Notes,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
9 | Fossify Messages,2023-12-03,https://github.com/FossifyOrg/Messages,GPL-3.0,,Simple-SMS-Messenger,https://github.com/SimpleMobileTools/Simple-SMS-Messenger,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
10 | Fossify Phone,2023-12-03,https://github.com/FossifyOrg/Phone,GPL-3.0,,Simple-Dialer,https://github.com/SimpleMobileTools/Simple-Dialer,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
11 | Fossify Keyboard,2023-12-03,https://github.com/FossifyOrg/Keyboard,GPL-3.0,,Simple-Keyboard,https://github.com/SimpleMobileTools/Simple-Keyboard,GPL-3.0,ZipoApps,acquisition,"SimpleMobileTools was sold to ZipoApps, the community forked it",clone fork,https://github.com/SimpleMobileTools/General-Discussion/issues/241#issuecomment-1837102917,"hostile, original got bought by ad tech company"
12 | QUIK,,https://github.com/octoshrimpy/quik,GPL-3.0,,QKSMS,https://github.com/moezbhatti/qksms,GPL-3.0,,feature,QKSMS was unmaintained and got forked,clone fork,,
13 | HeliBoard,,https://github.com/Helium314/HeliBoard,GPL-3.0 and Apache 2,,OpenBoard,https://github.com/openboard-team/openboard,GPL-3.0,Openboard Team,feature,OpenBoard was unmaintained,clone fork,,
14 | Tenacity,,https://codeberg.org/tenacityteam/tenacity,GPL-2.0,,Audacity,https://github.com/audacity/audacity,GPL-3.0,,feature,,,,
15 |
--------------------------------------------------------------------------------
/dataset/license-changes/wikipedia_list.csv:
--------------------------------------------------------------------------------
1 | Akka,2009,2022,Apache-2.0,Business Source License
2 | ArangoDB,2011,2023,Apache-2.0,Business Source License
3 | Aseprite,2001,2016,GPL-2.0,EULA that permits personal use but forbids redistribution
4 | CockroachDB,2015,2019,Apache-2.0,Business Source License
5 | Consul,2014,2023,MPL-2.0,Business Source License
6 | Couchbase Server,2010,2021,Apache-2.0,Business Source License
7 | Couchbase Mobile,,2022,Apache-2.0,Business Source License
8 | Elasticsearch,2010,2021,Apache-2.0,"Elastic License and Server Side Public License"
9 | Emby,2014,2018,GPL-2.0,"Source code closed on December 8, 2018"
10 | FBReader,2013,2015,GPL-2.0-or-later,"Apparently the number of devs was limited, and they all agreed to relicense it"
11 | LiveCode,2013,2021,GPL-3.0-only,proprietary
12 | LiveJournal,1999,2014,GPL-2.0-or-later,The source code was made private in 2014
13 | MongoDB,2009,2018,AGPL-3.0-only,Server Side Public License
14 | Nexuiz,2005,2012,GPL-2.0-or-later,"Game abandoned in favour of a commercial video game of the same name, which licensed the Nexuiz title but is not based on its engine."
15 | OctoberCMS,2014,2021,MIT,Cited the sustainability of its open source model as a factor.
16 | OTRS,2001,2020,GPL-3.0-or-later,"Support for the Community Edition dropped on December 23, 2020"
17 | Paint.NET,2004,2007,MIT,freeware license that prohibits modification or resale
18 | PyMOL,2000,2010,MIT-CMU
19 | Reddit,2008,2017,CPAL-1.0,"Source code was made private in 2017, as the internal codebase had already diverged significantly from the public one."
20 | Redis,2009,2024,BSD-3-Clause,dual: custom license and Server Side Public License
21 | Sourcegraph,2013,2023,Apache-2.0,proprietary
22 | Terraform,2014,2023,MPL-2.0,Business Source License
23 | Tux Racer,2000,2002,GPL-2.0-or-later,"Commercial expansion by original authors, also called Tux Racer."
24 | Vagrant,2010,2023,MIT,Business Source License
25 |
--------------------------------------------------------------------------------
/dataset/releases/fork-relicense-jais-2025-04.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/releases/fork-relicense-jais-2025-04.tar.gz
--------------------------------------------------------------------------------
/dataset/taxonomies/FOSDEM_ Do we need another open source software taxonomy_ (1).pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/dataset/taxonomies/FOSDEM_ Do we need another open source software taxonomy_ (1).pdf
--------------------------------------------------------------------------------
/dataset/taxonomies/README.md:
--------------------------------------------------------------------------------
1 | # Open Source Taxonomies
2 |
3 | This repository is an experiment designed to collect existing taxonomies about open source projects as well as taxonomies used/created within research about open source ecosystems. The motivation and goals of this effort were discussed in a talk at [FOSDEM 2025](https://github.com/chaoss/wg-data-science/blob/main/dataset/taxonomies/FOSDEM_%20Do%20we%20need%20another%20open%20source%20software%20taxonomy_%20(1).pdf).
4 |
5 | Because all submissions will be treated as public data, we will not accept submissions that contain sensitive data types (user data, personally identifiable data, non-public business data, health data, etc.).
6 |
7 | ## Template
8 |
9 | _Data Card_
10 |
11 | | | Submission |
12 | |:-----|---------------|
13 | |**Title:**| |
14 | |**Submitter:**|(github handle)|
15 | |**Author(s):**|(github handle? email?)|
16 | |**Author affiliation(s):**|(academic institution(s), company, funder, publisher, unknown, etc)|
17 | |**Description:**|(What was this used for? e.g. survey, analysis, publication, etc. How was this taxonomy created? e.g. human, model, ML/AI, etc.)|
18 | |**Data source(s) / data type(s):**|(Was this taxonomy created from a specific data source or for a specific purpose? e.g. github events labels, internal/non-public logs, survey question etc)|
19 | |**Suggested tags for this taxonomy:**|(plain text separated by commas)|
20 | |**Link(s) to public resources that use this taxonomy:**|(e.g. published articles, webpage, blog, etc)|
21 | |**License(s):**|(if applicable)|
22 |
23 | _Taxonomy_
24 | (Preferred format is a plaintext bulleted table or list)
25 |
26 |
--------------------------------------------------------------------------------
/dataset/taxonomies/Taxonomies - repostatus.csv:
--------------------------------------------------------------------------------
1 | Data card,
2 | ,Submission
3 | Title:,repostatus
4 | Submitter:,sophia-iv
5 | Author(s):,jantman
6 | Author affiliation(s):,unknown
7 | Description:,project status taxonomy for GitHub repositories
8 | Data source(s) / data type(s):,GitHub repository badges and labels
9 | Suggested tags for this taxonomy: ,"project status, project lifecycle"
10 | Link(s) to public resources that use this taxonomy: ,https://www.repostatus.org/
11 | License(s): ,CC-BY-SA 4.0 license
12 | ,
13 | Taxonomy,
14 | Project Status,
15 | ,
16 | Concept,"Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept."
17 | WIP,"Initial development is in progress, but there has not yet been a stable, usable release suitable for the public."
18 | Suspended,"Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work."
19 | Abandoned,"Initial development has started, but there has not yet been a stable, usable release; the project has been abandoned and the author(s) do not intend on continuing development."
20 | Active,"The project has reached a stable, usable state and is being actively developed."
21 | Inactive,"The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows."
22 | Unsupported,"The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired."
23 | Moved,"The project has been moved to a new location, and the version at that location should be considered authoritative. This status should be accompanied by a new URL."
--------------------------------------------------------------------------------
/events/hackathon-june-2025.md:
--------------------------------------------------------------------------------
1 | # CHAOSS Data Science Hackathon June 2025
2 |
3 | ## Logistics
4 | Date: June 26, 2025
5 |
6 | This event will be co-located with the [Open Source Summit North America](https://events.linuxfoundation.org/open-source-summit-north-america/) and will be held the morning before [CHAOSScon](https://chaoss.community/chaosscon-2025-na/).
7 |
8 | The details and registration for this event can now be found on the [CHAOSS Website](https://chaoss.community/chaoss-data-science-hackathon-2025/).
9 |
--------------------------------------------------------------------------------
/practitioner-guides/README.md:
--------------------------------------------------------------------------------
1 | # Practitioner Guides
2 |
3 | **Read the [Practitioner Guides](https://chaoss.community/about-chaoss-practitioner-guides/)**
4 |
5 | Practitioner Guides are designed to be used by practitioners. The goal is to help people interpret the data about an open source project and develop insights that can improve that project's health.
6 |
7 | The Getting Started series of guides is designed for people who may not be experts in data analysis or open source.
8 |
9 | ## Audience for the Guides
10 |
11 | Open Source Program Offices (OSPOs), project leads, community managers, maintainers, and anyone who wants to better understand project health and take action on what they learn from their metrics.
12 |
13 | Note that while these guides are being developed within the Data Science WG, we are not the audience for these guides. The guides should be written with the above audience of practitioners in mind.
14 |
15 | ## Contributions
16 |
17 | We welcome contributions! If you'd like to work on a guide or propose a new guide, you can browse the [Practitioner Guide issues](https://github.com/chaoss/wg-data-science/issues?q=is%3Aissue+is%3Aopen+label%3A%22practitioner+guide%22) in the data science WG repo. We have some issues for guides that we know we want but that have not yet been started, and you can create a new issue using our template to propose a new Practitioner Guide.
18 |
19 | A few things to keep in mind when contributing to these guides:
20 | * We have Google Docs templates that you should use when creating a new guide to make it easier for others to review and collaborate with you. We prefer for this document to be owned by the chaossproject@gmail.com account. You should link to your document from the GitHub issue.
21 | * [Getting Started Template](https://docs.google.com/document/d/1xe3KkkoBHcn9tyFaMTQ0TIfzZGkJPzBvrravp8Cdtl4/edit?usp=sharing)
22 | * [Template for more advanced guides](https://docs.google.com/document/d/1GLPUdjWA6rz1chFl0psKmtb8uniDeuGyg9UWTDgjIAI/edit?usp=sharing)
23 | * Each Getting Started Practitioner Guide should contain no more than 4 metrics. Pick the 2-4 metrics that best represent the breadth of the topic and list additional metrics in Step 3. Advanced guides have no limit for the number of metrics.
24 | * Make sure that all links to metrics and metrics models use the permanent, non-changing link in the form of `https://chaoss.community/?p=####`. These links can be found near the end of every published metric.
25 | * Please read the [Practitioner Guide: Introduction](introduction.md) before starting and link to it rather than duplicating where possible. Note that some duplication is likely required for clarity in the individual guides.
26 | * To see an example of how this template looks when completed, see the [Published Guides](https://chaoss.community/about-chaoss-practitioner-guides/).
27 |
28 |
--------------------------------------------------------------------------------
/practitioner-guides/images/active-contrib-over-time-bar-trend.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/active-contrib-over-time-bar-trend.png
--------------------------------------------------------------------------------
/practitioner-guides/images/active-organizations-over-time-by-data-source.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/active-organizations-over-time-by-data-source.png
--------------------------------------------------------------------------------
/practitioner-guides/images/bus-factor-bar-balanced.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/bus-factor-bar-balanced.png
--------------------------------------------------------------------------------
/practitioner-guides/images/bus-factor-pie-one-person.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/bus-factor-pie-one-person.png
--------------------------------------------------------------------------------
/practitioner-guides/images/change_requests_abandoned.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/change_requests_abandoned.png
--------------------------------------------------------------------------------
/practitioner-guides/images/closure-ratio-falling-behind.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/closure-ratio-falling-behind.png
--------------------------------------------------------------------------------
/practitioner-guides/images/closure-ratio-summer-gap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/closure-ratio-summer-gap.png
--------------------------------------------------------------------------------
/practitioner-guides/images/commit-activity-by-domain-unclean.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/commit-activity-by-domain-unclean.png
--------------------------------------------------------------------------------
/practitioner-guides/images/commit-activity-by-domain-vmw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/commit-activity-by-domain-vmw.png
--------------------------------------------------------------------------------
/practitioner-guides/images/contrib-by-data-source.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/contrib-by-data-source.png
--------------------------------------------------------------------------------
/practitioner-guides/images/contributor-growth-by-engagement-bar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/contributor-growth-by-engagement-bar.png
--------------------------------------------------------------------------------
/practitioner-guides/images/issues_abandoned.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/issues_abandoned.png
--------------------------------------------------------------------------------
/practitioner-guides/images/lead-time.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/lead-time.png
--------------------------------------------------------------------------------
/practitioner-guides/images/leadership-positions-istio-before-cncf-april-2022.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/leadership-positions-istio-before-cncf-april-2022.png
--------------------------------------------------------------------------------
/practitioner-guides/images/leadership-positions-istio-graduating-june-2023.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/leadership-positions-istio-graduating-june-2023.png
--------------------------------------------------------------------------------
/practitioner-guides/images/libyears.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/libyears.png
--------------------------------------------------------------------------------
/practitioner-guides/images/ossf-badge-categories.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/ossf-badge-categories.png
--------------------------------------------------------------------------------
/practitioner-guides/images/ossf-badge-criteria-example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/ossf-badge-criteria-example.png
--------------------------------------------------------------------------------
/practitioner-guides/images/ossf-badge-curl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/ossf-badge-curl.png
--------------------------------------------------------------------------------
/practitioner-guides/images/releases.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/releases.png
--------------------------------------------------------------------------------
/practitioner-guides/images/time-to-close.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/time-to-close.png
--------------------------------------------------------------------------------
/practitioner-guides/images/time-to-first-response.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/practitioner-guides/images/time-to-first-response.png
--------------------------------------------------------------------------------
/practitioner-guides/introduction.md:
--------------------------------------------------------------------------------
1 | # Practitioner Guide: Introduction - Things to Think about When Interpreting Metrics
2 |
3 | * Related Metrics: [All](https://chaoss.community/kb-metrics-and-metrics-models/)
4 | * Audience: Practitioner Guides are designed to be used by practitioners who may not be experts in data analysis and who want to better understand how to interpret the data about an open source project to develop insights that can help them improve the project health of an open source project. These guides will be especially useful for Open Source Program Offices (OSPOs), project leads, community managers, maintainers, and anyone who wants to better understand project health and take action on what they learn from their metrics.
5 |
6 | Measuring project health is complex with so many potential aspects to consider (Linåker et al. 2022). The Practitioner Guide series is designed to break project health down into a series of logical topics that you can use to assess and improve the health of your open source projects. This Introduction Guide is designed to get you thinking about what you might want to measure and how to measure it along with some general tips and cautions. It is meant to complement the Practitioner Guide series, which is where you will find the details about how to draw insights about specific topics, like Responsiveness, Contributor Sustainability, Organizational Participation, and more.
7 |
8 | There is no one size fits all approach to using metrics to measure project health. Every open source project is a little different, and metrics should always be interpreted with the needs of that project and its context taken into account (Goggins et al. 2021). Small projects will have different needs than large projects. An open source operating system project will have very different characteristics than a project that produces a small package or library. Different communities will have different ways of working to produce their open source software projects. Projects have different methods of publishing releases. Projects and the people who contribute to them will have different needs and goals.
9 |
10 | One of the best places to start isn’t actually with the metrics, but by spending some time understanding the overall goals for the project. If the project is primarily driven by one organization or owned by an organization, you should also consider the goals for that organization. By thinking strategically about the overall goals, you’ll be in a better place to decide what you need to measure to determine whether the project is achieving its goals. Open source projects generate a tsunami of data that can be overwhelming, but by focusing on the goals, you can develop a [metrics strategy](https://community.linuxfoundation.org/events/details/lfhq-todo-group-ospology-presents-ways-to-define-a-metrics-strategy-in-your-ospo/) that helps you focus on the metrics that matter most for a particular project.
11 |
12 | All of this and more will have an impact on the interpretation of any open source metrics. The real experts are the people who are involved in the day to day work on a project. In addition to focusing on the goals, you might also need to spend some time looking at trends related to who participates in the community and how they participate to get an overall feel for the project and who you might want to reach out to for more details. You need to involve key people from the project / community that you are measuring, since they can help you ethically interpret the metrics and any trends identified in ways that make the most sense for that particular project (Casari et al. 2023) as described in more detail in the "Step 2: Diagnosis" section. If you haven’t already read [Beyond the Repository](https://cacm.acm.org/magazines/2023/10/276630-beyond-the-repository/fulltext) by Amanda Casari, Julia Ferraioli, and Juniper Lovato, we recommend pausing and reading this 6-page article now.
13 |
14 | Within the CHAOSS project, we have [software (Augur and GrimoireLab)](https://chaoss.community/software/) that can be used to gather data and identify trends in a neutral way that can be easily audited and tracked over time. However, these Practitioner Guides don’t assume that you are using any particular piece of software, since you may have other metrics tools that work for your particular situation. Regardless of how you are gathering metrics, these guides help you interpret those metrics to draw meaningful and actionable insights to help improve project health.
15 |
16 | # Step 1: Identify Trends
17 |
18 | Metrics for open source projects can be noisy, with many data points generated by the many activities within a project. One way to cut through this noise is to focus on the trends over time. Rather than looking at what happened yesterday or last week, it can help to start by aggregating your data by month and looking at whether some aspect of your community is improving, staying steady, or declining over the past 3 to 6 months. You can drill down later into the data for a specific day or week to help understand what you see. By looking at the big-picture trends, you can avoid over-correcting or worrying too much about day to day fluctuations.
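
The monthly aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have a list of activity dates (for example, commit or issue-creation dates) from whatever tool you use. The function names, the 10% tolerance, and the comparison of the earlier half of the window with the later half are assumptions made for this example, not CHAOSS definitions.

```python
from collections import Counter
from datetime import date


def monthly_counts(event_dates, first_month, months):
    """Count events for `months` consecutive months starting at `first_month`.

    `first_month` is a (year, month) tuple. Months with no events are
    included with a count of 0 so that gaps are visible in the trend.
    """
    counts = Counter((d.year, d.month) for d in event_dates)
    result = []
    year, month = first_month
    for _ in range(months):
        result.append(((year, month), counts.get((year, month), 0)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


def classify_trend(counts, tolerance=0.10):
    """Compare the average of the later half of the window with the earlier half.

    `tolerance` (assumed here to be 10%) is the relative change that is
    still treated as "steady".
    """
    values = [n for _, n in counts]
    if len(values) < 2:
        return "not enough data"
    mid = len(values) // 2
    earlier = sum(values[:mid]) / mid
    later = sum(values[mid:]) / (len(values) - mid)
    if earlier == 0:
        return "improving" if later > 0 else "steady"
    change = (later - earlier) / earlier
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "steady"
```

Calling `classify_trend(monthly_counts(dates, (2024, 1), 6))` gives a rough label for a 6-month window. The label is only a starting point for the conversation with project members described in Step 2.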
19 |
20 | # Step 2: Diagnosis
21 |
22 | The first action of diagnosing problems or identifying opportunities for improvement is to talk to the people intimately involved in the project. Show them the data and ask what might be causing the issues. Project leaders and community members might not know what is causing the problems, so each guide should have some tips for exploring areas and potential ideas for where to look and how to diagnose specific issues or find opportunities to improve.
23 |
24 | When deciding if something is a problem or concern that needs to be addressed, the first question is whether the issue might be a temporary fluctuation instead of a real problem. What else is happening in your community, project, and ecosystem? Was there a big conference, major release, vacation season, or other things that impacted people's time to make contributions? It can help to overlay these types of milestones on a graph to understand their impact better, and if it looks like there is an impact, wait a month or two and see if the metric(s) rebound after the temporary disruption. As mentioned earlier, this is why it’s so important to have people involved in the project daily, helping to interpret what you are seeing in the metrics.
25 |
26 | An excellent example of a temporary fluctuation is when there are downward trends in July / August and December / January if you have a lot of contributors in places where these are vacation or holiday seasons. A downward trend shows that people are taking some time off to rest and recharge, which is likely a positive sign for the long-term sustainability of your project, instead of a problem.
27 |
28 | If you’ve decided that the problem will likely be ongoing and not temporary, then it’s time to start thinking about what might be causing the issue. This will likely be metric-specific and will be covered in more detail in the Practitioner Guides for specific topics.
29 |
30 | # Step 3: Gather Additional Data if Needed
31 |
32 | At this point, if you know what you need to improve and how to improve it, you can skip this step for now. You can always return to it if you make changes but don’t see any improvements over the next few months.
33 |
34 | In other cases, you should look into an area in more detail before deciding what improvement actions to make. The Practitioner Guides for specific topics will include additional metrics that can be used to gather additional data to diagnose specific problems.
35 |
36 | # Step 4: Make Improvements
37 |
38 | It is important for this step to have buy-in from the community and project leadership before you start taking action toward making improvements. Not having support from the project could lead to changes being ineffective, disruptive, or even damaging for the project and the people contributing to it.
39 |
40 | Open source projects, communities, and ecosystems are complex; changes you make in one area might impact other parts of the project. Many people working on open source projects are likely to be busy with little time for additional work, so it’s important not to overload people to the point of burnout. For these reasons, it is usually best to focus on no more than 2 or 3 improvement actions to make at one time.
41 |
42 | Like with the other steps, the Practitioner Guides for specific topics will include more details on how to make improvements for that topic.
43 |
44 | # Step 5: Monitor Results
45 |
46 | An important step toward knowing whether your actions to improve a topic have been effective is to continue measuring and then monitor those results. At a minimum, you’ll probably want to monitor it for 2 or 3 months (more for complex changes) before deciding whether your actions are starting to be effective. Remember that if anything happens that might cause temporary fluctuations, you’ll want to increase that timeframe.
47 |
48 | You should also continue to monitor it over the long-term to see if your improvements continue to have an impact. A frequent pattern is that improvements tend to continue while people are focused on them but then can backslide if people fall into old patterns and stop making improvements. You might find yourself cycling through these steps to renew people’s interest and continue making improvements.
49 |
50 | # Cautions and Considerations
51 |
52 | * When interpreting metrics and making improvements in your open source project, you should always think first about the people involved in your project and how these changes might impact them (positively and negatively).
53 | * Always have the people working on the project involved in gathering and interpreting the metrics and in any potential improvement actions you might make.
54 | * Every project is a little different, so it’s essential to interpret the metrics in light of a project’s individual needs and ways of operating.
55 | * Avoid using metrics to compare projects against each other when possible, but if you need to compare projects, make sure that you are only comparing projects with similar characteristics. To give just a few of the many examples: JavaScript projects will have very different characteristics and patterns than C / C++ projects; foundation-owned projects will be different from projects driven out of corporations; and a project the size of Kubernetes will be nothing like a project producing a small library or package.
56 | * Be careful never to set yourself up for people to weaponize your metrics, and be very careful with metrics that can be used to compare people against each other in ways that might result in the punishment of individuals.
57 | * Remember that automation and bot activity can influence the interpretation of many metrics, so it’s essential to understand how automation and bots might influence your results.
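
One way to see how much automation influences a metric is to separate bot activity from human activity before counting. The sketch below is a minimal example under stated assumptions: each event is a dictionary with an `"author"` login (a hypothetical shape for this example), GitHub App accounts are recognized by their `[bot]` login suffix, and any other automation accounts are supplied by you in `extra_bots`, since bots running under ordinary user accounts cannot be detected by name alone.

```python
def split_bot_activity(events, extra_bots=()):
    """Split events into (human_events, bot_events).

    An event counts as bot activity when its author login ends in "[bot]"
    (the suffix GitHub uses for App accounts) or appears in `extra_bots`,
    a project-specific list of known automation accounts.
    """
    known_bots = {name.lower() for name in extra_bots}
    humans, bots = [], []
    for event in events:
        login = event["author"].lower()
        if login.endswith("[bot]") or login in known_bots:
            bots.append(event)
        else:
            humans.append(event)
    return humans, bots
```

Comparing a metric computed on all events with the same metric computed on `humans` alone shows how much of a trend is driven by automation. The people working on the project are usually the best source for the `extra_bots` list.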
58 |
59 | # Additional Reading
60 |
61 | * [Practitioner Guide series](https://chaoss.community/about-chaoss-practitioner-guides/) with guides to help you improve responsiveness, contributor sustainability, organizational participation, and more.
62 | * A short video about each guide can be found in the [Practitioner Guides Playlist](https://www.youtube.com/playlist?list=PL60k37cxI-HSHV4-rEsWMzExw2y2Oq79Z) in the CHAOSS YouTube channel.
63 | * An episode of the [SustainOSS podcast](https://podcast.sustainoss.org/243) about the Practitioner Guide series.
64 | * For more cautions, considerations, and best practices, please read [Beyond the Repository](https://cacm.acm.org/magazines/2023/10/276630-beyond-the-repository/fulltext) by Amanda Casari, Julia Ferraioli, and Juniper Lovato.
65 | * Video of a panel related to developing a [metrics strategy for your OSPO](https://community.linuxfoundation.org/events/details/lfhq-todo-group-ospology-presents-ways-to-define-a-metrics-strategy-in-your-ospo/) and a blog post about [building an open source strategy](https://blogs.vmware.com/opensource/2020/03/03/open-source-strategy/) and using metrics to determine success.
66 | * [CHAOSS Software](https://chaoss.community/software/)
67 |
68 | # Feedback
69 |
70 | We would love to have feedback to learn more about how people are using the CHAOSS Practitioner Guides and how we can improve them over time. Please complete this [short survey](https://forms.gle/w3B1gBH8kp3rPbhr8) to provide your feedback.
71 |
72 | # Contributors
73 |
74 | The following people contributed to this guide:
75 |
76 | * Dawn Foster
77 | * Chan Voong
78 | * Luis Cañas Díaz
79 |
80 | # References
81 |
82 | * Casari, A., Ferraioli, J., & Lovato, J. (2023). [Beyond the repository: Best practices for open source ecosystems researchers](https://cacm.acm.org/practice/beyond-the-repository/). Queue, 21(2), 14-34.
83 | * Goggins, S. P., Germonprez, M., & Lumbard, K. (2021). [Making open source project health transparent](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9504501). Computer, 54(8), 104-111.
84 | * Linåker, J., Papatheocharous, E., & Olsson, T. (2022, September). [How to characterize the health of an Open Source Software project? A snowball literature review of an emerging practice](https://dl.acm.org/doi/pdf/10.1145/3555051.3555067). In Proceedings of the 18th International Symposium on Open Collaboration (pp. 1-12).
85 |
86 | CHAOSS Practitioner Guides are MIT licensed, living documents, and we welcome your feedback and input. You can suggest edits to this document at https://github.com/chaoss/wg-data-science/blob/main/practitioner-guides/introduction.md
87 |
--------------------------------------------------------------------------------
/practitioner-guides/sunset.md:
--------------------------------------------------------------------------------
1 | # Practitioner Guide: Getting Started with Sunsetting an Open Source Project
2 |
3 | Primary metrics:
4 |
5 | * [Change Requests](https://chaoss.community/?p=3610)
6 | * [Issues New](https://chaoss.community/?p=3634)
7 | * [Technical Forks](https://chaoss.community/?p=3431)
8 |
9 | If you haven’t already read the [Practitioner Guide: Introduction - Things to Think about When Interpreting Metrics](https://chaoss.community/practitioner-guide-introduction/), please pause now and read that guide.
10 |
11 | Many open source projects, even widely used ones, become abandoned for a variety of reasons (e.g., evolving interests, family situations, employment changes), but abandonment can be done in a responsible way by proactively sunsetting the project (Miller et al. 2025). Sunsetting is an important consideration for corporate environments where it can be easy to lose track of projects that were created by employees who later walked away from the project and left it abandoned. You don’t want abandoned open source projects with security vulnerabilities sitting in your organization’s source code repositories where someone might trust that project simply because they trust your organization. Finding inactive projects and responsibly sunsetting them is a good business decision and something that many open source teams / Open Source Program Offices (OSPOs) do on a regular basis.
12 |
13 | It’s important to remember that not every open source project can or should exist forever: technologies evolve, corporate priorities change, and people’s interests change. Part of the beauty of open source is that we work in the open as we innovate, and some of those innovative projects will stand the test of time, while others should be responsibly deprecated via a sunset process. Sunsetting an open source project should take your users’ needs into account, and where possible, offer users time to migrate to a replacement technology. At a minimum, it’s important to signal that the project will no longer be maintained, updated, or have security patches so that users know that they should no longer be using the project.
14 |
15 | The most important step you can take when sunsetting a project is to be transparent at every step of the way. Being open about your intentions helps preserve trust for future open source work.
16 |
17 | # Step 1: Identify Trends
18 |
19 | There are several metrics that can help you identify whether there is any activity or usage for a project to make decisions about responsibly sunsetting a project. Looking at [Change Requests](https://chaoss.community/?p=3610) (aka Pull Requests / Merge Requests) and [New Issues](https://chaoss.community/?p=3634) can be a good start when looking at whether there are still people using and contributing to a project. Another good metric is [Technical Forks](https://chaoss.community/?p=3431), which tend to be an indication of usage and potential contribution.
20 |
21 |
22 | ## Change Requests
23 |
24 | The Change Requests metric can help you understand whether people are trying to contribute to your project, which signals whether there is interest in your project. It helps to look at both closed and open requests, since closed change requests indicate that a project might still be maintained, while open change requests that are not being resolved can indicate that a project might be abandoned. If there are no recently merged change requests, then it’s also likely that there have not been any security updates.
25 |
26 | 
27 |
28 | ## New Issues
29 |
30 | Many new issues are created when a user finds a bug, has a question, or wants a new feature, so new issues may be filed because people are using your project or are otherwise interested in your project. When few or no issues are created over an extended period of time, it might indicate that the project has been abandoned.
31 |
32 | 
33 |
34 | ## Technical Forks
35 |
36 | These are the forks that people create when they are using your project or planning to contribute to it, but it can help to look beyond the number of forks to see who has forked your project and whether they are continuing to update their fork. Recently updated forks can indicate usage, and by gathering data about who has forked your project, you might also be able to reach out to some of these people to ask if they are still using it before you decide whether, or how, to sunset it.
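
Finding the forks that are still being updated can be sketched as a simple filter. This is a minimal example, assuming the fork data has already been fetched and converted into dictionaries with `"owner"`, `"created_at"`, and `"pushed_at"` keys (the GitHub API reports creation and last-push timestamps for repositories, but the dictionary shape used here is an assumption). The 180-day window and the rule that a fork must have been pushed to after it was created are heuristics chosen for this example.

```python
from datetime import datetime, timedelta, timezone


def recently_updated_forks(forks, now, recent_days=180):
    """Return forks pushed to within `recent_days`, most recent first.

    Each fork is assumed to be a dict with "owner" (str), "created_at"
    (datetime), and "pushed_at" (datetime). Forks whose last push is not
    later than their creation are skipped, on the assumption that nobody
    has pushed to them since forking.
    """
    cutoff = now - timedelta(days=recent_days)
    active = [
        fork
        for fork in forks
        if fork["pushed_at"] >= cutoff and fork["pushed_at"] > fork["created_at"]
    ]
    return sorted(active, key=lambda fork: fork["pushed_at"], reverse=True)
```

The `"owner"` values in the result give you a short list of people to contact about whether they still rely on the project.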
37 |
38 | # Step 2: Diagnosis
39 |
40 | If you are part of an organization that is auditing its repositories to find projects that seem to be abandoned, you should start by talking to the people who were involved in the project so that you can better understand whether the project is truly abandoned, and if so, why. This will likely require looking at the most recent change requests to see who made the last few changes to the project so that you can reach out to them. In some cases, a project might not need to be updated regularly and might seem to be abandoned when it has simply stabilized and might still be widely used. For example, a small library that performs some complex mathematical function might not need to be updated after it is written, assuming that it doesn’t have dependencies that need to be updated. If the project is primarily distributed via package managers, usage metrics from those sources should also be considered.
41 |
42 | If the project has truly been abandoned, then it can help to understand why it was abandoned and whether anyone might still be using it before you decide to sunset it. This is where the technical fork data can be useful to see if other people have forked your project and are continuing to update their fork, which might indicate usage of your project. [Criticality Score](https://github.com/ossf/criticality_score), an OpenSSF project, can also shed light on usage, since it also calculates the number of projects that depend on your project. There is a [Python script](https://github.com/geekygirldawn/project-api-metrics/blob/main/scripts/sunset.py) called `sunset.py` that uses Criticality Score and GitHub API fork data to gather and analyze this information.
43 |
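As a rough illustration of the fork analysis described above, the sketch below filters a list of forks down to those with a recent push. It is not the script linked above. It assumes the input is a list of dictionaries shaped like the repository objects returned by the GitHub REST API endpoint `GET /repos/{owner}/{repo}/forks`, and it reads only the `full_name`, `pushed_at`, and `owner.login` fields. The function name, the 365-day threshold, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recently_updated_forks(forks, max_age_days=365, now=None):
    """Return forks pushed to within the last `max_age_days` days, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    active = []
    for fork in forks:
        pushed_at = fork.get("pushed_at")
        if not pushed_at:
            continue  # skip forks that report no push timestamp
        # GitHub timestamps look like "2024-08-01T12:00:00Z" (UTC).
        pushed = datetime.strptime(pushed_at, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        if pushed >= cutoff:
            active.append(
                {
                    "fork": fork["full_name"],
                    "owner": fork["owner"]["login"],
                    "last_push": pushed,
                }
            )
    return sorted(active, key=lambda item: item["last_push"], reverse=True)


# Hypothetical sample data standing in for an API response.
sample_forks = [
    {"full_name": "alice/example-project", "pushed_at": "2024-08-01T12:00:00Z",
     "owner": {"login": "alice"}},
    {"full_name": "bob/example-project", "pushed_at": "2021-01-01T00:00:00Z",
     "owner": {"login": "bob"}},
    {"full_name": "carol/example-project", "pushed_at": None,
     "owner": {"login": "carol"}},
]

active = recently_updated_forks(
    sample_forks, max_age_days=365, now=datetime(2024, 9, 1, tzinfo=timezone.utc)
)
for item in active:
    print(item["fork"], item["owner"], item["last_push"].date())
```

A recent push is only a rough signal of usage, so the owners this surfaces are best treated as people to contact rather than as confirmed users.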
44 | # Step 3: Gather Additional Data if Needed
45 |
46 | CHAOSS has other metrics to understand project activity and usage that can be helpful when deciding whether to sunset a project.
47 |
48 | Additional Metrics:
49 |
50 | * [Collaboration Platform Activity](https://chaoss.community/?p=3484)
51 | * [Clones](https://chaoss.community/?p=3429)
52 | * [Code Changes Commits](https://chaoss.community/?p=4707)
53 | * [Release Frequency](https://chaoss.community/?p=4765)
54 | * [Project Popularity](https://chaoss.community/?p=3573)
55 | * [Criticality Score](https://github.com/ossf/criticality_score) (an OpenSSF tool, not a CHAOSS metric)
56 | * Package Manager usage metrics
57 |
58 | # Step 4: Make the Change
59 |
60 | After you have completed your diagnosis and have decided to sunset a project, there are several things you can do to ensure that you are sunsetting the project in a responsible manner.
61 |
62 | **Communication**
63 |
64 | Communication should start with any existing contributors to make sure that they agree with this decision. In some cases, there may be contributors who would like to continue the project, instead of sunsetting it.
65 |
66 | When you have agreement to sunset the project, then this needs to be communicated to any existing users, including any other internal teams who may be using the project. This communication should be done in a transparent manner by being clear about the reasons for sunsetting it. There are quite a few places where this should be communicated and documented:
67 |
68 | * README: Clearly state at the top of the README that the project has been deprecated and will no longer be updated. If possible, suggest alternate projects that provide similar functionality.
69 | * Project communication channels: This may include Slack, mailing lists, forums, social media, and any other channels used for communication within the project.
70 | * Documentation: Project documentation should also be updated. This may include wikis, websites, and other project documentation.
71 | * Package managers / distribution channels: Since most projects are distributed via package managers (e.g., npm, PyPI, RubyGems), those should also be updated with a deprecation warning that clearly states that the project is no longer being maintained or updated.
72 | * Additional channels: If this is a corporate project, marketing and other internal teams should also be notified.
73 | * Users: Known users of the project should also be notified.
74 |
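When many repositories are being sunset at once, the README step in the list above can be scripted. The following is a minimal sketch, assuming a Markdown README. The function name, the marker text, the notice wording, and the example alternative are all hypothetical and should be adapted to your project. The function returns the README unchanged if it already starts with the marker, so it is safe to run more than once.

```python
DEPRECATION_MARKER = "> **Deprecated:**"


def add_deprecation_notice(readme_text, alternatives=None):
    """Return the README text with a deprecation notice at the top."""
    if readme_text.lstrip().startswith(DEPRECATION_MARKER):
        return readme_text  # notice already present, leave the file alone
    notice = (
        f"{DEPRECATION_MARKER} This project is no longer maintained "
        "and will not receive further updates."
    )
    if alternatives:
        # Suggest other projects that provide similar functionality.
        notice += " Alternatives with similar functionality: "
        notice += ", ".join(alternatives) + "."
    return notice + "\n\n" + readme_text


# Hypothetical README content and alternative project name.
original = "# example-project\n\nA small library.\n"
updated = add_deprecation_notice(original, alternatives=["other-project"])
print(updated)
```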
75 | **Archival**
76 |
77 | The project should be officially archived using something like GitHub’s archival method. It might also be a good idea to keep these projects in a separate location to make it clear that these projects have reached the end of their life. For example, VMware has a separate ‘vmware-archive’ organization, and Apache has something similar called the ‘Apache Attic’.
78 |
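On GitHub, archiving can be done from the repository settings page or through the REST API by sending `PATCH /repos/{owner}/{repo}` with `{"archived": true}` in the request body. The sketch below only builds that request and does not send it. The function name, organization, repository, and token placeholder are hypothetical, and the token would need administrative access to the repository.

```python
import json
import urllib.request


def build_archive_request(owner, repo, token):
    """Build (but do not send) a GitHub REST API request that archives a repository."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    body = json.dumps({"archived": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical organization, repository, and token placeholder.
request = build_archive_request("example-org", "old-project", "YOUR_TOKEN")
print(request.get_method(), request.full_url)

# To actually archive the repository, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```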
79 | Taking the additional step of adding your code to the [Software Heritage](https://www.softwareheritage.org/) archive can help preserve it over time. This is especially true if you are self-hosting your source code repositories, but it can also help archived code outlive potential platform changes and make it easier for people to find in the future.
80 |
81 | Keep in mind that projects can always be unarchived if you change your mind later. Stressing this fact can often make it easier to get people to agree to the sunset process.
82 |
83 | **Special Case: Sunsetting Active Projects**
84 |
85 | Unfortunately, even active projects may need to be sunsetted, which can be much more difficult. This can happen when a project is being maintained entirely by a company, and that company has a shift in strategy and decides that they no longer wish to staff or maintain a project that is being widely used. There are a number of additional considerations in this case that should happen before the project is archived:
86 |
87 | * Public Relations (PR): Archiving an active project can be a tricky situation that might result in negative press as soon as you start talking to people about your desire to sunset the project, so before talking to anyone outside of your company, you want to get your PR team involved so that they can be ready to handle any queries.
88 | * Option to continue under new ownership: Determine if there are other contributors or other organizations who would be willing and able to take over new development and / or maintenance of the project.
89 | * Sunset plan: It can help to create a detailed plan that outlines all of the steps that need to be taken along with a timeframe for when the project will be sunsetted.
90 | * Timing: A responsible sunset plan will give users time to migrate to another solution before you stop making security updates and releases. Six months is a good starting point.
91 | * Customers, users, and communication: Communicate this decision and its timing carefully to any existing customers and users.
92 |
93 | # Cautions and Considerations
94 |
95 | * It is worth taking some extra time to talk to contributors and make sure that the decision to sunset a project is the right decision before you do it.
96 | * Be as transparent as possible about the decision to sunset a project and why you made that decision.
97 | * Sunsetting a project is not an indication of failure and should not be positioned as such. Projects have life cycles; they endure for as long as they are needed and then they should be responsibly deprecated when they are no longer needed.
98 | * Consider providing transition details, and if possible, tooling that helps your existing users transition to an alternative solution if a reasonable one is available.
99 | * As with all of the CHAOSS Practitioner Getting Started Guides, this guide is designed to help you get started when sunsetting a project, but it is not a comprehensive guide with everything you might need to know for every situation.
100 |
101 | # Additional Reading
102 |
103 | * Stefka Dimitrova on When and How to Deprecate an Open Source Project ([Part 1](https://blogs.vmware.com/opensource/2022/09/29/when-and-how-to-deprecate-an-open-source-project/) and [Part 2](https://blogs.vmware.com/opensource/2023/05/17/deprecating-an-open-source-project-part-2/)) along with a video from the Open Source Summit in Europe in 2022 about [Simple Steps for a Calm “Sunset”](https://www.youtube.com/watch?v=OdpkMkoKtDY). Much of the content for this guide is based on these materials.
104 | * [GitHub’s Do’s and Don’ts When Sunsetting Open Source Projects](https://github.blog/open-source/maintainers/dos-and-donts-when-sunsetting-open-source-projects/)
105 | * [TODO Group Guide: Shutting Down an Open Source Project](https://todogroup.org/resources/guides/shutting-down-an-open-source-project/)
106 | * [Allan Friedman on End of Life and End of Support Across the Ecosystem](https://www.youtube.com/watch?v=ZgWwiKLB6hE) (video)
107 | * [10 Quick tips for making software outlive your job (white paper)](https://arxiv.org/abs/2505.06484)
108 |
109 | # Contributors
110 |
111 | The following people contributed to this guide:
112 |
113 | * Dawn Foster
114 | * Ria Schalnat
115 | * Damián Vicino
116 | * Matt Germonprez
117 | * Elizabeth Barron
118 | * Tara Tarakiyee
119 |
120 | # References
121 |
122 | * Miller, C., Jahanshahi, M., Mockus, A., Vasilescu, B., & Kästner, C. (2025). [Understanding the response to open-source dependency abandonment in the npm ecosystem](http://www.cs.cmu.edu/~ckaestne/pdf/icse25_abandonment.pdf). In *Int’l Conf. Software Engineering (ICSE), IEEE/ACM*.
123 |
124 | CHAOSS Practitioner Guides are MIT licensed, living documents, and we welcome your feedback and input. You can suggest edits to this document at [https://github.com/chaoss/wg-data-science/blob/main/practitioner-guides/sunset.md](https://github.com/chaoss/wg-data-science/blob/main/practitioner-guides/sunset.md)
--------------------------------------------------------------------------------
/practitioner-guides/website-landing.md:
--------------------------------------------------------------------------------
1 | Practitioner Guides are designed to be used by practitioners. The goal is to help you interpret the data about an open source project and develop insights that can help you improve the project's health.
2 |
3 | The Getting Started series of guides is designed for people who may not be experts in data analysis or open source.
4 |
5 | These guides will be especially useful for Open Source Program Offices (OSPOs), project leads, community managers, maintainers, and anyone who wants to better understand project health and take action on what they learn from their metrics.
6 |
7 | **Guides**
8 |
9 | * [Practitioner Guide: Introduction - Things to Think about When Interpreting Metrics](https://chaoss.community/practitioner-guide-introduction/) <- Please start here
10 | * [Practitioner Guide: Getting Started with Contributor Sustainability](https://chaoss.community/practitioner-guide-contributor-sustainability/)
11 | * [Practitioner Guide: Getting Started with Responsiveness](https://chaoss.community/practitioner-guide-responsiveness/)
12 | * [Practitioner Guide: Getting Started with Organizational Participation](https://chaoss.community/practitioner-guide-organizational-participation/)
13 | * [Practitioner Guide: Getting Started with Security](https://chaoss.community/practitioner-guide-security/)
14 | * [Practitioner Guide: Getting Started with Sunsetting an Open Source Project](https://chaoss.community/practitioner-guide-sunset)
15 | * [Practitioner Guide: Getting Started with Building Diverse Leadership](https://chaoss.community/practitioner-guide-diverse-leadership)
16 |
17 | Short videos about the guides can be found in the [Practitioner Guides Playlist](https://www.youtube.com/playlist?list=PL60k37cxI-HSHV4-rEsWMzExw2y2Oq79Z) in the CHAOSS YouTube channel.
18 |
19 | We also have more guides coming soon. You can see the work in progress guides, contribute to them, and propose new guides in the [Practitioner Guides](https://github.com/chaoss/wg-data-science/tree/main/practitioner-guides) section of the CHAOSS Data Science Working Group repository. These guides are under the MIT License, so you can use them in a variety of ways that best meet your needs, including copying and modifying them for your organization.
20 |
--------------------------------------------------------------------------------
/publications/Foster-OFA-New-Dynamics-Open-Source-Relicensing-Forks-Community-Impact-2024.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaoss/wg-data-science/c4db8382b35553a260654b917256e84fc3ffeab6/publications/Foster-OFA-New-Dynamics-Open-Source-Relicensing-Forks-Community-Impact-2024.pdf
--------------------------------------------------------------------------------
/publications/README.md:
--------------------------------------------------------------------------------
1 | # Publications
2 |
3 | These publications came out of research conducted by the [CHAOSS Data Science Working Group](https://github.com/chaoss/wg-data-science). Please see our [Publication Guidelines](publication-guidelines.md) for more details.
4 |
5 | **[The New Dynamics of Open Source: Relicensing, Forks, and Community Impact](https://github.com/chaoss/wg-data-science/blob/main/publications/Foster-OFA-New-Dynamics-Open-Source-Relicensing-Forks-Community-Impact-2024.pdf) (PDF paper link)**
6 |
7 | * **Citation:** Foster, D. (2024, November 13-14). The New Dynamics of Open Source: Relicensing, Forks, and Community Impact. OpenForum Academy Symposium 2024, Boston, Massachusetts. Available at https://doi.org/10.48550/arXiv.2411.04739.
8 | * [Paper](https://github.com/chaoss/wg-data-science/blob/main/publications/Foster-OFA-New-Dynamics-Open-Source-Relicensing-Forks-Community-Impact-2024.pdf) and [Presentation Slides](https://fastwonderblog.com/wp-content/uploads/2024/11/Dawn-Foster-New-Dynamics-Relicensing-Forks-OFA-2024.pdf)
9 | * [Data files](https://github.com/chaoss/wg-data-science/releases/tag/v1.0-OFA-2024) released for replication, validation, and / or exploration.
10 | * [Ongoing research into relicensing and forks](https://github.com/chaoss/wg-data-science/tree/main/dataset/license-changes/fork-case-study) - We plan to release a report with additional data, and contributions are welcome.
11 |
12 |
--------------------------------------------------------------------------------
/publications/publication-guidelines.md:
--------------------------------------------------------------------------------
1 | # CHAOSS Data Science Publication Guidelines
2 |
3 | One goal of the CHAOSS Data Science Working Group (WG) is to publish the research and outcomes from our projects so that people can learn from our findings while also raising the visibility of CHAOSS and this WG. Our current list of projects can be found as [Issues labelled with "Project"](https://github.com/chaoss/wg-data-science/issues?q=is%3Aissue%20state%3Aopen%20label%3Aproject) in the Data Science WG repository, and we have an issue template that anyone can use to propose a new project.
4 |
5 | Open source program offices, community managers, and other people working in corporate open source environments have been the biggest audiences for CHAOSS tools and metrics, so to make it easier for these people to consume our research, we are focusing on producing reports that are in the style of white papers, like what you can find published by [Linux Foundation Research](https://www.linuxfoundation.org/research). Ideally, we would work with organizations including LF Research to release our reports. However, not everything we publish will be appropriate for an LF Research report, so alternatively, some will be published as reports on the CHAOSS site. Good reports to model ours after are the LF / Harvard [Census II](https://www.linuxfoundation.org/hubfs/Research%20Reports/lfr_harvard_censusII_mar2022_042824b.pdf?hsLang=en) and [Census III](https://www.linuxfoundation.org/hubfs/LF%20Research/lfr_censusiii_120424a.pdf?hsLang=en) reports, since they are well-researched, and presented in a style that is accessible for a wide audience (note that we expect our reports to be much shorter).
6 |
7 | From a CHAOSS Data Science WG perspective, these reports are not intended to be academic publications; however, the people leading the research for a report are welcome (and encouraged) to submit variations of the research to academic publications. One of the reasons we chose the white paper / LF Research report format is that it lets us publish our research quickly, whereas the academic publishing pipeline is often measured in years. We will **not** delay the publishing of our reports, so academic publications will almost always come after CHAOSS has published a report. We also require that, in any additional publications, you credit all of the other people who contributed to the research via an acknowledgements section and acknowledge the CHAOSS project.
8 |
9 | Style Guidelines:
10 | * [The LF Research Style Guide](https://docs.google.com/document/d/1pgq7EQtlo2syaRuEZaD1Rco1bEWmqC46gSX2ryZJNLE/edit?tab=t.0)
11 | * Please use APA style and footnotes for all references.
12 |
--------------------------------------------------------------------------------