├── requirements.txt
├── notebooks
│   ├── output
│   │   └── README.md
│   ├── README.md
│   ├── org_info.ipynb
│   └── CodeOfConductBug.ipynb
├── scripts
│   ├── output
│   │   └── README.md
│   ├── running_scripts.ipynb
│   ├── filter_keyword_by_org.py
│   ├── org_access_audit.py
│   ├── repo_activity_REST.py
│   ├── mystery_orgs.py
│   ├── monitoring.py
│   ├── keyword_by_repo.py
│   ├── README.md
│   ├── common_functions.py
│   ├── inclusivity_check.py
│   ├── pr_activity.py
│   ├── repo_activity.py
│   ├── repo_activity_coc.py
│   ├── commits_people.py
│   └── sunset.py
├── MAINTAINERS.md
├── setup-venv.sh
├── NOTICE
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── .gitignore
└── CODE-OF-CONDUCT.md
/requirements.txt: -------------------------------------------------------------------------------- 1 | requests 2 | pandas 3 | -------------------------------------------------------------------------------- /notebooks/output/README.md: -------------------------------------------------------------------------------- 1 | Output files from these scripts will be stored as csv files, and if this directory does not exist, the scripts will fail. 2 | 3 | The csv files in this directory will be ignored by git (.gitignore) 4 | -------------------------------------------------------------------------------- /scripts/output/README.md: -------------------------------------------------------------------------------- 1 | Output files from these scripts will be stored as csv files, and 2 | if this directory does not exist, the scripts will fail. 3 | 4 | The csv files in this directory will be ignored by git (.gitignore) 5 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | # Maintainers 2 | 3 | | Maintainer | GitHub ID | Affiliation | 4 | | --------------- | --------- | ----------- | 5 | | Dawn Foster | [geekygirldawn](https://github.com/geekygirldawn) | [CHAOSS Project](https://github.com/chaoss) | 6 | -------------------------------------------------------------------------------- /setup-venv.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # uncomment for more output 4 | # set -x 5 | python3 -m venv .venv 6 | echo "To activate virtual env: \"source .venv/bin/activate\"" 7 | echo "To install dependencies: \"pip install -r requirements.txt\"" 8 | 9 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Project API Metrics 2 | Copyright 2021 VMware, Inc. 3 | 4 | This product is licensed to you under the BSD-2 license (the "License"). You may not use this product except in compliance with the BSD-2 License. 5 | 6 | This product may include a number of subcomponents with separate copyright notices and license terms. Your use of these subcomponents is subject to the terms and conditions of the subcomponent's license, as noted in the LICENSE file. 7 | 8 | -------------------------------------------------------------------------------- /notebooks/README.md: -------------------------------------------------------------------------------- 1 | # Project API Metrics Jupyter Notebooks 2 | 3 | This is where I experiment with ideas, troubleshoot issues, 4 | or explore data. 5 | 6 | Some of these notebooks have also been converted into scripts 7 | in the [scripts](../scripts) directory.
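For example, assuming you have Jupyter installed, a notebook such as org_info.ipynb can usually be converted with nbconvert:

```
jupyter nbconvert --to script org_info.ipynb
```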
8 | 9 | The advantage of the notebooks is that you can have a closer look 10 | at the dataframes and other output to poke around and customize 11 | them for your needs. 12 | 13 | These notebooks might also be useful for people who are more 14 | comfortable using Jupyter notebooks than running scripts. 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Project API Metrics 2 | 3 | This repo contains a few Python scripts and Jupyter notebooks that query the 4 | GitHub API to gather metrics related to project health and other activities. 5 | 6 | I am also using this repo as I learn how to use the GitHub GraphQL API. 7 | 8 | The [scripts](scripts/) directory contains the scripts and more information 9 | about how to run them. 10 | 11 | The [notebooks](notebooks/) directory contains Jupyter notebooks that allow 12 | you to explore the data gathered from the API. 13 | 14 | ## Contributing 15 | 16 | I welcome any suggestions via issues or pull requests! Please have a look 17 | at the [CONTRIBUTING.md](CONTRIBUTING.md) document for more details. 18 | 19 | Participation in this project is subject to the 20 | [Code of Conduct](CODE-OF-CONDUCT.md). 21 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Project API Metrics 2 | Copyright 2021 VMware, Inc. 3 | 4 | The BSD-2 license (the "License") set forth below applies to all parts of the Project API Metrics project. You may not use this file except in compliance with the License. 5 | 6 | BSD-2 License 7 | 8 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 9 | 10 | Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 11 | 12 | Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Project API Metrics 2 | 3 | Welcome to Project API Metrics! We're happy to have you here, 4 | and we're always looking for ways to improve these scripts. 5 | 6 | As you get started, you are in a great position to provide feedback 7 | about whether these scripts work as expected in your environment.
8 | 9 | If anything doesn't make sense, or doesn't work when you run it, please open a 10 | bug report and let us know! 11 | 12 | ## Contribution Flow 13 | 14 | This is a rough outline of what a contributor's workflow looks like: 15 | 16 | - Create a topic branch from where you want to base your work 17 | - Make commits of logical units 18 | - Make sure your commit messages are in the proper format (see below) 19 | - Push your changes to a topic branch in your fork of the repository 20 | - Submit a pull request 21 | 22 | GitHub has more documentation about the [GitHub Workflow](https://docs.github.com/en/get-started/quickstart/github-flow). 23 | 24 | ## Code Style 25 | 26 | ### Formatting Commit Messages 27 | 28 | We follow the conventions on [How to Write a Git Commit Message](http://chris.beams.io/posts/git-commit/). 29 | 30 | Be sure to include any related GitHub issue references in the commit message. See 31 | [GitHub Flavored Markdown syntax](https://guides.github.com/features/mastering-markdown/#GitHub-flavored-markdown) for referencing issues 32 | and commits. 33 | 34 | ## Reporting Bugs and Creating Issues 35 | 36 | When opening a new issue, try to roughly follow the commit message format conventions above. 37 | -------------------------------------------------------------------------------- /scripts/running_scripts.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "6bf81bdd", 6 | "metadata": {}, 7 | "source": [ 8 | "# Overview\n", 9 | "\n", 10 | "This notebook is used to make it easy for someone to run the scripts if they are familiar with Jupyter Notebooks, but less familiar with navigating on the command line.\n", 11 | "\n", 12 | "Prerequisites for all scripts:\n", 13 | "* Python environment with Jupyter Notebook installed.\n", 14 | " If you don't already have something, Anaconda is a\n", 15 | " popular choice.\n", 16 | "* Pandas installed (if you are using Anaconda, this is probably included already)\n", 17 | "* [PyGithub](https://pygithub.readthedocs.io/en/latest/introduction.html) installed:\n", 18 | " ```pip install PyGithub```\n", 19 | "* [GitHub Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) created and saved into a file called 'gh_key' in this scripts directory." 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "92e6b76c", 25 | "metadata": {}, 26 | "source": [ 27 | "# repo_activity_coc.py\n", 28 | "\n", 29 | "As input, this script requires a file named 'orgs.txt' containing\n", 30 | "the name of one GitHub org per line residing in the same folder \n", 31 | "as this script."
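,
"\n",
"\n",
"For example (these org names are purely illustrative), orgs.txt might contain:\n",
"```\n",
"vmware\n",
"vmware-tanzu\n",
"```"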
32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "id": "53de1338", 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# Test that your orgs.txt file exists and contains the data you expect\n", 42 | "f = open('orgs.txt', 'r')\n", 43 | "print(f.read())" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "id": "ab54e818", 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "%run repo_activity_coc.py" 54 | ] 55 | } 56 | ], 57 | "metadata": { 58 | "kernelspec": { 59 | "display_name": "Python 3 (ipykernel)", 60 | "language": "python", 61 | "name": "python3" 62 | }, 63 | "language_info": { 64 | "codemirror_mode": { 65 | "name": "ipython", 66 | "version": 3 67 | }, 68 | "file_extension": ".py", 69 | "mimetype": "text/x-python", 70 | "name": "python", 71 | "nbconvert_exporter": "python", 72 | "pygments_lexer": "ipython3", 73 | "version": "3.10.9" 74 | } 75 | }, 76 | "nbformat": 4, 77 | "nbformat_minor": 5 78 | } 79 | -------------------------------------------------------------------------------- /scripts/filter_keyword_by_org.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """ Filter results obtained from keyword_by_repo.py to a subset of orgs 8 | 9 | This script uses the results from a keyword search and filters it based 10 | on a list of GitHub organizations. 11 | 12 | As input, this script requires a file generated by the keyword_by_repo.py 13 | script. This is provided via a command line argument. 14 | Example: filter_keyword_by_org.py /path/to/keyword_search_2022-07-26.csv 15 | 16 | If a command line argument is not specified, you will be prompted for this 17 | file. 18 | 19 | As input, this script requires a file named 'orgs.txt' containing 20 | the name of one GitHub org per line residing in the same folder 21 | as this script. 22 | 23 | As output: 24 | * the script creates a csv file stored in a subdirectory 25 | of the folder with the script called "output" with the filename in 26 | this format with today's date. 27 | Example: "output/keyword_search_org_filter_2022-07-27.csv" 28 | """ 29 | 30 | import sys 31 | import csv 32 | from common_functions import read_orgs, create_file 33 | 34 | # Read list of orgs from a file 35 | try: 36 | org_list = read_orgs('orgs.txt') 37 | except: 38 | print("Error reading orgs. This script depends on the existence of a file called orgs.txt containing one org per line. Exiting") 39 | sys.exit() 40 | 41 | # Read filename from command line or prompt if no arguments were given.
42 | try: 43 | file_name = str(sys.argv[1]) 44 | 45 | except: 46 | print("Please enter the filename for the csv file generated from keyword_by_repo.py (full path)") 47 | file_name = input("Enter a file name: ") 48 | 49 | # open csv file 50 | with open(file_name) as in_file: 51 | content = csv.reader(in_file) 52 | 53 | # get the csv header line and move to the first result 54 | header = next(content) 55 | 56 | # find csv results lines that match a GitHub org name from orgs.txt 57 | # and append the matches to a list 58 | org_match_list = [] 59 | for line in content: 60 | if line[0] in org_list: 61 | org_match_list.append(line) 62 | 63 | # prepare output file and write header and list to csv 64 | try: 65 | file, file_path = create_file("keyword_search_org_filter") 66 | 67 | with open(file_path, "w") as out_file: 68 | wr = csv.writer(out_file) 69 | wr.writerow(header) 70 | wr.writerows(org_match_list) 71 | 72 | except: 73 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 74 | 75 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # API key 2 | gh_key* 3 | 4 | # Input files 5 | *org*.txt 6 | *keyword*.txt 7 | *monitoring*.txt 8 | *.csv 9 | 10 | # Output directory contents 11 | scripts/output/*.csv 12 | notebooks/output/*.csv 13 | *.pkl 14 | 15 | # Experimental scripts 16 | scripts/exp* 17 | 18 | # VS Code files 19 | /.vscode/ 20 | .vscode 21 | 22 | # MacOS files 23 | .DS_Store 24 | 25 | # Byte-compiled / optimized / DLL files 26 | __pycache__/ 27 | *.py[cod] 28 | *$py.class 29 | 30 | # C extensions 31 | *.so 32 | 33 | # Distribution / packaging 34 | .Python 35 | build/ 36 | develop-eggs/ 37 | dist/ 38 | downloads/ 39 | eggs/ 40 | .eggs/ 41 | lib/ 42 | lib64/ 43 | parts/ 44 | sdist/ 45 | var/ 46 | wheels/ 47 | pip-wheel-metadata/ 48 | share/python-wheels/ 49 | *.egg-info/ 50 | .installed.cfg 51 | *.egg 52 | MANIFEST 53 | 54 | # PyInstaller 55 | # Usually these files are written by a python script from a template 56 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 57 | *.manifest 58 | *.spec 59 | 60 | # Installer logs 61 | pip-log.txt 62 | pip-delete-this-directory.txt 63 | 64 | # Unit test / coverage reports 65 | htmlcov/ 66 | .tox/ 67 | .nox/ 68 | .coverage 69 | .coverage.* 70 | .cache 71 | nosetests.xml 72 | coverage.xml 73 | *.cover 74 | *.py,cover 75 | .hypothesis/ 76 | .pytest_cache/ 77 | 78 | # Translations 79 | *.mo 80 | *.pot 81 | 82 | # Django stuff: 83 | *.log 84 | local_settings.py 85 | db.sqlite3 86 | db.sqlite3-journal 87 | 88 | # Flask stuff: 89 | instance/ 90 | .webassets-cache 91 | 92 | # Scrapy stuff: 93 | .scrapy 94 | 95 | # Sphinx documentation 96 | docs/_build/ 97 | 98 | # PyBuilder 99 | target/ 100 | 101 | # Jupyter Notebook 102 | .ipynb_checkpoints 103 | 104 | # IPython 105 | profile_default/ 106 | ipython_config.py 107 | 108 | # pyenv 109 | .python-version 110 | 111 | # pipenv 112 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 113 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 114 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 115 | # install all needed dependencies. 116 | #Pipfile.lock 117 | 118 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 119 | __pypackages__/ 120 | 121 | # Celery stuff 122 | celerybeat-schedule 123 | celerybeat.pid 124 | 125 | # SageMath parsed files 126 | *.sage.py 127 | 128 | # Environments 129 | .env 130 | .venv 131 | env/ 132 | venv/ 133 | ENV/ 134 | env.bak/ 135 | venv.bak/ 136 | 137 | # Spyder project settings 138 | .spyderproject 139 | .spyproject 140 | 141 | # Rope project settings 142 | .ropeproject 143 | 144 | # mkdocs documentation 145 | /site 146 | 147 | # mypy 148 | .mypy_cache/ 149 | .dmypy.json 150 | dmypy.json 151 | 152 | # Pyre type checker 153 | .pyre/ 154 | -------------------------------------------------------------------------------- /scripts/org_access_audit.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """GitHub Organization Access Audit 8 | This script uses the GitHub GraphQL API to retrieve relevant 9 | information about all enterprise owners and org members from 10 | one or more GitHub orgs. 11 | 12 | Note that you must have appropriate access to this data in the 13 | orgs requested. Missing data likely means that you don't have 14 | access. 15 | 16 | As input, this script requires a file named 'orgs.txt' containing 17 | the name of one GitHub org per line residing in the same folder 18 | as this script. 19 | 20 | Your API key should be stored in a file called gh_key in the 21 | same folder as this script. 22 | 23 | As output: 24 | * JSON data is currently printed to the screen as a way to do this 25 | quickly. 26 | """ 27 | 28 | import sys 29 | from common_functions import read_key 30 | 31 | def make_query(after_cursor = None): 32 | """Creates and returns a GraphQL query (the after_cursor parameter is currently unused; only the first 100 results are fetched)""" 33 | 34 | return """query ($org_name: String!){ 35 | organization(login: $org_name){ 36 | url 37 | enterpriseOwners(first:100){ 38 | nodes{ 39 | login 40 | } 41 | } 42 | membersWithRole(first:100){ 43 | nodes{ 44 | login 45 | name 46 | } 47 | } 48 | } 49 | } 50 | """ 51 | 52 | # Read GitHub key from file using the read_key function in 53 | # common_functions.py 54 | try: 55 | api_token = read_key('gh_key') 56 | 57 | except: 58 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 59 | sys.exit() 60 | 61 | def get_org_data(api_token): 62 | """Executes the GraphQL query to get owner / member data from one or more GitHub orgs. 63 | 64 | Parameters 65 | ---------- 66 | api_token : str 67 | The GH API token retrieved from the gh_key file. 68 | 69 | Output 70 | ------ 71 | Prints the JSON response for each org to the screen. 72 | """ 73 | import requests 74 | import json 75 | import pandas as pd 76 | from common_functions import read_orgs 77 | import sys 78 | 79 | url = 'https://api.github.com/graphql' 80 | headers = {'Authorization': 'token %s' % api_token} 81 | 82 | # Read list of orgs from a file 83 | 84 | try: 85 | org_list = read_orgs('orgs.txt') 86 | except: 87 | print("Error reading orgs. This script depends on the existence of a file called orgs.txt containing one org per line.
Exiting") 88 | sys.exit() 89 | 90 | for org_name in org_list: 91 | try: 92 | query = make_query() 93 | 94 | variables = {"org_name": org_name} 95 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 96 | json_data = json.loads(r.text) 97 | 98 | print(json_data) 99 | except: 100 | print("ERROR Cannot process", org_name) 101 | 102 | get_org_data(api_token) 103 | 104 | -------------------------------------------------------------------------------- /scripts/repo_activity_REST.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | 6 | """Repo Activity REST API Version - DEPRECATED Example only 7 | This script is comparison with the other script (repo_activity.py) 8 | which uses the GraphQL API. This one is very slow and should not 9 | be used to gather data. 10 | 11 | This script uses the GitHub REST API to retrieve relevant 12 | information about all repositories from one or more GitHub 13 | orgs. 14 | 15 | This version runs much more slowly than the other GraphQL 16 | version, and if there are a lot of orgs / repos, the API 17 | rate limit will be exceeded if the script is not slowed down. 18 | 19 | As input, this script requires a file named 'orgs.txt' containing 20 | the name of one GitHub org per line residing in the same folder 21 | as this script. 22 | 23 | Your API key should be stored in a file called gh_key in the 24 | same folder as this script. 25 | 26 | This script requires that `pandas` be installed within the Python 27 | environment you are running this script in. 28 | 29 | As output: 30 | * A message about each org being processed will be printed to the screen. 31 | * the script creates a csv file stored in an subdirectory 32 | of the folder with the script called "output" with the filename in 33 | this format with today's date. 34 | 35 | output/a_repo_activity_2022-01-14.csv" 36 | """ 37 | 38 | import sys 39 | import pandas as pd 40 | import csv 41 | from datetime import datetime 42 | from time import sleep 43 | from github import Github 44 | from os.path import dirname, join 45 | from common_functions import read_key 46 | 47 | # Read GitHub key from file 48 | try: 49 | gh_key = read_key('gh_key') 50 | g = Github(gh_key) 51 | 52 | except: 53 | print("Error reading GH Key. Exiting") 54 | sys.exit() 55 | 56 | # prepare csv file and write header row 57 | 58 | today = datetime.today().strftime('%Y-%m-%d') 59 | output_filename = "./output/a_repo_activity_" + today + ".csv" 60 | 61 | try: 62 | current_dir = dirname(__file__) 63 | file_path = join(current_dir, output_filename) 64 | 65 | csv_output = open(file_path, 'w') 66 | csv_output.write('org,repo,license,private,forked,archived,last_updated,last_pushed,last_committer_login,last_committer_name,last_committer_email,last_committer_date\n') 67 | 68 | except: 69 | print('Could not write to csv file. 
Exiting') 70 | sys.exit(1) 71 | 72 | # Read list of orgs from a file 73 | 74 | org_list = [] 75 | with open('orgs.txt') as orgfile: 76 | orgs = csv.reader(orgfile) 77 | for row in orgs: 78 | org_list.append(row[0]) 79 | 80 | # Get repos and repo info for each org 81 | 82 | for github_org in org_list: 83 | 84 | # sleep(90) #add delay to slow down hitting rate limits 85 | print("Processing ", github_org) 86 | 87 | try: 88 | org = g.get_organization(github_org) 89 | except: 90 | print("ERROR: Cannot process ", github_org) 91 | continue 92 | 93 | try: 94 | for x in org.get_repos(): 95 | try: 96 | for y in x.get_commits(): 97 | try: 98 | author_login = y.author.login 99 | author_name = y.author.name 100 | author_email = y.author.email 101 | break 102 | except: 103 | author_login = None 104 | author_name = None 105 | author_email = None 106 | 107 | 108 | try: 109 | last_commit_date = x.get_commit(y.sha).commit.author.date 110 | except: 111 | last_commit_date = "No commits, repo may be empty" 112 | 113 | # When this fails it usually means there is no license 114 | try: 115 | license = x.get_license().license.name 116 | except: 117 | license = "Likely Unlicensed" 118 | 119 | csv_string = github_org + ',' + x.full_name + ',' + str(license) + ',' + str(x.private) + ',' + str(x.fork) + ',' + str(x.archived) + ',' + str(x.updated_at) + ',' + str(x.pushed_at) + ',' + str(author_login) + ',' + str(author_name) + ',' + str(author_email) + ',' + str(last_commit_date) + '\n' 120 | csv_output.write(csv_string) 121 | except: 122 | print("Cannot process data for", x) 123 | csv_output.write(github_org + ',' + x.full_name + ',' + str(x.private) + ',' + str(x.fork) + ',' + str(x.archived) + ',' + str(x.updated_at) + ',' + str(x.pushed_at) + ',' + 'Error' + ',' + 'Error' + ',' + 'Error' + ',' + 'Error' + '\n') 124 | 125 | except: 126 | print("Cannot get repos for", github_org) 127 | -------------------------------------------------------------------------------- /scripts/mystery_orgs.py: -------------------------------------------------------------------------------- 1 | # Copyright 2022 VMware, Inc. 2 | # SPDX-License-Identifier: BSD-2-Clause 3 | 4 | """Mystery Orgs 5 | This script uses the GitHub GraphQL API to retrieve relevant 6 | information about one or more GitHub orgs. 7 | 8 | We use this script to gather basic data about GitHub orgs that 9 | we believe may have been created outside of our process by various 10 | employees across our business units. We gather the first few members 11 | of the org to help identify employees who can provide more details 12 | about the purpose of the org and how it is used. 13 | 14 | As input, this script requires a file named 'orgs.txt' containing 15 | the name of one GitHub org per line residing in the same folder 16 | as this script. 17 | 18 | Your API key should be stored in a file called gh_key in the 19 | same folder as this script. 20 | 21 | As output: 22 | * A message about each org being processed will be printed to the screen. 23 | * the script creates a csv file stored in a subdirectory 24 | of the folder with the script called "output" with the filename in 25 | this format with today's date. 26 | 27 | output/mystery_orgs_2022-01-14.csv 28 | """ 29 | 30 | import sys 31 | from common_functions import read_key 32 | 33 | def make_query(): 34 | """Creates and returns a GraphQL query""" 35 | return """query OrgQuery($org_name: String!)
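# Fetch basic org metadata plus the first few members as possible contacts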
{ 36 | organization(login:$org_name) { 37 | name 38 | url 39 | websiteUrl 40 | createdAt 41 | updatedAt 42 | membersWithRole(first: 15){ 43 | nodes{ 44 | login 45 | name 46 | email 47 | company 48 | } 49 | } 50 | } 51 | }""" 52 | 53 | # Read GitHub key from file using the read_key function in 54 | # common_functions.py 55 | try: 56 | api_token = read_key('gh_key') 57 | 58 | except: 59 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 60 | sys.exit() 61 | 62 | def get_org_data(api_token): 63 | """Executes the GraphQL query to get org data from one or more GitHub orgs. 64 | 65 | Parameters 66 | ---------- 67 | api_token : str 68 | The GH API token retrieved from the gh_key file. 69 | 70 | Output 71 | ------- 72 | Writes a csv file of the form 'mystery_orgs_2022-01-16.csv' with today's date 73 | """ 74 | 75 | import requests 76 | import json 77 | import sys 78 | import csv 79 | from datetime import datetime 80 | from common_functions import read_orgs, create_file 81 | 82 | url = 'https://api.github.com/graphql' 83 | headers = {'Authorization': 'token %s' % api_token} 84 | 85 | # Read list of orgs from a file 86 | try: 87 | org_list = read_orgs('orgs.txt') 88 | except: 89 | print("Error reading orgs. This script depends on the existence of a file called orgs.txt containing one org per line. Exiting") 90 | sys.exit() 91 | 92 | # Initialize list of lists with a header row. 93 | # Each embedded list will become a row in the csv file 94 | all_rows = [['org_name', 'org_url', 'website', 'org_createdAt', 'org_updatedAt', 'people(login,name,email,company):repeat']] 95 | 96 | for org_name in org_list: 97 | 98 | print("Processing", org_name) 99 | 100 | row = [] 101 | query = make_query() 102 | 103 | variables = {"org_name": org_name} 104 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 105 | json_data = json.loads(r.text) 106 | 107 | # Take the json_data file and expand the info about people horizontally into the 108 | # same row as the rest of the data about that org. 109 | try: 110 | for key in json_data['data']['organization']: 111 | if key == 'membersWithRole': 112 | for nkey in json_data['data']['organization'][key]['nodes']: 113 | row.append(nkey['login']) 114 | row.append(nkey['name']) 115 | row.append(nkey['email']) 116 | row.append(nkey['company']) 117 | else: 118 | row.append(json_data['data']['organization'][key]) 119 | all_rows.append(row) 120 | except: 121 | pass 122 | 123 | # prepare file and write rows to csv 124 | 125 | try: 126 | file, file_path = create_file("mystery_orgs") 127 | 128 | with file: 129 | write = csv.writer(file) 130 | write.writerows(all_rows) 131 | 132 | except: 133 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 134 | 135 | get_org_data(api_token) 136 | -------------------------------------------------------------------------------- /scripts/monitoring.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """Calculate OSSF's Criticality Score for pinned repos on a list of orgs 8 | 9 | This script uses the GitHub GraphQL API to retrieve a list of pinned 10 | repositories from a list of GitHub organizations and then runs 11 | criticality score on each of those repos.
It can also take an individual 12 | repo URL as one of the inputs. 13 | 14 | As input, this script requires a file named 'monitoring.txt' residing in 15 | the same folder as this script. This file should contain the name of one 16 | GitHub org or URL to an individual repository per line. Example file format: 17 | vmware 18 | https://github.com/greenplum-db/gpdb 19 | vmware-tanzu 20 | 21 | Your API key should be stored in a file called gh_key in the 22 | same folder as this script. 23 | 24 | This script requires that `pandas` be installed within the Python 25 | environment you are running this script in. 26 | 27 | This script depends on another tool called Criticality Score to run. 28 | See https://github.com/ossf/criticality_score for more details, including 29 | how to set up a required environment variable. This script requires that 30 | you have this tool installed, and it might only run on mac / linux. 31 | 32 | As output: 33 | * A message about each repo being processed will be printed to the screen. 34 | * the script creates a csv file stored in a subdirectory 35 | of the folder with the script called "output" with the filename in 36 | this format with today's date. 37 | 38 | output/monitoring_2023-01-28.csv 39 | 40 | """ 41 | 42 | import sys 43 | import subprocess 44 | import os 45 | import requests 46 | import json 47 | import csv 48 | from common_functions import create_file, read_key, read_orgs 49 | 50 | # Read GitHub key from file using the read_key function in 51 | # common_functions.py 52 | try: 53 | api_token = read_key('gh_key') 54 | 55 | except: 56 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 57 | sys.exit() 58 | 59 | # Use the token to set the environment variable required by criticality_score 60 | os.environ['GITHUB_AUTH_TOKEN'] = api_token 61 | 62 | def make_query(): 63 | """Creates and returns a GraphQL query to get pinned repos for an org""" 64 | return """query pinned($org_name: String!) { 65 | organization(login:$org_name) { 66 | pinnedItems(first: 10, types: REPOSITORY) { 67 | nodes { 68 | ... on Repository { 69 | url 70 | } 71 | } 72 | } 73 | } 74 | }""" 75 | 76 | # Read the list of orgs and repo URLs from monitoring.txt 77 | org_list = read_orgs('monitoring.txt') 78 | 79 | # Set up the parameters needed to use GitHub's GraphQL API 80 | url = 'https://api.github.com/graphql' 81 | headers = {'Authorization': 'token %s' % api_token} 82 | 83 | # Iterate through each GH org, run the query to get pinned repos, 84 | # and store the repo URLs in repo_list 85 | repo_list = [] 86 | 87 | for org_name in org_list: 88 | # Handle both orgs and individual repos: 89 | # if the line starts with http, run criticality score on that URL directly; 90 | # otherwise, run this query for pinned items on the org 91 | 92 | if org_name.startswith('http'): 93 | repo_list.append(org_name) 94 | 95 | else: 96 | query = make_query() 97 | 98 | variables = {"org_name": org_name} 99 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 100 | json_data = json.loads(r.text) 101 | 102 | # Wrap in try/except for when org isn't valid.
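# (an org that doesn't exist or has nothing pinned typically comes back
# with a null 'organization' value, which raises an exception when indexed)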
103 | try: 104 | for url_dict in json_data['data']['organization']['pinnedItems']['nodes']: 105 | repo_list.append(url_dict['url']) 106 | except: 107 | print("Could not get data on", org_name, "- check to make sure the org name is correct and has pinned repos") 108 | 109 | # For each repo in repo_list, run criticality_score and append 110 | # the json output to csv_row_list 111 | csv_row_list = [] 112 | 113 | for repo in repo_list: 114 | cmd_str = 'criticality_score --repo ' + repo + ' --format json' 115 | print("Processing", repo) 116 | try: 117 | proc = subprocess.Popen(cmd_str, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True) 118 | out, err = proc.communicate() 119 | 120 | if not err: 121 | json_str = out.decode("utf-8") 122 | csv_row_list.append(json.loads(json_str)) 123 | else: 124 | print('Error calculating scores', repo) 125 | except: 126 | print('Error calculating scores', repo) 127 | 128 | # Create csv output file and write to it 129 | 130 | keys = csv_row_list[0].keys() 131 | 132 | file, file_path = create_file("monitoring") 133 | 134 | with open(file_path, 'w', newline='') as output_file: 135 | dict_writer = csv.DictWriter(output_file, keys) 136 | dict_writer.writeheader() 137 | dict_writer.writerows(csv_row_list) 138 | -------------------------------------------------------------------------------- /scripts/keyword_by_repo.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """ Search for repos mentioning a keyword 8 | This script uses the GitHub GraphQL API to retrieve relevant 9 | information about repositories mentioning certain keywords. 10 | 11 | As input, this script requires a file named 'keywords.txt' containing 12 | one keyword per line residing in the same folder as this script. 13 | 14 | Your API key should be stored in a file called gh_key in the 15 | same folder as this script. 16 | 17 | This script requires that `pandas` be installed within the Python 18 | environment you are running this script in. 19 | 20 | As output: 21 | * A message about each keyword being processed will be printed to the screen. 22 | * the script creates a csv file stored in a subdirectory 23 | of the folder with the script called "output" with the filename in 24 | this format with today's date. 25 | Example: "output/keyword_search_2022-07-22.csv" 26 | """ 27 | 28 | import sys 29 | from common_functions import read_key, create_file 30 | 31 | def make_query(after_cursor = None): 32 | """Creates and returns a GraphQL query with cursor for pagination""" 33 | 34 | return """query MyQuery ($keyword: String!){ 35 | search(query: $keyword, type: REPOSITORY, first: 100, after: AFTER) { 36 | pageInfo { 37 | hasNextPage 38 | endCursor 39 | } 40 | nodes { 41 | ... on Repository { 42 | nameWithOwner 43 | name 44 | owner{ 45 | login 46 | } 47 | url 48 | description 49 | updatedAt 50 | createdAt 51 | isFork 52 | isEmpty 53 | isArchived 54 | forkCount 55 | stargazerCount 56 | } 57 | } 58 | } 59 | }""".replace( 60 | "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null" 61 | ) 62 | 63 | def get_repo_data(api_token): 64 | """Executes the GraphQL query to get repository data from one or more GitHub orgs. 65 | 66 | Parameters 67 | ---------- 68 | api_token : str 69 | The GH API token retrieved from the gh_key file.
70 | 71 | Returns 72 | ------- 73 | repo_info_df : pandas.core.frame.DataFrame 74 | """ 75 | import requests 76 | import json 77 | import pandas as pd 78 | from common_functions import read_file 79 | 80 | url = 'https://api.github.com/graphql' 81 | headers = {'Authorization': 'token %s' % api_token} 82 | 83 | repo_info_df = pd.DataFrame() 84 | 85 | # Read list of keywords from a file 86 | 87 | try: 88 | keyword_list = read_file('keywords.txt') 89 | except: 90 | print("Error reading keywords. This script depends on the existence of a file called keywords.txt containing one keyword per line. Exiting") 91 | sys.exit() 92 | 93 | for keyword in keyword_list: 94 | has_next_page = True 95 | after_cursor = None 96 | 97 | print("Processing", keyword) 98 | 99 | while has_next_page: 100 | 101 | try: 102 | query = make_query(after_cursor) 103 | 104 | variables = {"keyword": keyword} 105 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 106 | json_data = json.loads(r.text) 107 | 108 | df_temp = pd.DataFrame(json_data["data"]["search"]["nodes"]) 109 | repo_info_df = pd.concat([repo_info_df, df_temp]) 110 | 111 | has_next_page = json_data["data"]["search"]["pageInfo"]["hasNextPage"] 112 | after_cursor = json_data["data"]["search"]["pageInfo"]["endCursor"] 113 | except: 114 | has_next_page = False 115 | print("ERROR Cannot process", keyword) 116 | 117 | return repo_info_df 118 | 119 | # Read GitHub key from file using the read_key function in 120 | # common_functions.py 121 | try: 122 | api_token = read_key('gh_key') 123 | 124 | except: 125 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 126 | sys.exit() 127 | 128 | repo_info_df = get_repo_data(api_token) 129 | 130 | def expand_owner(owner): 131 | import pandas as pd 132 | if pd.isnull(owner): 133 | owner = 'Not Found' 134 | else: 135 | owner = owner['login'] 136 | return owner 137 | 138 | repo_info_df['owner_name'] = repo_info_df['owner'].apply(expand_owner) 139 | repo_info_df = repo_info_df.drop(columns=['owner']) 140 | 141 | # Reformat to put columns in a logical order 142 | repo_info_df = repo_info_df[['owner_name','name','nameWithOwner','url','description','updatedAt','createdAt','isFork','isEmpty','isArchived','forkCount','stargazerCount']] 143 | 144 | # prepare file and write dataframe to csv 145 | 146 | try: 147 | file, file_path = create_file("keyword_search") 148 | repo_info_df.to_csv(file_path, index=False) 149 | 150 | except: 151 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 152 | -------------------------------------------------------------------------------- /scripts/README.md: -------------------------------------------------------------------------------- 1 | # Python Scripts 2 | 3 | These Python scripts use the GitHub APIs to gather data. 4 | 5 | ## Acceptable Use 6 | 7 | Note: Some of these scripts gather names and email addresses, which we use 8 | to help us find a contact if we have questions about a 9 | repository or org. Be aware that the [GitHub Acceptable Use 10 | Policies](https://docs.github.com/en/github/site-policy/github-acceptable-use-policies) 11 | prohibit certain uses of information, and I would encourage you to read 12 | this policy and not use scripts like these for unethical purposes.
13 | 14 | ## Requirements 15 | 16 | The scripts all have a few common requirements, and individual 17 | scripts may have additional requirements and other information 18 | which can be found in the Docstrings. 19 | 20 | * These scripts require that `pandas` be installed within the Python 21 | environment you are running these scripts in. 22 | * Your API key should be stored in a file called gh_key in the 23 | same folder as these scripts. 24 | * Most scripts also require an orgs.txt or other text file used as 25 | input. Details can be found in the docstring for each script. 26 | * Most scripts require that a folder named "output" exists in this 27 | scripts directory, and csv output files will be stored there. 28 | 29 | ## Scripts 30 | 31 | ### Inclusivity Check 32 | 33 | This script uses the GitHub GraphQL API to retrieve default branch 34 | name and code of conduct for each repo in a GitHub org for a very 35 | quick, but rudimentary inclusivity check. 36 | 37 | **Running the script** 38 | 39 | Requires orgs.txt 40 | ``` 41 | $python3 inclusivity_check.py 42 | ``` 43 | 44 | ### Repository Activity 45 | These scripts demonstrate the difference in speed and 46 | rate limits between the GitHub REST API and the GraphQL API. The original 47 | REST script took hours to run across our 60+ GitHub orgs and had to be 48 | slowed down to avoid hitting the rate limit, while the GraphQL version, 49 | which gathers the same data, runs in less than 15 minutes without hitting 50 | any rate limits. 51 | 52 | scripts/repo_activity.py 53 | scripts/repo_activity_coc.py 54 | scripts/repo_activity_REST.py 55 | 56 | We used this script to gather basic data about the repositories found across 57 | dozens of an organization's GitHub orgs. We use this to understand whether 58 | projects are meeting our compliance requirements. We also use this script to 59 | find abandoned repos that have outlived their usefulness and should be 60 | archived. 61 | 62 | Note: repo_activity_coc.py is mostly identical to repo_activity.py, 63 | but it adds info about the code of conduct. This is a separate script 64 | because the codeOfConduct object in the GraphQL API is a bit problematic 65 | and tends to time out when getting relatively small amounts of data. 66 | 67 | **Running the scripts** 68 | 69 | Requires orgs.txt 70 | ``` 71 | $python3 repo_activity.py 72 | ``` 73 | 74 | ### Sunset 75 | 76 | This script uses the GitHub GraphQL API to gather data to determine 77 | whether a repo can be archived. It retrieves relevant 78 | information about a repository, including forks to determine ownership 79 | and possibly contact people to understand how they are using a project. 80 | 81 | As input, this script requires a GitHub URL for a repository or a csv 82 | file containing one repo_name,org_name pair per line. 83 | 84 | **Running the script** 85 | 86 | Run the script with one repo url as input 87 | ``` 88 | $python3 sunset.py -u "https://github.com/org_name/repo_name" 89 | ``` 90 | 91 | Run the script with a csv file containing one repo_name,org_name pair 92 | per line: 93 | ``` 94 | $python3 sunset.py -f sunset.csv 95 | ``` 96 | 97 | ### Monitoring 98 | 99 | This script uses the GitHub GraphQL API to retrieve the pinned repos 100 | for each GitHub org listed in monitoring.txt and runs criticality score 101 | for each of those pinned repositories.
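An example monitoring.txt (the entries below come from the script's docstring) can mix org names and individual repo URLs:
```
vmware
https://github.com/greenplum-db/gpdb
vmware-tanzu
```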
102 | 103 | **Running the script** 104 | 105 | Requires monitoring.txt 106 | ``` 107 | $python3 monitoring.py 108 | ``` 109 | 110 | ### Keyword by Repo with Optional Filter 111 | 112 | The keyword_by_repo script uses the GitHub GraphQL API to retrieve 113 | relevant information about repositories mentioning certain keywords. 114 | 115 | The filter_keyword_by_org script uses the results from a keyword search 116 | and filters it based on a list of GitHub organizations. 117 | 118 | As input, this script requires a file generated by the keyword_by_repo.py 119 | script. This is provided via a command line argument. 120 | 121 | **Running the scripts** 122 | 123 | keyword_by_repo requires keywords.txt 124 | ``` 125 | $python3 keyword_by_repo.py 126 | ``` 127 | 128 | filter_keyword_by_org requires output file from keyword_by_repo 129 | ``` 130 | $python3 filter_keyword_by_org.py /path/to/keyword_search_2022-07-26.csv 131 | ``` 132 | 133 | ### Mystery GitHub Organizations 134 | 135 | We can use this script to gather basic data about GitHub orgs that 136 | we believe may have been created outside of our process by various 137 | employees across our business units. We gather the first few members 138 | of the org to help identify employees who can provide more details 139 | about the purpose of the org and how it is used. 140 | 141 | scripts/mystery_orgs.py 142 | 143 | However, since members are private by default, this script may not 144 | be as useful as just running repo_activity.py on those same orgs 145 | to also learn more about the repos and get better contact info 146 | from the commit data. 147 | 148 | **Running the script** 149 | 150 | Requires orgs.txt 151 | ``` 152 | $python3 mystery_orgs.py 153 | ``` 154 | 155 | -------------------------------------------------------------------------------- /CODE-OF-CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in this project and our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to Dawn Foster - dawn@dawnfoster.com. 63 | All complaints will be reviewed and investigated promptly and fairly. 64 | 65 | All community leaders are obligated to respect the privacy and security of the 66 | reporter of any incident. 67 | 68 | ## Enforcement Guidelines 69 | 70 | Community leaders will follow these Community Impact Guidelines in determining 71 | the consequences for any action they deem in violation of this Code of Conduct: 72 | 73 | ### 1. Correction 74 | 75 | **Community Impact**: Use of inappropriate language or other behavior deemed 76 | unprofessional or unwelcome in the community. 77 | 78 | **Consequence**: A private, written warning from community leaders, providing 79 | clarity around the nature of the violation and an explanation of why the 80 | behavior was inappropriate. A public apology may be requested. 81 | 82 | ### 2. Warning 83 | 84 | **Community Impact**: A violation through a single incident or series 85 | of actions. 86 | 87 | **Consequence**: A warning with consequences for continued behavior. No 88 | interaction with the people involved, including unsolicited interaction with 89 | those enforcing the Code of Conduct, for a specified period of time. This 90 | includes avoiding interactions in community spaces as well as external channels 91 | like social media.
Violating these terms may lead to a temporary or 92 | permanent ban. 93 | 94 | ### 3. Temporary Ban 95 | 96 | **Community Impact**: A serious violation of community standards, including 97 | sustained inappropriate behavior. 98 | 99 | **Consequence**: A temporary ban from any sort of interaction or public 100 | communication with the community for a specified period of time. No public or 101 | private interaction with the people involved, including unsolicited interaction 102 | with those enforcing the Code of Conduct, is allowed during this period. 103 | Violating these terms may lead to a permanent ban. 104 | 105 | ### 4. Permanent Ban 106 | 107 | **Community Impact**: Demonstrating a pattern of violation of community 108 | standards, including sustained inappropriate behavior, harassment of an 109 | individual, or aggression toward or disparagement of classes of individuals. 110 | 111 | **Consequence**: A permanent ban from any sort of public interaction within 112 | the community. 113 | 114 | ## Attribution 115 | 116 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 117 | version 2.0, available at 118 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 119 | 120 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 121 | enforcement ladder](https://github.com/mozilla/diversity). 122 | 123 | [homepage]: https://www.contributor-covenant.org 124 | 125 | For answers to common questions about this code of conduct, see the FAQ at 126 | https://www.contributor-covenant.org/faq. Translations are available at 127 | https://www.contributor-covenant.org/translations. 128 | -------------------------------------------------------------------------------- /scripts/common_functions.py: -------------------------------------------------------------------------------- 1 | # Copyright 2022 VMware, Inc. 2 | # SPDX-License-Identifier: BSD-2-Clause 3 | 4 | """Common Functions 5 | This file contains some common functions that are used within 6 | the other scripts in this repo. 7 | 8 | This file can also be imported as a module. 9 | """ 10 | 11 | def read_key(file_name): 12 | """Retrieves a GitHub API key from a file. 13 | 14 | Parameters 15 | ---------- 16 | file_name : str 17 | 18 | Returns 19 | ------- 20 | key : str 21 | """ 22 | 23 | from os.path import dirname, join 24 | 25 | # Reads the first line of a file containing the GitHub API key 26 | # Usage: key = read_key('gh_key') 27 | 28 | current_dir = dirname(__file__) 29 | file2 = "./" + file_name 30 | file_path = join(current_dir, file2) 31 | 32 | with open(file_path, 'r') as kf: 33 | key = kf.readline().rstrip() # remove newline & trailing whitespace 34 | return key 35 | 36 | def read_orgs(file_name): 37 | """Retrieves a list of orgs from a file. 38 | 39 | Parameters 40 | ---------- 41 | file_name : str 42 | 43 | Returns 44 | ------- 45 | org_list : list 46 | """ 47 | import csv 48 | 49 | org_list = [] 50 | 51 | with open(file_name) as orgfile: 52 | orgs = csv.reader(orgfile) 53 | for row in orgs: 54 | org_list.append(row[0]) 55 | 56 | return org_list 57 | 58 | def read_file(file_name): 59 | """Retrieves a list from a file. 
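(Similar to read_orgs: it returns the first csv column of each row, and is used for generic input files such as keywords.txt.)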
60 | 61 | Parameters 62 | ---------- 63 | file_name : str 64 | 65 | Returns 66 | ------- 67 | content_list : list 68 | """ 69 | import csv 70 | 71 | content_list = [] 72 | 73 | with open(file_name) as in_file: 74 | content = csv.reader(in_file) 75 | for row in content: 76 | content_list.append(row[0]) 77 | 78 | return content_list 79 | 80 | def expand_name_df(df,old_col,new_col): 81 | """Takes a dataframe df with an API JSON object with nested elements in old_col, 82 | extracts the name, and saves it in a new dataframe column called new_col 83 | 84 | Parameters 85 | ---------- 86 | df : dataframe 87 | old_col : str 88 | new_col : str 89 | 90 | Returns 91 | ------- 92 | df : dataframe 93 | """ 94 | 95 | import pandas as pd 96 | 97 | def expand_name(nested_name): 98 | """Takes an API JSON object with nested elements and extracts the name 99 | Parameters 100 | ---------- 101 | nested_name : JSON API object 102 | 103 | Returns 104 | ------- 105 | object_name : str 106 | """ 107 | if pd.isnull(nested_name): 108 | object_name = 'Not Found' 109 | else: 110 | object_name = nested_name['name'] 111 | return object_name 112 | 113 | df[new_col] = df[old_col].apply(expand_name) 114 | return df 115 | 116 | 117 | def get_criticality(org_name, repo_name, api_token): 118 | """See https://github.com/ossf/criticality_score for more details 119 | This function requires that you have version 1.0.7 of this tool 120 | installed (the older Python version; the final Python version 121 | doesn't work within this script, possibly because of how its 122 | deprecation warnings are implemented). You can install 123 | the correct version using: 124 | pip install criticality-score==1.0.7 125 | 126 | Parameters 127 | ---------- 128 | org_name : str 129 | repo_name : str 130 | api_token : str 131 | 132 | Returns 133 | ------- 134 | dependents_count : str 135 | Numeric integer that is returned as a string 136 | criticality_score : str 137 | This value ranges from 0 to 1 (like a float) with lower scores indicating less critical projects. 138 | 139 | """ 140 | 141 | import subprocess 142 | import os 143 | 144 | os.environ['GITHUB_AUTH_TOKEN'] = api_token 145 | 146 | cmd_str = 'criticality_score --repo github.com/' + org_name + '/' + repo_name + ' --format csv' 147 | 148 | try: 149 | proc = subprocess.Popen(cmd_str, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True) 150 | out, err = proc.communicate() 151 | 152 | if not err: 153 | csv_str = out.decode("utf-8") 154 | items = csv_str.split(',') 155 | dependents_count = items[25] 156 | criticality_score = items[26].rstrip() 157 | else: 158 | dependents_count = None 159 | criticality_score = None 160 | except: 161 | dependents_count = None 162 | criticality_score = None 163 | 164 | return dependents_count, criticality_score 165 | 166 | def create_file(pre_string): 167 | """Creates an output file in an "output" directory with today's date 168 | as part of the filename and prints the file_path to the terminal to 169 | make it easier to open the output file. 170 | 171 | Parameters 172 | ---------- 173 | pre_string : str 174 | This is the string that will preface today's date in the filename 175 | 176 | Returns 177 | ------- 178 | file : file object 179 | file_path : str 180 | This is the full path to the file name for the output.
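
    Usage: file, file_path = create_file("monitoring")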
181 | 182 | """ 183 | from datetime import datetime 184 | from os.path import dirname, join 185 | 186 | today = datetime.today().strftime('%Y-%m-%d') 187 | output_filename = "./output/" + pre_string + "_" + today + ".csv" 188 | current_dir = dirname(__file__) 189 | file_path = join(current_dir, output_filename) 190 | file = open(file_path, 'w', newline ='') 191 | 192 | print("Output file:\n", file_path, sep="") 193 | 194 | return file, file_path 195 | -------------------------------------------------------------------------------- /scripts/inclusivity_check.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """Quick Inclusivity Check for GitHub orgs 8 | This script uses the GitHub GraphQL API to retrieve default branch 9 | name and code of conduct for each repo in a GitHub org. 10 | 11 | As input, this script requires a file named 'orgs.txt' containing 12 | the name of one GitHub org per line residing in the same folder 13 | as this script. 14 | 15 | Your API key should be stored in a file called gh_key in the 16 | same folder as this script. 17 | 18 | This script requires that `pandas` be installed within the Python 19 | environment you are running this script in. 20 | 21 | As output: 22 | * A message will be printed to the screen for any org with a default 23 | branch name of "master" or a missing / unrecognized code of conduct. 24 | Orgs that are forks of another repo, private, empty, or archived 25 | will not be printed, but the details will be written to the csv file. 26 | * The script creates a csv file stored in an subdirectory 27 | of the folder with the script called "output" with the filename in 28 | this format with today's date. All details are written to the csv file 29 | including repos that aren't printed to the screen. 30 | 31 | output/inclusivity_check_2022-01-14.csv" 32 | 33 | """ 34 | 35 | import sys 36 | from common_functions import read_key, expand_name_df, create_file 37 | 38 | def make_query(after_cursor = None): 39 | """Creates and returns a GraphQL query with cursor for pagination""" 40 | 41 | return """query RepoQuery($org_name: String!) { 42 | organization(login: $org_name) { 43 | repositories (first: 5 after: AFTER){ 44 | pageInfo { 45 | hasNextPage 46 | endCursor 47 | } 48 | nodes { 49 | nameWithOwner 50 | defaultBranchRef { 51 | name 52 | } 53 | codeOfConduct{ 54 | url 55 | } 56 | isPrivate 57 | isFork 58 | isEmpty 59 | isArchived 60 | } 61 | } 62 | } 63 | }""".replace( 64 | "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null" 65 | ) 66 | 67 | # Read GitHub key from file using the read_key function in 68 | # common_functions.py 69 | try: 70 | api_token = read_key('gh_key') 71 | 72 | except: 73 | print("Error reading GH Key. This script depends on the existance of a file called gh_key containing your GitHub API token. Exiting") 74 | sys.exit() 75 | 76 | def get_repo_data(api_token): 77 | """Executes the GraphQL query to get repository data from one or more GitHub orgs. 78 | 79 | Parameters 80 | ---------- 81 | api_token : str 82 | The GH API token retrieved from the gh_key file. 
83 | 84 | Returns 85 | ------- 86 | repo_info_df : pandas.core.frame.DataFrame 87 | """ 88 | import requests 89 | import json 90 | import pandas as pd 91 | from common_functions import read_orgs 92 | 93 | url = 'https://api.github.com/graphql' 94 | headers = {'Authorization': 'token %s' % api_token} 95 | 96 | repo_info_df = pd.DataFrame() 97 | 98 | # Read list of orgs from a file 99 | 100 | try: 101 | org_list = read_orgs('orgs.txt') 102 | except: 103 | print("Error reading orgs. This script depends on the existence of a file called orgs.txt containing one org per line. Exiting") 104 | sys.exit() 105 | 106 | for org_name in org_list: 107 | has_next_page = True 108 | after_cursor = None 109 | 110 | print("Processing", org_name) 111 | 112 | while has_next_page: 113 | 114 | try: 115 | query = make_query(after_cursor) 116 | 117 | variables = {"org_name": org_name} 118 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 119 | json_data = json.loads(r.text) 120 | 121 | df_temp = pd.DataFrame(json_data['data']['organization']['repositories']['nodes']) 122 | repo_info_df = pd.concat([repo_info_df, df_temp]) 123 | 124 | has_next_page = json_data["data"]["organization"]["repositories"]["pageInfo"]["hasNextPage"] 125 | 126 | after_cursor = json_data["data"]["organization"]["repositories"]["pageInfo"]["endCursor"] 127 | except: 128 | has_next_page = False 129 | print("ERROR Cannot process", org_name) 130 | 131 | return repo_info_df 132 | 133 | repo_info_df = get_repo_data(api_token) 134 | 135 | # This section reformats the output into what we need in the csv file 136 | repo_info_df = expand_name_df(repo_info_df,'defaultBranchRef','defaultBranch') 137 | 138 | def expand_coc(coc): 139 | import pandas as pd 140 | if pd.isnull(coc): 141 | coc_url = 'Not Found' 142 | else: 143 | coc_url = coc['url'] 144 | return coc_url 145 | 146 | repo_info_df['codeOfConduct_url'] = repo_info_df['codeOfConduct'].apply(expand_coc) 147 | repo_info_df = repo_info_df.drop(columns=['defaultBranchRef', 'codeOfConduct']) 148 | 149 | # prepare file and write dataframe to csv 150 | 151 | try: 152 | file, file_path = create_file("inclusivity_check") 153 | repo_info_df.to_csv(file_path, index=False) 154 | 155 | except: 156 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 157 | 158 | print("repo branch Code of Conduct") 159 | for rows in repo_info_df.iterrows(): 160 | repo = rows[1]['nameWithOwner'] 161 | branch = rows[1]['defaultBranch'] 162 | coc = rows[1]['codeOfConduct_url'] 163 | private = rows[1]['isPrivate'] 164 | fork = rows[1]['isFork'] 165 | empty = rows[1]['isEmpty'] 166 | archive = rows[1]['isArchived'] 167 | if private or fork or empty or archive: 168 | pass 169 | elif (branch == 'master' or coc == 'Not Found'): 170 | print(repo, branch, coc) 171 | print("\nMore details can be found in", file_path) 172 | -------------------------------------------------------------------------------- /scripts/pr_activity.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """ Gathers some basic data about PRs in a specific repo starting with the 8 | most recent PRs.
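For each PR, the GraphQL query below gathers createdAt, mergedAt, additions, deletions, changedFiles, state, and the comment count, along with the author's login, name, and total PR count.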
9 | 10 | Usage 11 | ----- 12 | 13 | pr_activity.py [-h] -o ORG_NAME -r REPO_NAME [-n NUM_PAGES] 14 | -h, --help show this help message and exit 15 | -o ORG_NAME, --org ORG_NAME 16 | The name of the GitHub organization where your repo is found (required) 17 | -r REPO_NAME, --repo REPO_NAME 18 | The name of a GitHub repository in that org where your PRs can be found (required) 19 | -n NUM_PAGES, --num NUM_PAGES 20 | The number of pages of results with 10 results per page (10 = 100 results) - default is 10 21 | 22 | Output 23 | ------ 24 | 25 | * the script creates a csv file stored in a subdirectory 26 | of the folder with the script called "output" with the filename in 27 | this format with today's date. 28 | 29 | output/pr_activity_2022-01-14.csv 30 | """ 31 | import sys 32 | import argparse 33 | import pandas as pd 34 | from common_functions import read_key, create_file 35 | 36 | def make_query(before_cursor = None): 37 | """Creates and returns a GraphQL query with cursor for pagination""" 38 | 39 | return """query repo($org_name: String!, $repo_name: String!){ 40 | repository(owner: $org_name, name: $repo_name) { 41 | pullRequests (last: 10, before: BEFORE) { 42 | pageInfo{ 43 | hasPreviousPage 44 | startCursor 45 | } 46 | nodes { 47 | createdAt 48 | mergedAt 49 | additions 50 | deletions 51 | changedFiles 52 | state 53 | comments{ 54 | totalCount 55 | } 56 | author{ 57 | ... on User{ 58 | login 59 | name 60 | pullRequests{ 61 | totalCount 62 | } 63 | } 64 | } 65 | } 66 | } 67 | } 68 | }""".replace( 69 | "BEFORE", '"{}"'.format(before_cursor) if before_cursor else "null" 70 | ) 71 | 72 | parser = argparse.ArgumentParser() 73 | 74 | parser.add_argument("-o", "--org", required=True, dest = "org_name", help="The name of the GitHub organization where your repo is found (required)") 75 | parser.add_argument("-r", "--repo", required=True, dest = "repo_name", help="The name of a GitHub repository in that org where your PRs can be found (required)") 76 | parser.add_argument("-n", "--num", dest = "num_pages", default=10, type=int, help="The number of pages of results with 10 results per page (10 = 100 results) - default is 10") 77 | 78 | args = parser.parse_args() 79 | 80 | # Read GitHub key from file using the read_key function in 81 | # common_functions.py 82 | try: 83 | api_token = read_key('gh_key') 84 | 85 | except: 86 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 87 | sys.exit() 88 | 89 | def get_pr_data(api_token, org_name, repo_name, num_pages): 90 | """Executes the GraphQL query to get PR data from a repository. 91 | 92 | Parameters 93 | ---------- 94 | api_token : str 95 | The GH API token retrieved from the gh_key file.
96 | org_name : str 97 | The name of the GitHub organization where your repo is found 98 | repo_name : str 99 | The name of a GitHub repository in that org where your PRs can be found 100 | num_pages : int 101 | The number of pages of results with 10 results per page (10 = 100 results) 102 | 103 | Returns 104 | ------- 105 | repo_info_df : pandas.core.frame.DataFrame 106 | """ 107 | import requests 108 | import json 109 | import pandas as pd 110 | 111 | url = 'https://api.github.com/graphql' 112 | headers = {'Authorization': 'token %s' % api_token} 113 | 114 | repo_info_df = pd.DataFrame() 115 | 116 | has_previous_page = True 117 | before_cursor = None 118 | 119 | i = 1 # Iterator starts at page 1 120 | 121 | while has_previous_page and i <= num_pages: 122 | i+=1 123 | try: 124 | query = make_query(before_cursor) 125 | 126 | variables = {"org_name": org_name, "repo_name": repo_name} 127 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 128 | json_data = json.loads(r.text) 129 | 130 | df_temp = pd.DataFrame(json_data["data"]["repository"]["pullRequests"]["nodes"]) 131 | repo_info_df = pd.concat([repo_info_df, df_temp]) 132 | 133 | has_previous_page = json_data["data"]["repository"]["pullRequests"]["pageInfo"]["hasPreviousPage"] 134 | before_cursor = json_data["data"]["repository"]["pullRequests"]["pageInfo"]["startCursor"] 135 | 136 | status = "OK" 137 | except: 138 | has_previous_page = False 139 | status = "Error" 140 | 141 | return repo_info_df, status 142 | 143 | repo_info_df, status = get_pr_data(api_token, args.org_name, args.repo_name, args.num_pages) 144 | 145 | def expand_author(author): 146 | import pandas as pd 147 | if pd.isnull(author): 148 | author_list = [None, None, None] 149 | else: 150 | author_list = [author['login'], author['name'], author['pullRequests']['totalCount']] 151 | return author_list 152 | 153 | def expand_count(comments): 154 | import pandas as pd 155 | if pd.isnull(comments): 156 | comment_ct = 0 157 | else: 158 | comment_ct = comments['totalCount'] 159 | return comment_ct 160 | 161 | repo_info_df['author_list'] = repo_info_df['author'].apply(expand_author) 162 | repo_info_df[['author_login', 'author_name', 'author_pr_count', ]] = pd.DataFrame(repo_info_df.author_list.tolist(), index= repo_info_df.index) 163 | repo_info_df['comment_ct'] = repo_info_df['comments'].apply(expand_count) 164 | repo_info_df = repo_info_df.drop(columns=['author','author_list','comments']) 165 | 166 | # prepare file and write dataframe to csv 167 | 168 | try: 169 | file, file_path = create_file("pr_activity") 170 | repo_info_df.to_csv(file_path, index=False) 171 | 172 | except: 173 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 174 | 175 | -------------------------------------------------------------------------------- /scripts/repo_activity.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | 6 | """Repo Activity GraphQL Version 7 | This script uses the GitHub GraphQL API to retrieve relevant 8 | information about all repositories from one or more GitHub 9 | orgs. 10 | 11 | We use this script to gather basic data about the repositories 12 | found in dozens of GitHub orgs. We use this to understand whether 13 | projects are meeting our compliance requirements. 
We also use this 14 | script to find abandoned repos that have outlived their usefulness 15 | and should be archived. 16 | 17 | As input, this script requires a file named 'orgs.txt' containing 18 | the name of one GitHub org per line residing in the same folder 19 | as this script. 20 | 21 | Your API key should be stored in a file called gh_key in the 22 | same folder as this script. 23 | 24 | This script requires that `pandas` be installed within the Python 25 | environment you are running this script in. 26 | 27 | As output: 28 | * A message about each org being processed will be printed to the screen. 29 | * the script creates a csv file stored in a subdirectory 30 | of the folder with the script called "output" with the filename in 31 | this format with today's date. 32 | 33 | output/a_repo_activity_2022-01-14.csv 34 | """ 35 | 36 | import sys 37 | import pandas as pd 38 | 39 | from common_functions import read_key, expand_name_df, create_file 40 | 41 | def make_query(after_cursor = None): 42 | """Creates and returns a GraphQL query with cursor for pagination""" 43 | 44 | return """query RepoQuery($org_name: String!) { 45 | organization(login: $org_name) { 46 | repositories (first: 100 after: AFTER){ 47 | pageInfo { 48 | hasNextPage 49 | endCursor 50 | } 51 | nodes { 52 | nameWithOwner 53 | name 54 | licenseInfo { 55 | name 56 | } 57 | isPrivate 58 | isFork 59 | isEmpty 60 | isArchived 61 | forkCount 62 | stargazerCount 63 | createdAt 64 | updatedAt 65 | pushedAt 66 | defaultBranchRef { 67 | name 68 | target{ 69 | ... on Commit{ 70 | history(first:1){ 71 | edges{ 72 | node{ 73 | ... on Commit{ 74 | committedDate 75 | author{ 76 | name 77 | email 78 | user{ 79 | login 80 | } 81 | } 82 | } 83 | } 84 | } 85 | } 86 | } 87 | } 88 | } 89 | } 90 | } 91 | } 92 | }""".replace( 93 | "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null" 94 | ) 95 | 96 | # Read GitHub key from file using the read_key function in 97 | # common_functions.py 98 | try: 99 | api_token = read_key('gh_key') 100 | 101 | except: 102 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 103 | sys.exit() 104 | 105 | def get_repo_data(api_token): 106 | """Executes the GraphQL query to get repository data from one or more GitHub orgs. 107 | 108 | Parameters 109 | ---------- 110 | api_token : str 111 | The GH API token retrieved from the gh_key file. 112 | 113 | Returns 114 | ------- 115 | repo_info_df : pandas.core.frame.DataFrame 116 | """ 117 | import requests 118 | import json 119 | import pandas as pd 120 | from common_functions import read_orgs 121 | 122 | url = 'https://api.github.com/graphql' 123 | headers = {'Authorization': 'token %s' % api_token} 124 | 125 | repo_info_df = pd.DataFrame() 126 | 127 | # Read list of orgs from a file 128 | 129 | try: 130 | org_list = read_orgs('orgs.txt') 131 | except: 132 | print("Error reading orgs. This script depends on the existence of a file called orgs.txt containing one org per line. 
Exiting") 133 | sys.exit() 134 | 135 | for org_name in org_list: 136 | has_next_page = True 137 | after_cursor = None 138 | 139 | print("Processing", org_name) 140 | 141 | while has_next_page: 142 | 143 | try: 144 | query = make_query(after_cursor) 145 | 146 | variables = {"org_name": org_name} 147 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 148 | json_data = json.loads(r.text) 149 | 150 | df_temp = pd.DataFrame(json_data['data']['organization']['repositories']['nodes']) 151 | repo_info_df = pd.concat([repo_info_df, df_temp]) 152 | 153 | has_next_page = json_data["data"]["organization"]["repositories"]["pageInfo"]["hasNextPage"] 154 | 155 | after_cursor = json_data["data"]["organization"]["repositories"]["pageInfo"]["endCursor"] 156 | except: 157 | has_next_page = False 158 | print("ERROR Cannot process", org_name) 159 | 160 | return repo_info_df 161 | 162 | repo_info_df = get_repo_data(api_token) 163 | 164 | # This section reformats the output into what we need in the csv file 165 | 166 | repo_info_df["org"] = repo_info_df["nameWithOwner"].str.split('/').str[0] 167 | 168 | repo_info_df = expand_name_df(repo_info_df,'licenseInfo','license') 169 | repo_info_df = repo_info_df.drop(columns=['licenseInfo']) 170 | 171 | repo_info_df = expand_name_df(repo_info_df,'defaultBranchRef','defaultBranch') 172 | 173 | def expand_commits(commits): 174 | if pd.isnull(commits): 175 | commits_list = [None, None, None, None] 176 | else: 177 | node = commits['target']['history']['edges'][0]['node'] 178 | try: 179 | commit_date = node['committedDate'] 180 | except: 181 | commit_date = None 182 | try: 183 | author_name = node['author']['name'] 184 | except: 185 | author_name = None 186 | try: 187 | author_email = node['author']['email'] 188 | except: 189 | author_email = None 190 | try: 191 | author_login = node['author']['user']['login'] 192 | except: 193 | author_login = None 194 | commits_list = [commit_date, author_name, author_email, author_login] 195 | return commits_list 196 | 197 | repo_info_df['commits_list'] = repo_info_df['defaultBranchRef'].apply(expand_commits) 198 | repo_info_df[['last_commit_date','author_name','author_email', 'author_login']] = pd.DataFrame(repo_info_df.commits_list.tolist(), index= repo_info_df.index) 199 | repo_info_df = repo_info_df.drop(columns=['commits_list','defaultBranchRef']) 200 | 201 | repo_info_df = repo_info_df[['org','name','nameWithOwner','license','defaultBranch','isPrivate','isFork','isArchived', 'forkCount', 'stargazerCount', 'isEmpty', 'createdAt', 'updatedAt','pushedAt','last_commit_date','author_login','author_name','author_email']] 202 | 203 | # prepare file and write dataframe to csv 204 | 205 | try: 206 | file, file_path = create_file("a_repo_activity") 207 | repo_info_df.to_csv(file_path, index=False) 208 | 209 | except: 210 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 211 | 212 | -------------------------------------------------------------------------------- /scripts/repo_activity_coc.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | 6 | """Repo Activity GraphQL Version with Code of Conduct 7 | This script uses the GitHub GraphQL API to retrieve relevant 8 | information about all repositories from one or more GitHub 9 | orgs. 
10 | 11 | Note: This is identical to repo_activity.py, but it adds info about the 12 | code of conduct and CONTRIBUTING.md files. 13 | 14 | This is a separate script because the codeOfConduct object and getting 15 | info about files in the GraphQL API is a bit problematic and tends to time 16 | out when getting relatively small amounts of data, so this version gets 17 | data from only 10 repos at a time, instead of 100 in the other script. 18 | 19 | Note that it will only find CONTRIBUTING.md files that match that exact 20 | name and case (not contributing.md or contributing.rst) in public repos, 21 | but not in private repos. 22 | 23 | We use this script to gather basic data about the repositories 24 | found in dozens of GitHub orgs. We use this to understand whether 25 | projects are meeting our compliance requirements. We also use this 26 | script to find abandoned repos that have outlived their usefulness 27 | and should be archived. 28 | 29 | As input, this script requires a file named 'orgs.txt' containing 30 | the name of one GitHub org per line residing in the same folder 31 | as this script. 32 | 33 | Your API key should be stored in a file called gh_key in the 34 | same folder as this script. 35 | 36 | This script requires that `pandas` be installed within the Python 37 | environment you are running this script in. 38 | 39 | As output: 40 | * A message about each org being processed will be printed to the screen. 41 | * the script creates a csv file stored in a subdirectory 42 | of the folder with the script called "output" with the filename in 43 | this format with today's date. 44 | 45 | output/a_repo_activity_2022-01-14.csv 46 | """ 47 | 48 | import sys 49 | import pandas as pd 50 | from common_functions import read_key, expand_name_df, create_file 51 | 52 | def make_query(after_cursor = None): 53 | """Creates and returns a GraphQL query with cursor for pagination""" 54 | 55 | return """query RepoQuery($org_name: String!) { 56 | organization(login: $org_name) { 57 | repositories (first: 10 after: AFTER){ 58 | pageInfo { 59 | hasNextPage 60 | endCursor 61 | } 62 | nodes { 63 | nameWithOwner 64 | name 65 | licenseInfo { 66 | name 67 | } 68 | codeOfConduct{ 69 | url 70 | } 71 | content: object(expression: "HEAD:CONTRIBUTING.md") { 72 | ... on Blob { 73 | abbreviatedOid 74 | } 75 | } 76 | isPrivate 77 | isFork 78 | isEmpty 79 | isArchived 80 | forkCount 81 | stargazerCount 82 | createdAt 83 | updatedAt 84 | pushedAt 85 | defaultBranchRef { 86 | name 87 | target{ 88 | ... on Commit{ 89 | history(first:1){ 90 | edges{ 91 | node{ 92 | ... on Commit{ 93 | committedDate 94 | author{ 95 | name 96 | email 97 | user{ 98 | login 99 | } 100 | } 101 | } 102 | } 103 | } 104 | } 105 | } 106 | } 107 | } 108 | } 109 | } 110 | } 111 | }""".replace( 112 | "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null" 113 | ) 114 | 115 | # Read GitHub key from file using the read_key function in 116 | # common_functions.py 117 | try: 118 | api_token = read_key('gh_key') 119 | 120 | except: 121 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 122 | sys.exit() 123 | 124 | def get_repo_data(api_token): 125 | """Executes the GraphQL query to get repository data from one or more GitHub orgs. 126 | 127 | Parameters 128 | ---------- 129 | api_token : str 130 | The GH API token retrieved from the gh_key file.
131 | 132 | Returns 133 | ------- 134 | repo_info_df : pandas.core.frame.DataFrame 135 | """ 136 | import requests 137 | import json 138 | import pandas as pd 139 | from common_functions import read_orgs 140 | 141 | url = 'https://api.github.com/graphql' 142 | headers = {'Authorization': 'token %s' % api_token} 143 | 144 | repo_info_df = pd.DataFrame() 145 | 146 | # Read list of orgs from a file 147 | 148 | try: 149 | org_list = read_orgs('orgs.txt') 150 | except: 151 | print("Error reading orgs. This script depends on the existence of a file called orgs.txt containing one org per line. Exiting") 152 | sys.exit() 153 | 154 | for org_name in org_list: 155 | has_next_page = True 156 | after_cursor = None 157 | 158 | print("Processing", org_name) 159 | 160 | while has_next_page: 161 | 162 | try: 163 | query = make_query(after_cursor) 164 | 165 | variables = {"org_name": org_name} 166 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 167 | json_data = json.loads(r.text) 168 | 169 | df_temp = pd.DataFrame(json_data['data']['organization']['repositories']['nodes']) 170 | repo_info_df = pd.concat([repo_info_df, df_temp]) 171 | 172 | has_next_page = json_data["data"]["organization"]["repositories"]["pageInfo"]["hasNextPage"] 173 | 174 | after_cursor = json_data["data"]["organization"]["repositories"]["pageInfo"]["endCursor"] 175 | except: 176 | has_next_page = False 177 | print("ERROR Cannot process", org_name) 178 | 179 | return repo_info_df 180 | 181 | repo_info_df = get_repo_data(api_token) 182 | 183 | # This section reformats the output into what we need in the csv file 184 | 185 | repo_info_df["org"] = repo_info_df["nameWithOwner"].str.split('/').str[0] 186 | 187 | repo_info_df = expand_name_df(repo_info_df,'licenseInfo','license') 188 | repo_info_df = repo_info_df.drop(columns=['licenseInfo']) 189 | 190 | repo_info_df = expand_name_df(repo_info_df,'defaultBranchRef','defaultBranch') 191 | 192 | def expand_coc(coc): 193 | if pd.isnull(coc): 194 | coc_url = 'Likely Missing' 195 | else: 196 | coc_url = coc['url'] 197 | return coc_url 198 | 199 | repo_info_df['codeOfConduct_url'] = repo_info_df['codeOfConduct'].apply(expand_coc) 200 | repo_info_df = repo_info_df.drop(columns=['codeOfConduct']) 201 | 202 | def expand_contrib(contrib): 203 | # Note that the script only finds the file if it exactly matches CONTRIBUTING.md 204 | # and will not find contributing.md, CONTRIBUTING.rst, or other variations 205 | if pd.isnull(contrib): 206 | contrib_file = 'Missing Private or not CONTRIBUTING.md' 207 | else: 208 | contrib_file = 'CONTRIBUTING.md' 209 | return contrib_file 210 | 211 | repo_info_df['contrib_file'] = repo_info_df['content'].apply(expand_contrib) 212 | repo_info_df = repo_info_df.drop(columns=['content']) 213 | 214 | def expand_commits(commits): 215 | if pd.isnull(commits): 216 | commits_list = [None, None, None, None] 217 | else: 218 | node = commits['target']['history']['edges'][0]['node'] 219 | try: 220 | commit_date = node['committedDate'] 221 | except: 222 | commit_date = None 223 | try: 224 | author_name = node['author']['name'] 225 | except: 226 | author_name = None 227 | try: 228 | author_email = node['author']['email'] 229 | except: 230 | author_email = None 231 | try: 232 | author_login = node['author']['user']['login'] 233 | except: 234 | author_login = None 235 | commits_list = [commit_date, author_name, author_email, author_login] 236 | return commits_list 237 | 238 | repo_info_df['commits_list'] = 
repo_info_df['defaultBranchRef'].apply(expand_commits) 239 | repo_info_df[['last_commit_date','author_name','author_email', 'author_login']] = pd.DataFrame(repo_info_df.commits_list.tolist(), index= repo_info_df.index) 240 | repo_info_df = repo_info_df.drop(columns=['commits_list','defaultBranchRef']) 241 | 242 | repo_info_df = repo_info_df[['org','name','nameWithOwner','license','defaultBranch','codeOfConduct_url', 'contrib_file', 'isPrivate','isFork','isArchived', 'forkCount', 'stargazerCount', 'isEmpty', 'createdAt', 'updatedAt','pushedAt','last_commit_date','author_login','author_name','author_email']] 243 | 244 | # prepare file and write dataframe to csv 245 | 246 | try: 247 | file, file_path = create_file("a_repo_activity") 248 | repo_info_df.to_csv(file_path, index=False) 249 | 250 | except: 251 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 252 | 253 | -------------------------------------------------------------------------------- /notebooks/org_info.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Copyright 2022 VMware, Inc.\n", 8 | "SPDX-License-Identifier: BSD-2-Clause\n", 9 | "\n", 10 | "This notebook uses the GitHub GraphQL API to retrieve relevant\n", 11 | "information about one or more GitHub orgs.\n", 12 | "\n", 13 | "This is a simplified version of `scripts/mystery_orgs.py` that can be used to gather basic data about GitHub orgs.\n", 14 | "\n", 15 | "I created it as one way to learn more about orgs that\n", 16 | "we believe may have been created outside of our process by various\n", 17 | "employees across our business units. We gather the first few members\n", 18 | "of the org to help identify employees who can provide more details\n", 19 | "about the purpose of the org and how it is used.\n", 20 | "\n", 21 | "Your API key should be stored in a file called gh_key in the\n", 22 | "same folder as this script.\n", 23 | "\n", 24 | "The orgs investigated can be specified in the cell below as part of org_list." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 24, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import requests\n", 34 | "import json\n", 35 | "import pandas as pd\n", 36 | "\n", 37 | "# Assumes your GH API token is in a file called gh_key in this directory\n", 38 | "with open('gh_key', 'r') as kf:\n", 39 | " api_token = kf.readline().rstrip() # remove newline & trailing whitespace\n", 40 | " \n", 41 | "org_list = [\"Moonkube\", \"ModernAppsNinja\"]" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 25, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "def make_query():\n", 51 | " return \"\"\"query OrgQuery($org_name: String!) 
{\n", 52 | " organization(login:$org_name) {\n", 53 | " name\n", 54 | " url\n", 55 | " createdAt\n", 56 | " updatedAt\n", 57 | " membersWithRole(first: 15){\n", 58 | " nodes{\n", 59 | " login\n", 60 | " name\n", 61 | " email\n", 62 | " company\n", 63 | " }\n", 64 | " }\n", 65 | " }\n", 66 | " }\"\"\"" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 26, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "def get_org_data(api_token, org_list):\n", 76 | " import requests\n", 77 | " import json\n", 78 | " import csv\n", 79 | "\n", 80 | " url = 'https://api.github.com/graphql'\n", 81 | " headers = {'Authorization': 'token %s' % api_token}\n", 82 | " \n", 83 | " org_info_df = pd.DataFrame()\n", 84 | " \n", 85 | " all_rows = [['org_name', 'org_url', 'org_createdAt', 'org_updatedAt', 'people(login,name,email,company):repeat']]\n", 86 | " \n", 87 | " for org_name in org_list:\n", 88 | "\n", 89 | " row = []\n", 90 | " query = make_query()\n", 91 | "\n", 92 | " variables = {\"org_name\": org_name}\n", 93 | " r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)\n", 94 | " json_data = json.loads(r.text)\n", 95 | " \n", 96 | " df_temp = pd.DataFrame(json_data['data']['organization'])\n", 97 | " org_info_df = org_info_df.append(df_temp, ignore_index=True)\n", 98 | " \n", 99 | " for key in json_data['data']['organization']:\n", 100 | " if key == 'membersWithRole':\n", 101 | " for nkey in json_data['data']['organization'][key]['nodes']:\n", 102 | " row.append(nkey['login'])\n", 103 | " row.append(nkey['name'])\n", 104 | " row.append(nkey['email'])\n", 105 | " row.append(nkey['company'])\n", 106 | " else:\n", 107 | " row.append(json_data['data']['organization'][key])\n", 108 | " all_rows.append(row)\n", 109 | " \n", 110 | " return org_info_df, all_rows\n", 111 | "\n", 112 | "org_info_df, all_rows = get_org_data(api_token, org_list)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 27, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "data": { 122 | "text/html": [ 123 | "
\n", 124 | "\n", 137 | "\n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | "
nameurlcreatedAtupdatedAtmembersWithRole
0moonkubehttps://github.com/moonkube2020-06-29T23:19:15Z2020-06-29T23:19:15Z[]
1ModernAppsNinjahttps://github.com/ModernAppsNinja2020-01-26T22:51:20Z2020-08-05T09:18:34Z[{'login': 'yogendra', 'name': 'Yogendra Rampu...
\n", 167 | "
" 168 | ], 169 | "text/plain": [ 170 | " name url createdAt \\\n", 171 | "0 moonkube https://github.com/moonkube 2020-06-29T23:19:15Z \n", 172 | "1 ModernAppsNinja https://github.com/ModernAppsNinja 2020-01-26T22:51:20Z \n", 173 | "\n", 174 | " updatedAt membersWithRole \n", 175 | "0 2020-06-29T23:19:15Z [] \n", 176 | "1 2020-08-05T09:18:34Z [{'login': 'yogendra', 'name': 'Yogendra Rampu... " 177 | ] 178 | }, 179 | "execution_count": 27, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "# The dataframe where the info can be found.\n", 186 | "org_info_df" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 20, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/plain": [ 197 | "[['org_name',\n", 198 | " 'org_url',\n", 199 | " 'org_createdAt',\n", 200 | " 'org_updatedAt',\n", 201 | " 'people(login,name,email,company):repeat'],\n", 202 | " ['moonkube',\n", 203 | " 'https://github.com/moonkube',\n", 204 | " '2020-06-29T23:19:15Z',\n", 205 | " '2020-06-29T23:19:15Z'],\n", 206 | " ['ModernAppsNinja',\n", 207 | " 'https://github.com/ModernAppsNinja',\n", 208 | " '2020-01-26T22:51:20Z',\n", 209 | " '2020-08-05T09:18:34Z',\n", 210 | " 'yogendra',\n", 211 | " 'Yogendra Rampuria - Yogi',\n", 212 | " '',\n", 213 | " '@yugabyte-db',\n", 214 | " 'sammcgeown',\n", 215 | " 'Sam McGeown',\n", 216 | " '',\n", 217 | " '@vmware',\n", 218 | " 'pipuz',\n", 219 | " 'Gianni',\n", 220 | " '',\n", 221 | " None,\n", 222 | " 'MAHDTech',\n", 223 | " 'MAHDTech',\n", 224 | " 'MAHDTech@saltlabs.tech',\n", 225 | " '@salt-labs ',\n", 226 | " 'afewell',\n", 227 | " 'Arthur Fewell',\n", 228 | " '',\n", 229 | " 'VMware',\n", 230 | " 'ansergit',\n", 231 | " 'Anser Arif',\n", 232 | " '',\n", 233 | " '@vmware',\n", 234 | " 'afewellvmware',\n", 235 | " None,\n", 236 | " '',\n", 237 | " None,\n", 238 | " 'guyzsarun',\n", 239 | " 'Sarun Nuntaviriyakul',\n", 240 | " '',\n", 241 | " '@vmware',\n", 242 | " 'yashitanamdeo',\n", 243 | " 'Yashita Namdeo',\n", 244 | " 'yashita.namdeo2000@gmail.com',\n", 245 | " 'Shri Vaishnav Vidyapeeth Vishwavidyalaya',\n", 246 | " 'shashwatbangar',\n", 247 | " 'Shashwat Bangar',\n", 248 | " '',\n", 249 | " 'Shri Vaishnav Vidyapeeth Vishwavidyalaya',\n", 250 | " 'AttentiveAryan',\n", 251 | " 'Attentive Aryan',\n", 252 | " 'AttentiveAryan@gmail.com',\n", 253 | " 'Attentive Aryan']]" 254 | ] 255 | }, 256 | "execution_count": 20, 257 | "metadata": {}, 258 | "output_type": "execute_result" 259 | } 260 | ], 261 | "source": [ 262 | "# Another way to look at this data is as a list of rows that could be written to a csv file.\n", 263 | "all_rows" 264 | ] 265 | } 266 | ], 267 | "metadata": { 268 | "kernelspec": { 269 | "display_name": "Python 3", 270 | "language": "python", 271 | "name": "python3" 272 | }, 273 | "language_info": { 274 | "codemirror_mode": { 275 | "name": "ipython", 276 | "version": 3 277 | }, 278 | "file_extension": ".py", 279 | "mimetype": "text/x-python", 280 | "name": "python", 281 | "nbconvert_exporter": "python", 282 | "pygments_lexer": "ipython3", 283 | "version": "3.8.5" 284 | } 285 | }, 286 | "nbformat": 4, 287 | "nbformat_minor": 4 288 | } 289 | -------------------------------------------------------------------------------- /scripts/commits_people.py: -------------------------------------------------------------------------------- 1 | # Copyright Dawn M. Foster 2 | # SPDX-License-Identifier: MIT 3 | 4 | # DEPRECATED. 
This script has been moved to the CHAOSS Data Science WG repo 5 | # https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/fork-case-study/commits_people.py 6 | 7 | """Gets Commit Data 8 | This is aggregated per person for a repo between two specified dates. 9 | I'm currently using this to better understand who contributes to a project 10 | before and after a key time in the project (relicense / fork) with a focus on 11 | understanding organizational diversity. 12 | 13 | Output (files are stored in the output directory) 14 | * GitHub API response code (should be "<Response [200]>") 15 | * Commit data pickle file containing a dataframe 16 | * Person pickle file containing a dictionary 17 | """ 18 | 19 | import sys 20 | import pandas as pd 21 | import argparse 22 | import requests 23 | import json 24 | 25 | from common_functions import read_key 26 | 27 | # Read arguments from command line 28 | parser = argparse.ArgumentParser() 29 | 30 | parser.add_argument("-t", "--token", dest = "gh_key", help="GitHub Personal Access Token") 31 | parser.add_argument("-u", "--url", dest = "gh_url", help="URL for a GitHub repository") 32 | parser.add_argument("-b", "--begin_date", dest = "begin_date", help="Date in the format YYYY-MM-DD - gather commits after this begin date") 33 | parser.add_argument("-e", "--end_date", dest = "end_date", help="Date in the format YYYY-MM-DD - gather commits up until this end date") 34 | 35 | args = parser.parse_args() 36 | 37 | gh_url = args.gh_url 38 | gh_key = args.gh_key 39 | since_date = args.begin_date + "T00:00:00.000+00:00" 40 | until_date = args.end_date + "T00:00:00.000+00:00" 41 | 42 | url_parts = gh_url.strip('/').split('/') 43 | org_name = url_parts[3] 44 | repo_name = url_parts[4] 45 | 46 | # Read GitHub key from file using the read_key function in 47 | # common_functions.py 48 | try: 49 | api_token = read_key(gh_key) 50 | 51 | except: 52 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 53 | sys.exit() 54 | 55 | pickle_file = 'output/' + repo_name + str(since_date) + str(until_date) + '.pkl' 56 | 57 | def make_query(after_cursor = None): 58 | return """query repo_commits($org_name: String!, $repo_name: String!, $since_date: GitTimestamp!, $until_date: GitTimestamp!){ 59 | repository(owner: $org_name, name: $repo_name) { 60 | ... on Repository{ 61 | defaultBranchRef{ 62 | target{ 63 | ... on Commit{ 64 | history(since: $since_date, until: $until_date, first: 100 after: AFTER){ 65 | pageInfo { 66 | hasNextPage 67 | endCursor 68 | } 69 | edges{ 70 | node{ 71 | ... on Commit{ 72 | committedDate 73 | deletions 74 | additions 75 | oid 76 | authors(first:100) { 77 | nodes { 78 | date 79 | email 80 | user { 81 | login 82 | company 83 | email 84 | name 85 | } 86 | } 87 | } 88 | } 89 | } 90 | } 91 | } 92 | } 93 | } 94 | } 95 | } 96 | } 97 | }""".replace( 98 | "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null" 99 | ) 100 | 101 | def get_data(api_token, org_name, repo_name, since_date, until_date): 102 | """Executes the GraphQL query to get data from one GitHub repo.
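Parameters
----------
api_token : str
    GitHub API token read with read_key from the file passed via --token.
org_name, repo_name : str
    Parsed from the repository URL passed via --url.
since_date, until_date : str
    GitTimestamp strings built from the --begin_date and --end_date arguments.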
103 | 104 | Returns 105 | ------- 106 | repo_info_df : pandas.core.frame.DataFrame 107 | """ 108 | 109 | url = 'https://api.github.com/graphql' 110 | headers = {'Authorization': 'token %s' % api_token} 111 | 112 | repo_info_df = pd.DataFrame() 113 | 114 | has_next_page = True 115 | after_cursor = None 116 | 117 | while has_next_page: 118 | 119 | query = make_query(after_cursor) 120 | 121 | variables = {"org_name": org_name, "repo_name": repo_name, "since_date": since_date, "until_date": until_date} 122 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 123 | print(r) 124 | json_data = json.loads(r.text) 125 | 126 | df_temp = pd.DataFrame(json_data['data']['repository']['defaultBranchRef']['target']['history']['edges']) 127 | 128 | repo_info_df = repo_info_df.append(df_temp, ignore_index=True) 129 | 130 | has_next_page = json_data['data']['repository']['defaultBranchRef']['target']['history']["pageInfo"]["hasNextPage"] 131 | 132 | after_cursor = json_data['data']['repository']['defaultBranchRef']['target']['history']["pageInfo"]["endCursor"] 133 | 134 | return repo_info_df 135 | 136 | repo_info_df = get_data(api_token, org_name, repo_name, since_date, until_date) 137 | 138 | def expand_commits(commits): 139 | if pd.isnull(commits): 140 | commits_list = [None, None, None, None, None] 141 | else: 142 | node = commits 143 | try: 144 | commit_date = node['committedDate'] 145 | except: 146 | commit_date = None 147 | try: 148 | dels = node['deletions'] 149 | except: 150 | dels = None 151 | try: 152 | adds = node['additions'] 153 | except: 154 | adds = None 155 | try: 156 | oid = node['oid'] 157 | except: 158 | oid = None 159 | try: 160 | author = node['authors']['nodes'] 161 | except: 162 | author = None 163 | commits_list = [commit_date, dels, adds, oid, author] 164 | return commits_list 165 | 166 | repo_info_df['commits_list'] = repo_info_df['node'].apply(expand_commits) 167 | repo_info_df[['commit_date','deletions', 'additions','oid','author']] = pd.DataFrame(repo_info_df.commits_list.tolist(), index= repo_info_df.index) 168 | #repo_info_df = repo_info_df.drop(columns=['commits_list']) 169 | repo_info_df 170 | repo_info_df.to_pickle(pickle_file) 171 | 172 | def create_person_dict(pickle_file, repo_name, since_date, until_date): 173 | import collections 174 | import pickle 175 | 176 | repo_info_df = pd.read_pickle(pickle_file) 177 | 178 | output_pickle = 'output/' + repo_name + '_people_' + str(since_date) + str(until_date) + '.pkl' 179 | 180 | # Create a dictionary for each person with the key being the gh login 181 | # Create a dict for commits that aren't tied to a gh login (gh user = None) 182 | person_dict=collections.defaultdict(dict) 183 | fail_person_dict=collections.defaultdict(dict) 184 | 185 | for x in repo_info_df.iterrows(): 186 | data = x[1] 187 | 188 | for y in data['author']: 189 | try: 190 | login = y['user']['login'] 191 | company = y['user']['company'] 192 | commit_email = y['email'] 193 | login_email = y['user']['email'] 194 | name = y['user']['name'] 195 | 196 | if person_dict[login]: 197 | person_dict[login]['commits'] = person_dict[login]['commits'] + 1 198 | person_dict[login]['additions'] = person_dict[login]['additions'] + data['additions'] 199 | person_dict[login]['deletions'] = person_dict[login]['deletions'] + data['deletions'] 200 | if commit_email not in person_dict[login]['email']: 201 | person_dict[login]['email'].append(commit_email) 202 | else: 203 | person_dict[login]['company'] = company 204 | 
person_dict[login]['name'] = name 205 | person_dict[login]['commits'] = 1 206 | person_dict[login]['additions'] = data['additions'] 207 | person_dict[login]['deletions'] = data['deletions'] 208 | if len(login_email) == 0: 209 | person_dict[login]['email'] = [commit_email] 210 | elif commit_email == login_email: 211 | person_dict[login]['email'] = [commit_email] 212 | else: 213 | person_dict[login]['email'] = [commit_email,login_email] 214 | except: 215 | try: 216 | if fail_person_dict[commit_email]: 217 | fail_person_dict[commit_email]['commits'] = fail_person_dict[commit_email]['commits'] + 1 218 | fail_person_dict[commit_email]['additions'] = fail_person_dict[commit_email]['additions'] + data['additions'] 219 | fail_person_dict[commit_email]['deletions'] = fail_person_dict[commit_email]['deletions'] + data['deletions'] 220 | else: 221 | fail_person_dict[commit_email]['commits'] = 1 222 | fail_person_dict[commit_email]['additions'] = data['additions'] 223 | fail_person_dict[commit_email]['deletions'] = data['deletions'] 224 | except: 225 | print("Unknown Exception on", y) 226 | 227 | # For every email that didn't have a GH login / user, search for that email in the 228 | # person_dict and if found, add the commits, additions, and deletions to the proper user 229 | # Print error message if not found (above items for testing of that case) 230 | for f_key, f_value in fail_person_dict.items(): 231 | found = False 232 | for key, value in person_dict.items(): 233 | if f_key in value['email']: 234 | person_dict[key]['commits'] = person_dict[key]['commits'] + f_value['commits'] 235 | person_dict[key]['additions'] = person_dict[key]['additions'] + f_value['additions'] 236 | person_dict[key]['deletions'] = person_dict[key]['deletions'] + f_value['deletions'] 237 | found = True 238 | if found == False: 239 | print('Not found - no person with this email',f_key,f_value) 240 | 241 | with open(output_pickle, 'wb') as f: 242 | pickle.dump(person_dict, f) 243 | 244 | print('Commit data stored in', pickle_file) 245 | print('People Dictionary stored in', output_pickle) 246 | 247 | create_person_dict(pickle_file, repo_name, since_date, until_date) 248 | -------------------------------------------------------------------------------- /scripts/sunset.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2022 VMware, Inc. 4 | # SPDX-License-Identifier: BSD-2-Clause 5 | # Author: Dawn M. Foster 6 | 7 | """Gather data to determine whether a repo can be archived 8 | 9 | This script uses the GitHub GraphQL API to retrieve relevant 10 | information about a repository, including forks to determine ownership 11 | and possibly contact people to understand how they are using a project. 12 | More detailed information is gathered about recently updated forks and 13 | their owners with the recently updated threshold set in a variable called 14 | recently_updated (currently set to 9 months). 15 | 16 | Usage 17 | ----- 18 | 19 | Run the script with one repo url as input 20 | $python3 sunset.py -u "https://github.com/org_name/repo_name" 21 | 22 | Run the script with a csv file containing one org_name,repo_name pair 23 | per line: 24 | $python3 sunset.py -f sunset.csv 25 | 26 | Dependencies and Requirements 27 | ----------------------------- 28 | 29 | This script depends on another tool called Criticality Score to run. 30 | See https://github.com/ossf/criticality_score for more details, including 31 | how to set up a required environment variable. 
This function requires 32 | that you have version 1.0.7 of this tool installed (the older Python 33 | version but not the final Python version, which doesn't work for 34 | some reason within the script - possibly because of how they've 35 | implemented the deprecation warnings). You can install the correct version 36 | using: 37 | pip install criticality-score==1.0.7 38 | 39 | Your API key should be stored in a file called gh_key in the 40 | same folder as this script. 41 | 42 | This script requires that `pandas` be installed within the Python 43 | environment you are running this script in. 44 | 45 | Before using this script, please make sure that you are adhering 46 | to the GitHub Acceptable Use Policies: 47 | https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies 48 | In particular, "You may not use information from the Service 49 | (whether scraped, collected through our API, or obtained otherwise) 50 | for spamming purposes, including for the purposes of sending unsolicited 51 | emails to users or selling User Personal Information (as defined in the 52 | GitHub Privacy Statement), such as to recruiters, headhunters, and job boards." 53 | 54 | Output 55 | ------ 56 | 57 | * Prints basic data about each repo processed to the screen to show progress. 58 | * the script creates a csv file stored in a subdirectory 59 | of the folder with the script called "output" with the filename in 60 | this format with today's date. 61 | 62 | output/sunset_2022-01-14.csv 63 | """ 64 | 65 | import argparse 66 | import sys 67 | from common_functions import create_file, read_key, get_criticality 68 | from datetime import date 69 | from dateutil.relativedelta import relativedelta 70 | import csv 71 | 72 | def make_query(after_cursor = None): 73 | """Creates and returns a GraphQL query with cursor for pagination on forks""" 74 | 75 | return """query repo_forks($org_name: String!, $repo_name: String!){ 76 | repository(owner: $org_name, name: $repo_name){ 77 | forks (first:20, after: AFTER) { 78 | pageInfo { 79 | hasNextPage 80 | endCursor 81 | } 82 | totalCount 83 | nodes { 84 | updatedAt 85 | url 86 | owner { 87 | __typename 88 | url 89 | ... on User{ 90 | name 91 | company 92 | email 93 | organizations (last:50){ 94 | nodes{ 95 | name 96 | } 97 | } 98 | } 99 | } 100 | } 101 | } 102 | stargazerCount 103 | } 104 | }""".replace( 105 | "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null" 106 | ) 107 | 108 | def get_fork_data(api_token, org_name, repo_name): 109 | """Executes the GraphQL query to get repository data. 110 | 111 | Parameters 112 | ---------- 113 | api_token : str 114 | The GH API token retrieved from the gh_key file. 115 | org_name and repo_name : str 116 | The GitHub organization name and repository name to analyze.
117 | 118 | Returns 119 | ------- 120 | repo_info_df : pandas dataframe 121 | Dataframe with all of the output from the API query 122 | num_forks : int 123 | Number of forks for the repo 124 | num_stars : int 125 | Number of stars for the repo 126 | status : str 127 | Value is "OK" or "Error" depending on whether data could be gathered for that org/repo pair 128 | """ 129 | 130 | import requests 131 | import json 132 | import pandas as pd 133 | 134 | url = 'https://api.github.com/graphql' 135 | headers = {'Authorization': 'token %s' % api_token} 136 | 137 | repo_info_df = pd.DataFrame() 138 | 139 | has_next_page = True 140 | after_cursor = None 141 | 142 | while has_next_page: 143 | try: 144 | query = make_query(after_cursor) 145 | 146 | variables = {"org_name": org_name, "repo_name": repo_name} 147 | r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers) 148 | json_data = json.loads(r.text) 149 | 150 | df_temp = pd.DataFrame(json_data["data"]["repository"]["forks"]["nodes"]) 151 | repo_info_df = pd.concat([repo_info_df, df_temp]) 152 | 153 | num_forks = json_data["data"]["repository"]["forks"]["totalCount"] 154 | num_stars = json_data["data"]["repository"]["stargazerCount"] 155 | 156 | has_next_page = json_data["data"]["repository"]["forks"]["pageInfo"]["hasNextPage"] 157 | after_cursor = json_data["data"]["repository"]["forks"]["pageInfo"]["endCursor"] 158 | 159 | status = "OK" 160 | except: 161 | has_next_page = False 162 | num_forks = None 163 | num_stars = None 164 | status = "Error" 165 | 166 | return repo_info_df, num_forks, num_stars, status 167 | 168 | # Read arguments from the command line to specify whether the repo and org 169 | # should be read from a file for multiple repos or from a url to analyze 170 | # a single repo 171 | 172 | parser = argparse.ArgumentParser() 173 | 174 | parser.add_argument("-f", "--filename", dest = "csv_file", help="File name of a csv file containing one org_name,repo_name pair per line") 175 | parser.add_argument("-u", "--url", dest = "gh_url", help="URL for a GitHub repository") 176 | 177 | args = parser.parse_args() 178 | 179 | if args.csv_file: 180 | with open(args.csv_file) as f: 181 | reader = csv.reader(f) 182 | repo_list = list(reader) 183 | 184 | if args.gh_url: 185 | gh_url = args.gh_url 186 | 187 | url_parts = gh_url.strip('/').split('/') 188 | org_name = url_parts[3] 189 | repo_name = url_parts[4] 190 | 191 | repo_list = [[org_name,repo_name]] 192 | 193 | # Read GitHub key from file using the read_key function in 194 | # common_functions.py 195 | try: 196 | api_token = read_key('gh_key') 197 | 198 | except: 199 | print("Error reading GH Key. This script depends on the existence of a file called gh_key containing your GitHub API token. Exiting") 200 | sys.exit() 201 | 202 | # Uses nine months as recently updated fork threshold 203 | recently_updated = str(date.today() + relativedelta(months=-9)) 204 | 205 | # all_rows is the variable that will be written to the csv file. 
This initializes it with a csv header line 206 | all_rows = [["Org", "Repo", "Status", "Stars", "Forks", "Dependents", "Crit Score", "fork url", "Fork last updated", "account type", "owner URL", "name", "company", "email", "Other orgs that the owner belongs to"]] 207 | 208 | for repo in repo_list: 209 | org_name = repo[0] 210 | repo_name = repo[1] 211 | 212 | try: 213 | repo_info_df, num_forks, num_stars, status = get_fork_data(api_token, org_name, repo_name) 214 | 215 | if status == "OK": 216 | 217 | dependents_count, criticality_score = get_criticality(org_name, repo_name, api_token) 218 | 219 | # criticality_score sometimes fails in a way that is not reflected in its own error status 220 | # and dumps an error message into these variables. I suspect it's caused by a timeout, since it 221 | # seems to happen mostly with very large repos. This is to clean that up and make the csv 222 | # file more readable. The check is for isnumeric because Criticality Score returns strings 223 | # for some reason. 224 | 225 | if dependents_count.isnumeric() is False: 226 | criticality_score = "Error" 227 | dependents_count = "Error" 228 | 229 | print(org_name, repo_name, "Dependents:", dependents_count, "Criticality Score:", criticality_score, "Stars", num_stars, "Forks", num_forks) 230 | 231 | # Only run this section if there are forks 232 | if num_forks > 0: 233 | # We only need recent forks in the csv file, so this creates a subset of the dataframe. 234 | # If there are no recent forks (empty df), only the basic repo info is 235 | # written to the csv file. Otherwise, details about the forks are gathered and added to the csv. 236 | recent_forks_df = repo_info_df.loc[repo_info_df['updatedAt'] > recently_updated] 237 | 238 | if len(recent_forks_df) == 0: 239 | row = [org_name, repo_name, status, num_stars, num_forks, dependents_count, criticality_score] 240 | all_rows.append(row) 241 | 242 | else: 243 | for fork_obj in recent_forks_df.iterrows(): 244 | fork = fork_obj[1] 245 | 246 | fork_updated = fork['updatedAt'] 247 | fork_url = fork['url'] 248 | fork_owner_type = fork['owner']['__typename'] 249 | fork_owner_url = fork['owner']['url'] 250 | try: 251 | fork_owner_name = fork['owner']['name'] 252 | except: 253 | fork_owner_name = None 254 | try: 255 | fork_owner_company = fork['owner']['company'] 256 | except: 257 | fork_owner_company = None 258 | try: 259 | fork_owner_email = fork['owner']['email'] 260 | except: 261 | fork_owner_email = None 262 | try: 263 | fork_owner_orgs = '' 264 | for orgs in fork['owner']['organizations']['nodes']: 265 | fork_owner_orgs = fork_owner_orgs + orgs['name'] + ';' 266 | fork_owner_orgs = fork_owner_orgs[:-1] #strip last ; 267 | if len(fork_owner_orgs) == 0: 268 | fork_owner_orgs = None 269 | except: 270 | fork_owner_orgs = None 271 | 272 | row = [org_name, repo_name, status, num_stars, num_forks, dependents_count, criticality_score, fork_url, fork_updated, fork_owner_type, fork_owner_url, fork_owner_name, fork_owner_company, fork_owner_email, fork_owner_orgs] 273 | all_rows.append(row) 274 | else: 275 | row = [org_name, repo_name, status, num_stars, num_forks, dependents_count, criticality_score, None, None, None, None, None, None, None, None] 276 | all_rows.append(row) 277 | else: 278 | print("Cannot process", org_name, repo_name) 279 | row = [org_name, repo_name, status] 280 | all_rows.append(row) 281 | except: 282 | status = "Error" 283 | print("Cannot process", org_name, repo_name) 284 | row = [org_name, repo_name, status] 285 | all_rows.append(row) 286 | 287 | # 
Create csv output file and write to it. 288 | file, file_path = create_file("sunset") 289 | 290 | try: 291 | with file: 292 | write = csv.writer(file) 293 | write.writerows(all_rows) 294 | except: 295 | print('Could not write to csv file. This may be because the output directory is missing or you do not have permissions to write to it. Exiting') 296 | -------------------------------------------------------------------------------- /notebooks/CodeOfConductBug.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Copyright 2022 VMware, Inc.\n", 8 | "SPDX-License-Identifier: BSD-2-Clause\n", 9 | "\n", 10 | "This notebook contains details about an issue with the `codeOfConduct` object in the GitHub GraphQL API that I used to exchange data with the GH team. They have confirmed that there is an issue and the API engineering team is working on a re-implementation of the `codeOfConduct` object.\n", 11 | "\n", 12 | "The short version is that the `codeOfConduct` object triggers an API timeout when gathering a small amount of data when compared to other queries that gather large amounts of data without timing out. " 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import requests\n", 22 | "import json\n", 23 | "import pandas as pd\n", 24 | "\n", 25 | "# Assumes your GH API token is in a file called gh_key in this directory\n", 26 | "with open('gh_key', 'r') as kf:\n", 27 | " api_token = kf.readline().rstrip() # remove newline & trailing whitespace" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "# Example with large amounts of data\n", 35 | "\n", 36 | "Example of a query that gathers a large amount of data without triggering a timeout in the API. It does not include any data from the `codeOfConduct` object." 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 2, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "def complex_query(after_cursor = None):\n", 46 | " return \"\"\"query RepoQuery($org_name: String!) {\n", 47 | " organization(login: $org_name) {\n", 48 | " repositories (first: 100 after: AFTER){\n", 49 | " pageInfo {\n", 50 | " hasNextPage\n", 51 | " endCursor\n", 52 | " }\n", 53 | " nodes { \n", 54 | " nameWithOwner\n", 55 | " name\n", 56 | " licenseInfo {\n", 57 | " name\n", 58 | " }\n", 59 | " isPrivate\n", 60 | " isFork\n", 61 | " isEmpty\n", 62 | " isArchived\n", 63 | " forkCount\n", 64 | " stargazerCount\n", 65 | " createdAt\n", 66 | " updatedAt\n", 67 | " pushedAt\n", 68 | " defaultBranchRef {\n", 69 | " name \n", 70 | " target{\n", 71 | " ... on Commit{\n", 72 | " history(first:1){\n", 73 | " edges{\n", 74 | " node{\n", 75 | " ... on Commit{\n", 76 | " committedDate\n", 77 | " author{\n", 78 | " name\n", 79 | " email\n", 80 | " user{\n", 81 | " login\n", 82 | " }\n", 83 | " }\n", 84 | " }\n", 85 | " }\n", 86 | " }\n", 87 | " }\n", 88 | " }\n", 89 | " }\n", 90 | " }\n", 91 | " }\n", 92 | " }\n", 93 | " }\n", 94 | " }\"\"\".replace(\n", 95 | " \"AFTER\", '\"{}\"'.format(after_cursor) if after_cursor else \"null\"\n", 96 | " )\n" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 3, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "data": { 106 | "text/html": [ 107 | "
\n", 108 | "\n", 121 | "\n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | "
nameWithOwnernamelicenseInfoisPrivateisForkisEmptyisArchivedforkCountstargazerCountcreatedAtupdatedAtpushedAtdefaultBranchRef
0vmware/pg_rewindpg_rewindNoneFalseFalseFalseFalse201262013-05-23T10:45:43Z2022-01-13T10:15:53Z2020-07-15T07:20:13Z{'name': 'master', 'target': {'history': {'edg...
1vmware/pyvmomipyvmomi{'name': 'Apache License 2.0'}FalseFalseFalseFalse73218862013-12-13T17:30:30Z2022-01-27T03:48:17Z2021-10-14T20:36:08Z{'name': 'master', 'target': {'history': {'edg...
2vmware/pyvmomi-community-samplespyvmomi-community-samples{'name': 'Apache License 2.0'}FalseFalseFalseFalse8448602014-04-24T20:31:56Z2022-01-27T06:16:13Z2022-01-14T01:06:43Z{'name': 'master', 'target': {'history': {'edg...
3vmware/open-vm-toolsopen-vm-toolsNoneFalseFalseFalseFalse35716512014-04-25T21:30:54Z2022-01-27T03:33:49Z2022-01-21T01:24:38Z{'name': 'master', 'target': {'history': {'edg...
4vmware/upgrade-frameworkupgrade-framework{'name': 'Other'}FalseFalseFalseFalse12182014-06-16T17:22:11Z2021-12-06T23:09:20Z2021-12-09T20:35:22Z{'name': 'master', 'target': {'history': {'edg...
..........................................
206vmware/vmware-go-kcl-v2vmware-go-kcl-v2{'name': 'MIT License'}FalseFalseFalseFalse112021-11-30T15:05:11Z2022-01-07T02:25:00Z2022-01-07T02:24:58Z{'name': 'main', 'target': {'history': {'edges...
207vmware/.github.github{'name': 'Other'}FalseFalseFalseFalse412021-12-03T20:18:21Z2021-12-15T15:48:36Z2021-12-15T15:48:33Z{'name': 'main', 'target': {'history': {'edges...
208vmware/app-control-event-kernel-moduleapp-control-event-kernel-module{'name': 'GNU General Public License v2.0'}FalseFalseFalseFalse012021-12-20T16:49:03Z2022-01-10T15:00:15Z2021-12-20T20:08:31Z{'name': 'main', 'target': {'history': {'edges...
209vmware/ml-ops-platform-for-vsphereml-ops-platform-for-vsphere{'name': 'Apache License 2.0'}TrueFalseFalseFalse002022-01-10T12:34:26Z2022-01-10T12:34:40Z2022-01-10T12:34:38ZNone
210vmware/test-automation-for-web-applicationstest-automation-for-web-applications{'name': 'Apache License 2.0'}TrueFalseFalseFalse012022-01-12T02:44:48Z2022-01-24T07:19:02Z2022-01-26T04:35:24ZNone
\n", 319 | "

211 rows × 13 columns

\n", 320 | "
" 321 | ], 322 | "text/plain": [ 323 | " nameWithOwner \\\n", 324 | "0 vmware/pg_rewind \n", 325 | "1 vmware/pyvmomi \n", 326 | "2 vmware/pyvmomi-community-samples \n", 327 | "3 vmware/open-vm-tools \n", 328 | "4 vmware/upgrade-framework \n", 329 | ".. ... \n", 330 | "206 vmware/vmware-go-kcl-v2 \n", 331 | "207 vmware/.github \n", 332 | "208 vmware/app-control-event-kernel-module \n", 333 | "209 vmware/ml-ops-platform-for-vsphere \n", 334 | "210 vmware/test-automation-for-web-applications \n", 335 | "\n", 336 | " name \\\n", 337 | "0 pg_rewind \n", 338 | "1 pyvmomi \n", 339 | "2 pyvmomi-community-samples \n", 340 | "3 open-vm-tools \n", 341 | "4 upgrade-framework \n", 342 | ".. ... \n", 343 | "206 vmware-go-kcl-v2 \n", 344 | "207 .github \n", 345 | "208 app-control-event-kernel-module \n", 346 | "209 ml-ops-platform-for-vsphere \n", 347 | "210 test-automation-for-web-applications \n", 348 | "\n", 349 | " licenseInfo isPrivate isFork isEmpty \\\n", 350 | "0 None False False False \n", 351 | "1 {'name': 'Apache License 2.0'} False False False \n", 352 | "2 {'name': 'Apache License 2.0'} False False False \n", 353 | "3 None False False False \n", 354 | "4 {'name': 'Other'} False False False \n", 355 | ".. ... ... ... ... \n", 356 | "206 {'name': 'MIT License'} False False False \n", 357 | "207 {'name': 'Other'} False False False \n", 358 | "208 {'name': 'GNU General Public License v2.0'} False False False \n", 359 | "209 {'name': 'Apache License 2.0'} True False False \n", 360 | "210 {'name': 'Apache License 2.0'} True False False \n", 361 | "\n", 362 | " isArchived forkCount stargazerCount createdAt \\\n", 363 | "0 False 20 126 2013-05-23T10:45:43Z \n", 364 | "1 False 732 1886 2013-12-13T17:30:30Z \n", 365 | "2 False 844 860 2014-04-24T20:31:56Z \n", 366 | "3 False 357 1651 2014-04-25T21:30:54Z \n", 367 | "4 False 12 18 2014-06-16T17:22:11Z \n", 368 | ".. ... ... ... ... \n", 369 | "206 False 1 1 2021-11-30T15:05:11Z \n", 370 | "207 False 4 1 2021-12-03T20:18:21Z \n", 371 | "208 False 0 1 2021-12-20T16:49:03Z \n", 372 | "209 False 0 0 2022-01-10T12:34:26Z \n", 373 | "210 False 0 1 2022-01-12T02:44:48Z \n", 374 | "\n", 375 | " updatedAt pushedAt \\\n", 376 | "0 2022-01-13T10:15:53Z 2020-07-15T07:20:13Z \n", 377 | "1 2022-01-27T03:48:17Z 2021-10-14T20:36:08Z \n", 378 | "2 2022-01-27T06:16:13Z 2022-01-14T01:06:43Z \n", 379 | "3 2022-01-27T03:33:49Z 2022-01-21T01:24:38Z \n", 380 | "4 2021-12-06T23:09:20Z 2021-12-09T20:35:22Z \n", 381 | ".. ... ... \n", 382 | "206 2022-01-07T02:25:00Z 2022-01-07T02:24:58Z \n", 383 | "207 2021-12-15T15:48:36Z 2021-12-15T15:48:33Z \n", 384 | "208 2022-01-10T15:00:15Z 2021-12-20T20:08:31Z \n", 385 | "209 2022-01-10T12:34:40Z 2022-01-10T12:34:38Z \n", 386 | "210 2022-01-24T07:19:02Z 2022-01-26T04:35:24Z \n", 387 | "\n", 388 | " defaultBranchRef \n", 389 | "0 {'name': 'master', 'target': {'history': {'edg... \n", 390 | "1 {'name': 'master', 'target': {'history': {'edg... \n", 391 | "2 {'name': 'master', 'target': {'history': {'edg... \n", 392 | "3 {'name': 'master', 'target': {'history': {'edg... \n", 393 | "4 {'name': 'master', 'target': {'history': {'edg... \n", 394 | ".. ... \n", 395 | "206 {'name': 'main', 'target': {'history': {'edges... \n", 396 | "207 {'name': 'main', 'target': {'history': {'edges... \n", 397 | "208 {'name': 'main', 'target': {'history': {'edges... 
\n", 398 | "209 None \n", 399 | "210 None \n", 400 | "\n", 401 | "[211 rows x 13 columns]" 402 | ] 403 | }, 404 | "execution_count": 3, 405 | "metadata": {}, 406 | "output_type": "execute_result" 407 | } 408 | ], 409 | "source": [ 410 | "org_name = \"vmware\"\n", 411 | "url = 'https://api.github.com/graphql'\n", 412 | "headers = {'Authorization': 'token %s' % api_token}\n", 413 | "\n", 414 | "has_next_page = True\n", 415 | "after_cursor = None\n", 416 | "\n", 417 | "repo_info_df = pd.DataFrame()\n", 418 | "\n", 419 | "while has_next_page:\n", 420 | "\n", 421 | " query = complex_query(after_cursor)\n", 422 | "\n", 423 | " variables = {\"org_name\": org_name}\n", 424 | " r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)\n", 425 | " json_data = json.loads(r.text)\n", 426 | "\n", 427 | " df_temp = pd.DataFrame(json_data['data']['organization']['repositories']['nodes'])\n", 428 | " repo_info_df = repo_info_df.append(df_temp, ignore_index=True)\n", 429 | "\n", 430 | " has_next_page = json_data[\"data\"][\"organization\"][\"repositories\"][\"pageInfo\"][\"hasNextPage\"]\n", 431 | "\n", 432 | " after_cursor = json_data[\"data\"][\"organization\"][\"repositories\"][\"pageInfo\"][\"endCursor\"]\n", 433 | "\n", 434 | "repo_info_df" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "# Code of Conduct Example\n", 442 | "\n", 443 | "Note: this will work if you shorten `first: 100` to `first: 20`, but this is a relatively small amount of data that should not timeout unless there is a bug or serious performance issue within the `codeOfConduct` object." 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 11, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "# fails on vmware, bitnami\n", 453 | "# works on vmware-tanzu, concourse, carbonblack\n", 454 | "def make_query():\n", 455 | " return \"\"\"query RepoQuery($org_name: String!) {\n", 456 | " organization(login: $org_name) {\n", 457 | " repositories (first: 100){\n", 458 | " pageInfo {\n", 459 | " hasNextPage\n", 460 | " endCursor\n", 461 | " }\n", 462 | " nodes {\n", 463 | " name\n", 464 | " codeOfConduct{\n", 465 | " url\n", 466 | " }\n", 467 | " createdAt\n", 468 | " }\n", 469 | " }\n", 470 | " }\n", 471 | " }\"\"\"" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 14, 477 | "metadata": {}, 478 | "outputs": [], 479 | "source": [ 480 | "def make_query():\n", 481 | " return \"\"\"query MyQuery($org_name: String!) 
{\n", 482 | " organization(login: $org_name) {\n", 483 | " repositories(first: 100) {\n", 484 | " pageInfo {\n", 485 | " hasNextPage\n", 486 | " endCursor\n", 487 | " }\n", 488 | " nodes {\n", 489 | " codeOfConduct {\n", 490 | " url\n", 491 | " }\n", 492 | " createdAt\n", 493 | " name\n", 494 | " }\n", 495 | " }\n", 496 | " }\n", 497 | "}\"\"\"\n" 498 | ] 499 | }, 500 | { 501 | "cell_type": "markdown", 502 | "metadata": {}, 503 | "source": [ 504 | "# Works fine for some orgs: vmware-tanzu example" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 5, 510 | "metadata": {}, 511 | "outputs": [], 512 | "source": [ 513 | "# fails on vmware, bitnami\n", 514 | "# works on vmware-tanzu, concourse, carbonblack\n", 515 | "org_name = \"vmware-tanzu\"" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": 6, 521 | "metadata": {}, 522 | "outputs": [ 523 | { 524 | "name": "stdout", 525 | "output_type": "stream", 526 | "text": [ 527 | "{'data': {'organization': {'repositories': {'pageInfo': {'hasNextPage': True, 'endCursor': 'Y3Vyc29yOnYyOpHOFdNqbg=='}, 'nodes': [{'name': 'sonobuoy', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/sonobuoy/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2017-07-26T18:27:09Z'}, {'name': 'velero', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2017-08-02T17:22:11Z'}, {'name': 'velero-plugin-example', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero-plugin-example/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2017-11-28T20:25:03Z'}, {'name': 'tgik', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/tgik/blob/master/CODE-OF-CONDUCT.md'}, 'createdAt': '2018-05-07T18:11:06Z'}, {'name': 'carvel-kwt', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-kwt/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2018-09-24T17:59:19Z'}, {'name': 'thepodlets', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/thepodlets/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2018-10-17T21:16:53Z'}, {'name': 'carvel-ytt', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-ytt/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-03-01T00:13:56Z'}, {'name': 'carvel-ytt-library-for-kubernetes', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-ytt-library-for-kubernetes/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-03-09T00:40:19Z'}, {'name': 'carvel-kapp', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-kapp/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-03-15T21:49:25Z'}, {'name': 'carvel-ytt-library-for-kubernetes-demo', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-ytt-library-for-kubernetes-demo/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-03-22T19:32:24Z'}, {'name': 'ytt.vim', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/ytt.vim/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-04-14T09:57:23Z'}, {'name': 'tanzu-observability-collector-for-aws-fargate', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/tanzu-observability-collector-for-aws-fargate/blob/master/CODE-OF-CONDUCT.md'}, 'createdAt': '2019-04-17T05:25:35Z'}, {'name': 'carvel-kbld', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-kbld/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-04-19T22:58:51Z'}, {'name': 'carvel-guestbook-example-on-kubernetes', 'codeOfConduct': {'url': 
'https://github.com/vmware-tanzu/carvel-guestbook-example-on-kubernetes/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-04-23T22:59:54Z'}, {'name': 'carvel', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-04-24T19:24:20Z'}, {'name': 'cluster-api', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/cluster-api/blob/master/code-of-conduct.md'}, 'createdAt': '2019-05-07T18:59:24Z'}, {'name': 'carvel-simple-app-on-kubernetes', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-simple-app-on-kubernetes/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-05-09T00:39:47Z'}, {'name': 'velero-plugin-for-csi', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero-plugin-for-csi/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-06-04T15:04:55Z'}, {'name': 'octant', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/octant/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-06-19T17:53:39Z'}, {'name': 'projects-operator', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/projects-operator/blob/master/CODE-OF-CONDUCT.md'}, 'createdAt': '2019-07-31T19:41:47Z'}, {'name': 'dependency-labeler', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/dependency-labeler/blob/master/CODE-OF-CONDUCT.md'}, 'createdAt': '2019-08-01T09:01:19Z'}, {'name': 'crash-diagnostics', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/crash-diagnostics/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2019-10-02T17:39:58Z'}, {'name': 'velero-plugin-for-aws', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero-plugin-for-aws/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-10-09T14:54:44Z'}, {'name': 'velero-plugin-for-microsoft-azure', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-10-11T20:37:01Z'}, {'name': 'velero-plugin-for-gcp', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero-plugin-for-gcp/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-10-11T20:37:29Z'}, {'name': 'carvel-imgpkg', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-imgpkg/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-11-01T16:04:25Z'}, {'name': 'carvel-kapp-controller', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-kapp-controller/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-11-06T21:09:24Z'}, {'name': 'sonobuoy-plugins', 'codeOfConduct': None, 'createdAt': '2019-11-13T20:25:45Z'}, {'name': 'carvel-vendir', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-vendir/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2019-12-16T03:40:13Z'}, {'name': 'helm-charts', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/helm-charts/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2019-12-17T20:11:21Z'}, {'name': 'astrolabe', 'codeOfConduct': None, 'createdAt': '2019-12-20T19:14:38Z'}, {'name': 'velero-plugin-for-vsphere', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2019-12-20T19:14:53Z'}, {'name': 'difflib', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/difflib/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-01-08T17:12:53Z'}, {'name': 'terraform-provider-carvel', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/terraform-provider-carvel/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-01-14T22:24:37Z'}, {'name': 
'tanzu-dev-portal', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/tanzu-dev-portal/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-01-29T17:25:40Z'}, {'name': 'carvel-docker-image', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-docker-image/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-02-03T19:22:22Z'}, {'name': 'carvel-secretgen-controller', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-secretgen-controller/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-02-06T14:19:21Z'}, {'name': 'starlark-go', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/starlark-go/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-02-07T16:45:31Z'}, {'name': 'octant-example-plugins', 'codeOfConduct': None, 'createdAt': '2020-02-19T17:01:45Z'}, {'name': 'carvel-setup-action', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-setup-action/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-02-22T18:42:22Z'}, {'name': 'vscode-ytt', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/vscode-ytt/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-03-10T16:39:29Z'}, {'name': 'vm-operator-api', 'codeOfConduct': None, 'createdAt': '2020-03-17T16:36:51Z'}, {'name': 'color', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/color/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-03-20T16:09:57Z'}, {'name': 'concourse-kpack-resource', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/concourse-kpack-resource/blob/master/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-04-01T20:42:57Z'}, {'name': 'sources-for-knative', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/sources-for-knative/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-04-02T19:17:50Z'}, {'name': 'asdf-carvel', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/asdf-carvel/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-04-06T11:03:59Z'}, {'name': 'kpack-cli', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/kpack-cli/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-04-06T14:40:53Z'}, {'name': 'k-bench', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/k-bench/blob/master/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-04-17T21:33:48Z'}, {'name': 'carvel-ytt-starter-for-kubernetes', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/carvel-ytt-starter-for-kubernetes/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-05-13T17:42:35Z'}, {'name': 'service-apis', 'codeOfConduct': None, 'createdAt': '2020-07-01T23:44:44Z'}, {'name': 'pinniped', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/pinniped/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-07-02T22:23:20Z'}, {'name': 'pinniped-ci', 'codeOfConduct': None, 'createdAt': '2020-07-02T22:25:16Z'}, {'name': 'cloud-suitability-analyzer', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/cloud-suitability-analyzer/blob/master/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-08-11T14:45:34Z'}, {'name': 'octant-plugin-for-knative', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/octant-plugin-for-knative/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-08-28T14:18:57Z'}, {'name': 'net-operator-api', 'codeOfConduct': None, 'createdAt': '2020-09-10T17:46:42Z'}, {'name': 'plugin-library-for-octant', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/plugin-library-for-octant/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-09-18T20:56:49Z'}, {'name': 'cross-cluster-connectivity', 'codeOfConduct': {'url': 
'https://github.com/vmware-tanzu/cross-cluster-connectivity/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-09-24T22:59:39Z'}, {'name': 'edukates', 'codeOfConduct': None, 'createdAt': '2020-10-02T20:38:43Z'}, {'name': 'community-edition', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/community-edition/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-10-13T19:01:34Z'}, {'name': 'buildkit-cli-for-kubectl', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/buildkit-cli-for-kubectl/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-10-22T18:32:09Z'}, {'name': 'cert-injection-webhook', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/cert-injection-webhook/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-10-26T19:25:06Z'}, {'name': 'nozzle-for-microsoft-azure-log-analytics', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/nozzle-for-microsoft-azure-log-analytics/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-11-03T18:26:02Z'}, {'name': 'convention-controller', 'codeOfConduct': None, 'createdAt': '2020-11-11T22:15:42Z'}, {'name': 'homebrew-carvel', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/homebrew-carvel/blob/develop/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-11-16T21:49:04Z'}, {'name': 'carvel-community', 'codeOfConduct': None, 'createdAt': '2020-11-16T21:49:55Z'}, {'name': 'octant-plugin-for-kind', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/octant-plugin-for-kind/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-11-24T00:34:48Z'}, {'name': 'rotate-instance-identity-certificates', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/rotate-instance-identity-certificates/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-11-30T17:50:37Z'}, {'name': 'observability-event-resource', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/observability-event-resource/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2020-12-10T22:13:34Z'}, {'name': 'homebrew-pinniped', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/homebrew-pinniped/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2020-12-18T19:06:33Z'}, {'name': 'community-engagement', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/community-engagement/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2021-01-07T18:55:11Z'}, {'name': 'tanzu-toolkit-for-visual-studio', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/tanzu-toolkit-for-visual-studio/blob/main/CODE_OF_CONDUCT.md'}, 'createdAt': '2021-01-08T19:51:37Z'}, {'name': 'pinniped-ci-pool', 'codeOfConduct': None, 'createdAt': '2021-01-11T21:34:35Z'}, {'name': 'antrea-build-infra', 'codeOfConduct': None, 'createdAt': '2021-01-19T19:06:11Z'}, {'name': 'pinniped-ghsa-wp53-6256-whf9', 'codeOfConduct': None, 'createdAt': '2021-01-22T18:58:18Z'}, {'name': 'pinniped-private', 'codeOfConduct': None, 'createdAt': '2021-01-22T21:31:35Z'}, {'name': 'cluster-api-provider-bringyourownhost', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/blob/main/CODE-OF-CONDUCT.md'}, 'createdAt': '2021-01-25T21:56:30Z'}, {'name': 'vm-operator', 'codeOfConduct': None, 'createdAt': '2021-02-19T16:07:27Z'}, {'name': 'vsphere-kubernetes-drivers-operator', 'codeOfConduct': None, 'createdAt': '2021-02-19T16:08:06Z'}, {'name': 'tsip-planning', 'codeOfConduct': None, 'createdAt': '2021-02-24T16:26:26Z'}, {'name': 'tsip-cell-sequence-aws-admin', 'codeOfConduct': None, 'createdAt': '2021-02-26T13:24:58Z'}, {'name': 'tsip-development', 'codeOfConduct': None, 'createdAt': 
'2021-02-26T13:26:01Z'}, {'name': 'crispr', 'codeOfConduct': None, 'createdAt': '2021-02-26T13:30:09Z'}, {'name': 'tsip-old-controllers', 'codeOfConduct': None, 'createdAt': '2021-02-26T15:48:04Z'}, {'name': 'crispr-blackstart', 'codeOfConduct': None, 'createdAt': '2021-03-05T14:09:04Z'}, {'name': 'tsip-misc', 'codeOfConduct': None, 'createdAt': '2021-03-05T16:17:18Z'}, {'name': 'tsip-crispr-blackstart-saturn', 'codeOfConduct': None, 'createdAt': '2021-03-08T18:12:52Z'}, {'name': 'vdpp-partner-docs', 'codeOfConduct': None, 'createdAt': '2021-03-09T17:36:17Z'}, {'name': 'tsip-cells', 'codeOfConduct': None, 'createdAt': '2021-03-16T13:16:17Z'}, {'name': 'tanzu-observability-slug-generator', 'codeOfConduct': {'url': 'https://github.com/vmware-tanzu/tanzu-observability-slug-generator/blob/main/Code-of-Conduct.md'}, 'createdAt': '2021-03-16T18:16:37Z'}, {'name': 'git2go-buildpack', 'codeOfConduct': None, 'createdAt': '2021-03-17T19:56:22Z'}, {'name': 'tsip-infra-images', 'codeOfConduct': None, 'createdAt': '2021-03-24T18:29:36Z'}, {'name': 'vscode-tanzu-tools', 'codeOfConduct': None, 'createdAt': '2021-03-29T18:46:37Z'}, {'name': 'tsip-cell-sequence-aws-core', 'codeOfConduct': None, 'createdAt': '2021-04-06T17:28:22Z'}, {'name': 'tsip-cell-sequence-aws-vault', 'codeOfConduct': None, 'createdAt': '2021-04-26T08:51:53Z'}, {'name': 'tsip-cell-sequence-test', 'codeOfConduct': None, 'createdAt': '2021-04-27T13:14:48Z'}, {'name': 'tkg-windows-containers', 'codeOfConduct': None, 'createdAt': '2021-05-04T16:59:55Z'}, {'name': 'tanzu-cli-apps-plugins', 'codeOfConduct': None, 'createdAt': '2021-05-06T17:38:26Z'}, {'name': 'msys2-buildpack', 'codeOfConduct': None, 'createdAt': '2021-05-07T14:25:10Z'}, {'name': 'tsip-aws-init-sequences', 'codeOfConduct': None, 'createdAt': '2021-05-10T13:49:49Z'}, {'name': 'workload-migration', 'codeOfConduct': None, 'createdAt': '2021-05-10T21:12:11Z'}]}}}}\n" 528 | ] 529 | } 530 | ], 531 | "source": [ 532 | "url = 'https://api.github.com/graphql'\n", 533 | "headers = {'Authorization': 'token %s' % api_token}\n", 534 | "\n", 535 | "query = make_query()\n", 536 | "\n", 537 | "variables = {\"org_name\": org_name}\n", 538 | "r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)\n", 539 | "json_data_tanzu = json.loads(r.text)\n", 540 | "print(json_data_tanzu)" 541 | ] 542 | }, 543 | { 544 | "cell_type": "markdown", 545 | "metadata": {}, 546 | "source": [ 547 | "# Fails on other orgs: vmware org example" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 7, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "# fails on vmware, bitnami\n", 557 | "# works on vmware-tanzu, concourse, carbonblack\n", 558 | "org_name = \"vmware\"" 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": 8, 564 | "metadata": {}, 565 | "outputs": [ 566 | { 567 | "name": "stdout", 568 | "output_type": "stream", 569 | "text": [ 570 | "{'data': None, 'errors': [{'message': 'Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. 
Please include `F00A:6919:67C008:6E8EA6:61F409AE` when reporting this issue.'}]}\n" 571 | ] 572 | } 573 | ], 574 | "source": [ 575 | "url = 'https://api.github.com/graphql'\n", 576 | "headers = {'Authorization': 'token %s' % api_token}\n", 577 | "\n", 578 | "query = make_query()\n", 579 | "\n", 580 | "variables = {\"org_name\": org_name}\n", 581 | "r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)\n", 582 | "json_data_vmware = json.loads(r.text)\n", 583 | "print(json_data_vmware)" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "metadata": {}, 589 | "source": [ 590 | "## Note: If you remove the codeOfConduct, the rest of the query works fine." 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": 9, 596 | "metadata": {}, 597 | "outputs": [], 598 | "source": [ 599 | "# fails on vmware, bitnami\n", 600 | "# works on vmware-tanzu, concourse, carbonblack\n", 601 | "def make_query_no_coc():\n", 602 | " return \"\"\"query RepoQuery($org_name: String!) {\n", 603 | " organization(login: $org_name) {\n", 604 | " repositories (first: 100){\n", 605 | " pageInfo {\n", 606 | " hasNextPage\n", 607 | " endCursor\n", 608 | " }\n", 609 | " nodes {\n", 610 | " name\n", 611 | " createdAt\n", 612 | " }\n", 613 | " }\n", 614 | " }\n", 615 | " }\"\"\"" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": 10, 621 | "metadata": {}, 622 | "outputs": [ 623 | { 624 | "name": "stdout", 625 | "output_type": "stream", 626 | "text": [ 627 | "{'data': {'organization': {'repositories': {'pageInfo': {'hasNextPage': True, 'endCursor': 'Y3Vyc29yOnYyOpHOCSW2nA=='}, 'nodes': [{'name': 'pg_rewind', 'createdAt': '2013-05-23T10:45:43Z'}, {'name': 'pyvmomi', 'createdAt': '2013-12-13T17:30:30Z'}, {'name': 'pyvmomi-community-samples', 'createdAt': '2014-04-24T20:31:56Z'}, {'name': 'open-vm-tools', 'createdAt': '2014-04-25T21:30:54Z'}, {'name': 'upgrade-framework', 'createdAt': '2014-06-16T17:22:11Z'}, {'name': 'workflowTools', 'createdAt': '2014-07-18T22:16:00Z'}, {'name': 'govmomi', 'createdAt': '2014-08-12T16:15:08Z'}, {'name': 'pyvcloud', 'createdAt': '2014-11-12T19:36:04Z'}, {'name': 'vmw-guestinfo', 'createdAt': '2014-11-29T23:07:44Z'}, {'name': 'vcd-cli', 'createdAt': '2014-12-05T18:52:29Z'}, {'name': 'open-vmdk', 'createdAt': '2014-12-15T17:10:11Z'}, {'name': 'tdnf', 'createdAt': '2015-02-26T00:44:11Z'}, {'name': 'likewise-open', 'createdAt': '2015-02-26T19:58:04Z'}, {'name': 'photon', 'createdAt': '2015-04-15T17:22:47Z'}, {'name': 'lightwave', 'createdAt': '2015-04-15T17:22:59Z'}, {'name': 'vmware.github.io', 'createdAt': '2015-04-18T23:52:14Z'}, {'name': 'vivace', 'createdAt': '2015-05-15T16:59:28Z'}, {'name': 'PowerCLI-Example-Scripts', 'createdAt': '2015-06-04T16:57:56Z'}, {'name': 'ansible-coreos-bootstrap', 'createdAt': '2015-06-07T21:36:00Z'}, {'name': 'ansible-etcd-cluster', 'createdAt': '2015-06-08T06:34:25Z'}, {'name': 'ansible-etcd-ca', 'createdAt': '2015-06-09T14:32:42Z'}, {'name': 'goipmi', 'createdAt': '2015-07-08T21:24:28Z'}, {'name': 'photon-packer-templates', 'createdAt': '2015-07-30T17:28:22Z'}, {'name': 'ansible-coreos-setup', 'createdAt': '2015-07-31T12:59:08Z'}, {'name': 'ansible-fleet-cluster', 'createdAt': '2015-07-31T13:06:06Z'}, {'name': 'ansible-coreos-autofs', 'createdAt': '2015-08-06T09:08:47Z'}, {'name': 'ansible-flannel', 'createdAt': '2015-08-19T15:03:35Z'}, {'name': 'c-rest-engine', 'createdAt': '2015-09-28T21:55:14Z'}, {'name': 'ansible-kubernetes-ca', 'createdAt': '2015-11-17T01:00:38Z'}, 
{'name': 'ansible-coreos-kubernetes-master', 'createdAt': '2015-11-17T01:10:07Z'}, {'name': 'ansible-coreos-kubernetes-minion', 'createdAt': '2015-11-17T01:14:41Z'}, {'name': 'ansible-coreos-kubelet', 'createdAt': '2015-11-17T16:56:41Z'}, {'name': 'photon-docker-image', 'createdAt': '2015-11-27T16:05:00Z'}, {'name': 'vic', 'createdAt': '2016-01-13T19:53:57Z'}, {'name': 'photonos-netmgr', 'createdAt': '2016-01-16T01:22:23Z'}, {'name': 'LittleProxy', 'createdAt': '2016-02-18T06:49:32Z'}, {'name': 'alb-sdk', 'createdAt': '2016-03-07T21:29:17Z'}, {'name': 'vsphere-automation-sdk-python', 'createdAt': '2016-05-14T05:02:34Z'}, {'name': 'vsphere-automation-sdk-java', 'createdAt': '2016-05-14T05:03:54Z'}, {'name': 'vsphere-automation-sdk-ruby', 'createdAt': '2016-05-14T05:05:09Z'}, {'name': 'powernsx', 'createdAt': '2016-08-17T09:49:35Z'}, {'name': 'vic-product', 'createdAt': '2016-08-18T21:52:44Z'}, {'name': 'priam', 'createdAt': '2016-08-30T23:29:50Z'}, {'name': 'burp-rest-api', 'createdAt': '2016-09-01T20:31:35Z'}, {'name': 'idm', 'createdAt': '2016-09-06T17:11:25Z'}, {'name': 'clarity', 'createdAt': '2016-09-29T17:24:17Z'}, {'name': 'hillview', 'createdAt': '2016-10-03T20:32:15Z'}, {'name': 'powerclicore', 'createdAt': '2016-10-28T19:30:52Z'}, {'name': 'copenapi', 'createdAt': '2016-11-11T22:54:05Z'}, {'name': 'vidm-saml-toolkit', 'createdAt': '2016-12-12T01:29:40Z'}, {'name': 'ansible', 'createdAt': '2016-12-19T17:39:01Z'}, {'name': 'p4c-xdp', 'createdAt': '2016-12-20T22:10:02Z'}, {'name': 'go-nfs-client', 'createdAt': '2017-01-28T01:18:03Z'}, {'name': 'vsphere-automation-sdk', 'createdAt': '2017-03-08T23:15:04Z'}, {'name': 'weathervane', 'createdAt': '2017-03-15T20:54:26Z'}, {'name': 'chap', 'createdAt': '2017-04-01T22:31:42Z'}, {'name': 'pmd', 'createdAt': '2017-04-12T19:01:58Z'}, {'name': 'o11n-plugin-crypto', 'createdAt': '2017-04-26T20:35:26Z'}, {'name': 'terraform-provider-vcd', 'createdAt': '2017-06-05T20:54:05Z'}, {'name': 'container-service-extension', 'createdAt': '2017-06-27T21:51:06Z'}, {'name': 'smb-connector', 'createdAt': '2017-07-24T11:13:32Z'}, {'name': 'vrops-export', 'createdAt': '2017-07-26T19:03:36Z'}, {'name': 'vsphere-storage-for-kubernetes', 'createdAt': '2017-08-07T22:27:50Z'}, {'name': 'replay-app-for-tvos', 'createdAt': '2017-08-14T14:13:33Z'}, {'name': 'pks-ci', 'createdAt': '2017-08-29T08:19:41Z'}, {'name': 'vic-ui', 'createdAt': '2017-09-29T20:46:59Z'}, {'name': 'nsx-integration-for-openshift', 'createdAt': '2017-10-11T04:07:09Z'}, {'name': 'go-vmware-nsxt', 'createdAt': '2017-10-17T19:30:04Z'}, {'name': 'vsphere-guest-run', 'createdAt': '2017-10-20T20:09:15Z'}, {'name': 'vrealize-suite-lifecycle-manager-sdk', 'createdAt': '2017-10-31T05:06:09Z'}, {'name': 'nsx-powerops', 'createdAt': '2017-11-02T18:47:01Z'}, {'name': 'connectors-workspace-one', 'createdAt': '2017-12-14T20:17:14Z'}, {'name': 'guest-introspection-nsx', 'createdAt': '2018-01-31T04:19:00Z'}, {'name': 'test-operations', 'createdAt': '2018-02-15T23:02:04Z'}, {'name': 'vcd-ext-sdk', 'createdAt': '2018-02-16T20:49:45Z'}, {'name': 'django-yamlconf', 'createdAt': '2018-02-20T05:55:53Z'}, {'name': 'ansible-aws-greengrass', 'createdAt': '2018-03-05T20:49:49Z'}, {'name': 'differential-datalog', 'createdAt': '2018-03-20T20:14:11Z'}, {'name': 'kube-fluentd-operator', 'createdAt': '2018-03-26T09:29:35Z'}, {'name': 'harbor-boshrelease', 'createdAt': '2018-03-28T19:41:58Z'}, {'name': 'harbor-tile', 'createdAt': '2018-04-09T16:22:49Z'}, {'name': 'terraform-provider-nsxt', 'createdAt': '2018-04-09T16:54:13Z'}, 
{'name': 'ansible-module-vcloud-director', 'createdAt': '2018-05-08T18:38:51Z'}, {'name': 'vic-tools', 'createdAt': '2018-06-29T21:34:15Z'}, {'name': 'ansible-for-nsxt', 'createdAt': '2018-07-02T18:26:22Z'}, {'name': 'ansible-role-greengrass-awscli', 'createdAt': '2018-07-10T04:43:16Z'}, {'name': 'ansible-role-greengrass-init', 'createdAt': '2018-07-10T04:46:25Z'}, {'name': 'go-vcloud-director', 'createdAt': '2018-07-12T19:04:07Z'}, {'name': 'vmware-log-collectors-for-public-cloud', 'createdAt': '2018-07-31T17:00:37Z'}, {'name': 'concord-bft', 'createdAt': '2018-08-01T09:36:28Z'}, {'name': 'fluent-plugin-vmware-loginsight', 'createdAt': '2018-08-09T01:06:20Z'}, {'name': 'nsx-t-datacenter-ci-pipelines', 'createdAt': '2018-08-26T17:43:58Z'}, {'name': 'vmware-go-kcl', 'createdAt': '2018-09-06T19:28:53Z'}, {'name': 'esx-boot', 'createdAt': '2018-09-07T19:13:53Z'}, {'name': 'bare-metal-server-integration-with-nsxt', 'createdAt': '2018-09-11T03:18:02Z'}, {'name': 'vmware-openapi-generator', 'createdAt': '2018-09-11T20:51:53Z'}, {'name': 'wavefront-adapter-for-istio', 'createdAt': '2018-09-17T21:56:24Z'}, {'name': 'ansible-role-microsoft-azure-iot', 'createdAt': '2018-10-17T13:54:48Z'}, {'name': 'ansible-role-microsoft-azure-edge', 'createdAt': '2018-10-17T13:57:21Z'}, {'name': 'cloud-to-edge', 'createdAt': '2018-10-17T13:59:57Z'}]}}}}\n" 628 | ] 629 | } 630 | ], 631 | "source": [ 632 | "url = 'https://api.github.com/graphql'\n", 633 | "headers = {'Authorization': 'token %s' % api_token}\n", 634 | "\n", 635 | "query = make_query_no_coc()\n", 636 | "\n", 637 | "variables = {\"org_name\": org_name}\n", 638 | "r = requests.post(url=url, json={'query': query, 'variables': variables}, headers=headers)\n", 639 | "json_data_no_coc = json.loads(r.text)\n", 640 | "print(json_data_no_coc)" 641 | ] 642 | } 643 | ], 644 | "metadata": { 645 | "kernelspec": { 646 | "display_name": "Python 3", 647 | "language": "python", 648 | "name": "python3" 649 | }, 650 | "language_info": { 651 | "codemirror_mode": { 652 | "name": "ipython", 653 | "version": 3 654 | }, 655 | "file_extension": ".py", 656 | "mimetype": "text/x-python", 657 | "name": "python", 658 | "nbconvert_exporter": "python", 659 | "pygments_lexer": "ipython3", 660 | "version": "3.8.5" 661 | } 662 | }, 663 | "nbformat": 4, 664 | "nbformat_minor": 4 665 | } 666 | --------------------------------------------------------------------------------
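A note on the csv-writing block from scripts/sunset.py at the top of this section: it catches every exception with a bare except:, which also swallows KeyboardInterrupt and discards the actual error message. A narrower sketch, assuming the same create_file helper and all_rows list used in that script:

import csv

# create_file and all_rows come from scripts/sunset.py (assumed available here)
file, file_path = create_file("sunset")

try:
    with file:
        csv.writer(file).writerows(all_rows)
except OSError as e:
    # Most likely the output directory is missing or is not writable.
    print('Could not write to csv file at', file_path, '-', e, '- Exiting')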
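The pagination loop in CodeOfConductBug.ipynb accumulates pages with repo_info_df.append(df_temp, ignore_index=True), which worked in the pandas of that era but was removed in pandas 2.0. A minimal standalone sketch of the same cursor-based loop, assuming a GitHub token in a gh_key file as in the notebook, that collects pages in a list and concatenates once:

import json
import requests
import pandas as pd

with open('gh_key', 'r') as kf:
    api_token = kf.readline().rstrip()

def repo_query(after_cursor=None):
    # "null" fetches the first page; later pages substitute the cursor string.
    return """query RepoQuery($org_name: String!) {
      organization(login: $org_name) {
        repositories(first: 100, after: AFTER) {
          pageInfo { hasNextPage endCursor }
          nodes { name createdAt }
        }
      }
    }""".replace("AFTER", '"{}"'.format(after_cursor) if after_cursor else "null")

url = 'https://api.github.com/graphql'
headers = {'Authorization': 'token %s' % api_token}

pages = []
has_next_page = True
after_cursor = None

while has_next_page:
    r = requests.post(url=url,
                      json={'query': repo_query(after_cursor),
                            'variables': {'org_name': 'vmware'}},
                      headers=headers)
    repos = json.loads(r.text)['data']['organization']['repositories']
    pages.append(pd.DataFrame(repos['nodes']))
    has_next_page = repos['pageInfo']['hasNextPage']
    after_cursor = repos['pageInfo']['endCursor']

repo_info_df = pd.concat(pages, ignore_index=True)

Collecting the pages in a list and calling pd.concat once also avoids re-copying the accumulated dataframe on every iteration, so the loop stays linear in the number of pages.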
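As the notebook notes, the codeOfConduct query completes if first: 100 is shortened to first: 20, so one pragmatic workaround while the object is being re-implemented is to halve the page size and retry whenever the API returns errors instead of data. fetch_coc_page below is a hypothetical helper sketched on that assumption, not part of these scripts:

import json
import requests

def coc_query(page_size):
    # Same shape as the notebook's query, with a variable page size.
    return """query RepoQuery($org_name: String!) {
      organization(login: $org_name) {
        repositories(first: %d) {
          pageInfo { hasNextPage endCursor }
          nodes { name codeOfConduct { url } createdAt }
        }
      }
    }""" % page_size

def fetch_coc_page(org_name, api_token, page_size=100, min_size=10):
    url = 'https://api.github.com/graphql'
    headers = {'Authorization': 'token %s' % api_token}
    while page_size >= min_size:
        r = requests.post(url=url,
                          json={'query': coc_query(page_size),
                                'variables': {'org_name': org_name}},
                          headers=headers)
        json_data = json.loads(r.text)
        if json_data.get('data'):
            return json_data  # query succeeded at this page size
        print('Query failed at first: %d, retrying with a smaller page' % page_size)
        page_size //= 2  # timeouts appear more likely with bigger pages
    raise RuntimeError('codeOfConduct query timed out even at small page sizes')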