├── 1.environment.md
├── 2.building.md
├── 3.example.md
├── 4.test.md
├── 5.workflow.md
├── 6.ci.md
├── 7.documentation.md
├── README.md
└── images
    ├── EuroSciPydevsprint.jpg
    ├── amueller.jpg
    ├── azure.png
    ├── cidoclint.png
    ├── circleci.png
    ├── linting-crop.png
    └── reshamas.jpg


/1.environment.md:
--------------------------------------------------------------------------------
  1 | # Installing a Python dev environment
  2 | 
  3 | ## git and github
  4 | 
  5 | - [Install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
  6 | (choose [git bash](https://git-scm.com/download/win) on Windows): git is a versioning system used to manage the source code of software projects such as scikit-learn and NumPy.
  7 | 
  8 | - [Create an account on github.com](https://github.com): github is a platform to work collaboratively on the source code of hosted Open Source projects such as scikit-learn and NumPy.
  9 | 
 10 | - [Create an SSH key for GitHub](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) 
 11 | 
 12 | Once you have a github account and have installed the `git` command on your system, open a new terminal session (use Git Bash under Windows) and follow the steps below.
 13 | 
 14 | - Fork scikit-learn in the github web interface: go to https://github.com/scikit-learn/scikit-learn and click the "fork" button. You should be automatically redirected to your personal fork at:  https://github.com/myusername/scikit-learn in your web browser.
 15 | 
 16 | - Then, in the terminal clone your fork with git:
 17 | 
 18 |   ```
 19 |   $ git clone git@github.com:myusername/scikit-learn.git
 20 |   ```
 21 | 
 22 |   Note that the `$` sign is a generic prompt indicator for terminal commands. Please do not copy it when you copy-paste commands from this document.
 23 | 
 24 | - Many open source projects from the Python ecosystem share similar development practices. For instance, you can also (optionally) clone numpy with git if you want to use the development version of numpy instead of a released package:
 25 |   ```
 26 |   $ git clone git@github.com:numpy/numpy.git
 27 |   ```
 28 |   
 29 | - After cloning those repos, you should see new local folders containing your clones in the output of the `ls` command:
 30 | 
 31 |   ```
 32 |   $ ls
 33 |   ```
 34 |   
 35 | - To locate those folders, use the `pwd` (print working directory) command:
 36 | 
 37 |   ```
 38 |   $ pwd
 39 |   ```
 40 | 
 41 | - Configure some aliases for the remote repositories:
 42 | 
 43 |   List existing remotes in your scikit-learn clone:
 44 |   ```
 45 |   $ cd scikit-learn
 46 |   $ git remote -v
 47 |   ```
 48 |   You should see output similar to:
 49 |   ```
 50 |   origin          git@github.com:myusername/scikit-learn.git (fetch)
 51 |   origin          git@github.com:myusername/scikit-learn.git (push)
 52 |   ```
 53 |   
 54 |   Add a new remote for the reference scikit-learn repository on GitHub
 55 |   (i.e. `scikit-learn/scikit-learn`), which is conventionally called `upstream`:
 56 |   ```
 57 |   $ git remote add upstream https://github.com/scikit-learn/scikit-learn.git
 58 |   ```
 59 |   
 60 |   Check that the remote has been properly configured:
 61 |   ```
 62 |   $ git remote -v
 63 |   ```
 64 |   You should now get a similar output to:
 65 |   ```
 66 |   origin          git@github.com:myusername/scikit-learn.git (fetch)
 67 |   origin          git@github.com:myusername/scikit-learn.git (push)
 68 |   upstream        https://github.com/scikit-learn/scikit-learn.git (fetch)
 69 |   upstream        https://github.com/scikit-learn/scikit-learn.git (push)
 70 |   ```
 71 |   
 72 | 
 73 | ## conda
 74 | 
 75 | Conda is a command line tool to download software packages and work in isolated environments for different projects.
 76 | 
 77 | The fastest way to install the conda tool is to use a miniforge installer.
 78 | 
 79 | ### Install Miniforge
 80 | 
 81 | - Install Miniforge from
 82 | [the official installation page](https://github.com/conda-forge/miniforge#miniforge) (choose the latest Miniforge installer links for your Operating System version)
 83 | 
 84 | - Initialize the conda command in git bash
 85 |   - Windows: open "Git Bash" and type
 86 |   ```
 87 |   $ cd Downloads/
 88 |   $ ./Miniforge3-Windows-x86_64.exe
 89 |   ```
 90 |   - Linux & macOS:
 91 |   ```
 92 |   $ cd Downloads/
 93 |   $ bash Miniforge3-*.sh
 94 |   ```
 95 | 
 96 |   And follow the instructions.
 97 | 
 98 | - Make sure you have initialized your shell environment:
 99 | 
100 |   - Windows (with Git Bash) and Linux
101 |   ```
102 |   $ conda init bash
103 |   ```
104 | 
105 |   - macOS uses zsh instead of bash by default:
106 |   ```
107 |   $ conda init zsh
108 |   ```
109 | 
110 | - then close your shell and start a new one to type:
111 |   
112 |   ```
113 |   $ conda info
114 |   ```
115 | 
116 |   and look for the location of the "base environment".
117 | 
118 |   or
119 |   
120 |   ```
121 |   $ where conda
122 |   ```
123 |   to check that the conda command is in your PATH and usable from your shell (in a bash shell on Linux, use `which conda` instead).
124 | 
125 | ### conda environments
126 | 
127 | conda environments make it possible to have specific versions of your packages to work on a specific project independently of the dependencies used for other projects. Once you are done with a project, it's very easy to delete a conda environment to avoid accumulating packages you no longer need on your system.
128 | 
129 | conda environments also make it easy to make sure that the versions of the packages you use in your developer environment match those used by your team members or those required by the production environment, for instance.
130 | 
131 | - create an environment named `sklworkshop`:
132 | ```
133 | $ conda create --name sklworkshop -c conda-forge numpy scipy cython joblib threadpoolctl pytest matplotlib pandas
134 | ```
135 | 
136 | If you are on macOS, you should add the `compilers` package to that list:
137 | 
138 | ```
139 | $ conda create --name sklworkshop -c conda-forge numpy scipy cython joblib threadpoolctl pytest matplotlib pandas compilers
140 | ```
141 | 
142 | Note: the `-c conda-forge` flag is not necessary if you installed conda with the miniforge installer, but it is necessary if you use a conda command installed from the Miniconda or Anaconda installers.
143 | 
144 | - activate and deactivate environments
145 | ```
146 | $ conda activate sklworkshop
147 | (sklworkshop)$ conda deactivate
148 | ```
149 | - version, environment and package listing
150 | ```
151 | $ conda --version
152 | $ conda env list
153 | $ conda list
154 | ```
155 | 
156 | ## VS Code
157 | 
158 | VS Code is a very popular open source code editor with a rich set of extensions to turn it into a full fledged Integrated Development Environment (with fast code navigation, auto completion, pytest execution, debugger, jupyter notebook editing and execution...).
159 | 
160 | Here we show the main tips and tricks to get productive when using VS Code to work on Python projects such as scikit-learn (including the most useful keyboard shortcuts).
161 | 
162 | Install VSCode following the
163 | [instructions](https://code.visualstudio.com/docs/setup/setup-overview#_cross-platform) for your Operating System.
164 | 
165 | Launch VS Code. At the first start you might get a popup to ask you to configure [Telemetry settings](https://code.visualstudio.com/docs/getstarted/telemetry). Feel free to disable telemetry if you don't want VS Code to report any data to its developers.
166 | 
167 | Install the Python extension:
168 | - `Ctrl+Shift+X` to open the extension manager
169 | - search for the python extension: install
170 | 
171 | Note to macOS users: replace `Ctrl` by `Command` on most of the keyboard short-cuts presented in this document.
172 | 
173 | Open the project folder for scikit-learn: `Ctrl+Shift+P` then type: "File: Open Folder..." and open the `scikit-learn` folder you created when running the `git clone` command above.
174 | 
175 | Open a Python file from the project by clicking on `setup.py` in the left panel named "EXPLORER".
176 | 
177 | In order to work with VSCode in your Python environment
178 | - `Ctrl+Shift+P` then "python select interpreter" and choose "sklworkshop"
179 | 
180 | Optionally, you can open a new project folder for NumPy similarly: `Ctrl+Shift+P` then type: "File: Open Folder..." and select the `numpy` folder.
181 | 
182 | Activate the "sklworkshop" Python interpreter for the numpy project as well.
183 | 
184 | Switch between projects: `Ctrl-r`
185 | 
186 | Browse the code:
187 | - by files `Ctrl-p`
188 | - by symbols `Ctrl-t`
189 | 
190 | At some point VSCode will complain about not finding a linter: scikit-learn uses `flake8`
191 | - Install `flake8` in your conda environment
192 | ```
193 | $ conda activate sklworkshop
194 | (sklworkshop)$ conda install flake8
195 | ```
196 | - Select `flake8` as a linter in VS Code: `Ctrl-Shift-P` "select linter"
197 | 
198 | ## Practical code navigation
199 | 
200 | Find example files that mention the word "importance" in different ways:
201 | - VS Code: `Ctrl-P` "example importance" and open `examples/inspection/plot_permutation_importance.py`
202 | - GitHub: go to https://github.com/scikit-learn/scikit-learn in your browser, press `t` and type "example/importance"
203 | 
204 | Navigate to the `RandomForestClassifier` class from the `plot_permutation_importance.py` example:
205 | - VS Code: ctrl-clicking on the class name
206 | - GitHub: clicking on the class name
207 | 
208 | Find the class `KMeans` in scikit-learn in two different ways:
209 | - from the command line, in a bash or zsh terminal: use `git grep "class KMeans"` (note that using the "class" prefix makes the search more specific, so that it only finds the line of the class definition. Otherwise you will find all occurrences of the KMeans class, including in documentation, tests, examples...). When you are not sure about the casing, use `git grep -i "keyword"` for a case-insensitive search instead.
210 | - VS Code: `Ctrl-t` and type "KMeans". If nothing happens, press `Enter`, and select the `KMeans` class from the list of suggestions.
211 | 
212 | ## Installing C/C++ compilers to be able to build native extensions
213 | 
214 | Building scikit-learn from source requires a C/C++ compiler (to build native extensions
215 | typically written in Cython for instance).
216 | 
217 | If you have never installed a C/C++ compiler for your system you need to do it now.
218 | 
219 | **macOS users:** feel free to install the `compilers` package from conda-forge in your
220 | environment if you did not do it already. After installation, you need to deactivate
221 | and reactivate your environment for this installation to be effective.
222 | 
223 | ```
224 | $ conda install -n sklworkshop compilers
225 | $ conda deactivate
226 | $ conda activate sklworkshop
227 | ```
228 | 
229 | The scikit-learn build instructions linked below give more details.
230 | 
231 | See instructions for your OS in the [installation guide](https://scikit-learn.org/stable/developers/advanced_installation.html#building-from-source).
232 | - [Windows](https://scikit-learn.org/dev/developers/advanced_installation.html#windows)
233 | - [Linux](https://scikit-learn.org/dev/developers/advanced_installation.html#linux)
234 | - [macOS](https://scikit-learn.org/dev/developers/advanced_installation.html#macos)
235 | 


--------------------------------------------------------------------------------
/2.building.md:
--------------------------------------------------------------------------------
 1 | # Building the main branch of scikit-learn
 2 | 
 3 | 
 4 | To build scikit-learn from source, we need to make sure that we have scikit-learn build dependencies installed first:
 5 | 
 6 | ```
 7 | $ conda install numpy scipy cython  # should be already installed
 8 | $ cd scikit-learn/
 9 | $ pip install --verbose --no-use-pep517 --no-build-isolation -e .
10 | $ cd ..
11 | $ pip show scikit-learn
12 | ```
13 | 
14 | We can check that we can import scikit-learn in an interactive IPython session:
15 | 
16 | Install ipython:
17 | ```
18 | $ conda install ipython
19 | ```
20 | 
21 | Then launch it to import scikit-learn
22 | 
23 | ```
24 | $ ipython
25 | >>> import sklearn
26 | >>> sklearn.__version__
27 | 1.1.dev0
28 | >>> sklearn.show_versions()
29 | [...] more details
30 | CTRL-D
31 | ```
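
The same check can also be scripted instead of typed interactively. The following is a minimal sketch, assuming Python 3.8 or later; the helper name `installed_version` is hypothetical and only used for illustration. It relies on the standard library `importlib.metadata` module to report the installed version of each build dependency, or `None` when a distribution is missing:

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if absent.

    `installed_version` is a hypothetical helper for this tutorial, not
    part of scikit-learn or conda.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as published on PyPI / conda-forge. Note that the
# distribution is named "scikit-learn" while the import name is "sklearn".
for name in ("scikit-learn", "numpy", "scipy", "cython"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Run it with the `sklworkshop` environment activated; a "not installed" line for `scikit-learn` suggests that the `pip install` step above did not complete in this environment.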
32 | 
33 | Many Python projects follow similar coding and packaging conventions. For instance, if you also want to build numpy from source (optional), you can do as follows:
34 | 
35 | ```
36 | $ cd numpy/
37 | $ pip install --verbose --no-build-isolation -e .
38 | $ cd ..
39 | $ pip show numpy
40 | $ ipython
41 | >>> import numpy as np
42 | >>> print(np.__version__)
43 | 1.xx.dev0+xxxxx
44 | CTRL-D
45 | ```
46 | 


--------------------------------------------------------------------------------
/3.example.md:
--------------------------------------------------------------------------------
 1 | # Run a scikit-learn example
 2 | 
 3 | The scikit-learn documentation relies heavily on code examples to
 4 | demonstrate how to use the package with actual data, typically on
 5 | standard public datasets.
 6 | 
 7 | All scikit-learn examples are gathered in the [examples/](
 8 | https://github.com/scikit-learn/scikit-learn/tree/main/examples)
 9 | folder and its subfolders.
10 | 
11 | They are used to automatically generate the pages of the example
12 | gallery on the project website:
13 | 
14 | https://scikit-learn.org/stable/auto_examples/index.html
15 | 
16 | The goal of this exercise is to get familiar navigating in those
17 | examples and executing them either from the command-line or from
18 | within VS Code, leveraging the built-in matplotlib integration.
19 | 
20 | In particular, we will consider the following example file:
21 | 
22 | - [examples/inspection/plot_permutation_importance.py](
23 |     https://github.com/scikit-learn/scikit-learn/tree/main/examples/inspection/plot_permutation_importance.py)
24 | 
25 | which renders as:
26 | 
27 | - https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance.html
28 | 
29 | Note: how to generate the HTML documentation will be presented later.
30 | 
31 | ## From the command line
32 | 
33 | ```
34 | $ cd scikit-learn
35 | $ ls examples
36 | $ ls examples/inspection
37 | $ python examples/inspection/plot_permutation_importance.py
38 | ```
39 | 
40 | The text output should be displayed directly in the terminal,
41 | while the graphical output will pop up in a new window managed
42 | by matplotlib. If you have not already installed matplotlib and pandas, you
43 | can do it with conda:
44 | 
45 | ```
46 | $ conda install matplotlib pandas
47 | ```
48 | 
49 | Hint: at any moment use the `pwd` command to find where you are. `pwd` stands
50 | for "print working directory". For instance, here is a typical output one
51 | would get on a Linux machine:
52 | 
53 | ```
54 | $ pwd
55 | /home/YOUR-NAME/code/scikit-learn
56 | ```
57 | 
58 | ## From VS Code
59 | 
60 | - `ctrl-p` "plot permutation importance" then
61 | 
62 | - `Ctrl-Shift-P` "Run Current File in Python Interactive Window"
63 | 


--------------------------------------------------------------------------------
/4.test.md:
--------------------------------------------------------------------------------
  1 | # Run tests with pytest
  2 | 
  3 | Scikit-learn developers use the [pytest](https://docs.pytest.org/en/latest/) command line tool to collect the list of all tests written in the scikit-learn source code, execute them and report any failure message.
  4 | 
  5 | You can install `pytest` with conda:
  6 | 
  7 | ```
  8 | $ conda install pytest
  9 | ```
 10 | 
 11 | A test is typically written as a Python function with an `assert` statement, for instance:
 12 | 
 13 | ```python
 14 | def test_sum_of_two_integers():
 15 |     a = 40
 16 |     b = 2
 17 |     expected_result = 42
 18 |     assert a + b == expected_result
 19 | ```
 20 | 
 21 | Scikit-learn comes with two types of tests:
 22 | - common tests, generic tests that every estimator should follow;
 23 | - tests specific to each scikit-learn module.
 24 | 
 25 | Common tests are stored in the `sklearn/tests` folder. You can run them using:
 26 | ```
 27 | $ pytest sklearn/tests/
 28 | ```
 29 | If you want to run tests on one specific estimator, e.g. `RandomForestClassifier`, you can use
 30 | ```
 31 | $ pytest sklearn/tests -k RandomForestClassifier
 32 | ```
 33 | 
 34 | Each sklearn subpackage comes with a `tests` folder that gathers the test files specifically related to it. For instance, the folder `sklearn/cluster/tests` holds all the test files written specifically to test clustering algorithms. In particular, `sklearn/cluster/tests/test_k_means.py` is the test file for the K-Means clustering algorithm.
 35 | 
 36 | Use VSCode to open the source code for the `RandomForestClassifier` class.
 37 | 
 38 | - What is the path of this file?
 39 | - Can you find the folder that holds the tests for `RandomForestClassifier` in the VSCode file explorer?
 40 | - Can you find the test folder and list its content using the `ls` command?
 41 | - Locate the file `test_forest.py` in that folder.
 42 | 
 43 | Again use the `pwd` command if you are lost.
 44 | 
 45 | To run the test
 46 | ```
 47 | $ pytest --verbose ./<path_to_test_folder>/test_forest.py
 48 | ```
 49 | 
 50 | Edit the source code of `RandomForestClassifier` so that the `predict` method always returns 0.
 51 | Rerun the test: what do you observe? 
 52 | 
 53 | Alternatively, you can use the `-vlx` flags as follows:
 54 | ```
 55 | $ pytest -vlx ./<path_to_test_folder>/test_forest.py
 56 | ```
 57 | Find out about the meaning of those flags with
 58 | ```
 59 | $ pytest --help
 60 | ```
 61 | 
 62 | See also the section [Useful pytest aliases and flags](https://scikit-learn.org/stable/developers/tips.html#useful-pytest-aliases-and-flags) in the scikit-learn documentation.
 63 | 
 64 | # Write a new test of your own
 65 | 
 66 | We will now do an exercise to add a new test function named `test_rf_regressor_prediction_range` in the `test_forest.py` file.
 67 | 
 68 | The goal of this new test will be to check that a `RandomForestRegressor`
 69 | never predicts numerical values outside of the range of values observed in
 70 | the training set.
 71 | 
 72 | The test will use a training set generated at random by [sampling from the
 73 | normal distribution](https://numpy.org/doc/stable/reference/random/index.html#quick-start).
 74 | 
 75 | For a test we can use a small dataset, for instance 100 samples and 10 features.
 76 | 
 77 | Here is a code template to get started:
 78 | 
 79 | ```python
 80 | def test_rf_regressor_prediction_range():
 81 |     # Create a Random Number Generator with 42 as a fixed seed.
 82 |     rng = np.random.RandomState(42)
 83 | 
 84 |     # TODO: define the n_samples and n_features variables
 85 |     X_train = rng.normal(size=(n_samples, n_features))
 86 |     y_train = rng.normal(size=n_samples)
 87 | 
 88 |     rfr = RandomForestRegressor(random_state=42)
 89 |     rfr.fit(X_train, y_train)
 90 | 
 91 |     # TODO: np.min() and np.max() to measure the minimum and
 92 |     # maximum values of the training set target values y_train.
 93 | 
 94 |     # TODO: generate some random data X_test with the same number
 95 |     # of features.
 96 | 
 97 |     # TODO: compute the model predictions on the test data:
 98 |     # rfr.predict(X_test) and store the results in a variable y_preds.
 99 | 
100 |     # TODO: check that all the values in y_preds lie between the minimum
101 |     # and maximum values of y_train.
102 | ```
103 | 
104 | After each TODO, check that your code works as expected by running only
105 | your test function with the following command:
106 | 
107 | ```
108 | $ pytest -vl -k test_rf_regressor_prediction_range \
109 |      ./<path_to_test_folder>/test_forest.py
110 | ```
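
As a hint for the last TODO, the range check itself can be written with NumPy boolean operations. The following is a minimal sketch that deliberately does not use scikit-learn: the helper name `all_within_range` is hypothetical, and the `y_preds` array is a stand-in built by averaging random subsets of `y_train`, not the output of a real model.

```python
import numpy as np


def all_within_range(values, low, high):
    """Return True if every element of `values` lies in [low, high].

    Hypothetical helper for this exercise, not a scikit-learn function.
    """
    values = np.asarray(values)
    return bool(np.all((values >= low) & (values <= high)))


rng = np.random.RandomState(0)
y_train = rng.normal(size=100)

# Stand-in for model predictions: each value is the mean of 5 training
# targets drawn at random, so it cannot leave the training range.
y_preds = np.array([rng.choice(y_train, size=5).mean() for _ in range(20)])

y_min, y_max = np.min(y_train), np.max(y_train)
assert all_within_range(y_preds, y_min, y_max)

# Shifting the values far away from the training targets breaks the check.
assert not all_within_range(y_preds + 100.0, y_min, y_max)
```

In the actual test, `y_preds` would come from `rfr.predict(X_test)` and the check would be a plain `assert` on the result.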
111 | 
112 | Bonus exercise: try to modify the code of scikit-learn to make your new test
113 | fail by making it predict very large or very small values.
114 | 
115 | Note: do not commit those modifications with git.
116 | 
117 | # Automatically formatting the code with black
118 | 
119 | It is recommended to format your code with a specific version of `black`:
120 | 
121 | ```
122 | $ conda install black==22.1.0
123 | ```
124 | 
125 | Then you can run:
126 | 
127 | ```
128 | $ black path/to/some_file.py
129 | ```
130 | 
131 | or to run it for a full source folder:
132 | 
133 | ```
134 | $ black sklearn/
135 | ```
136 | 
137 | ```
138 | $ black examples/
139 | ```
140 | 
141 | and so on.
142 | 
143 | It's also convenient to run black directly from within VS Code:
144 | 
145 | `Ctrl+Shift+P` and then "> Format Current Document"
146 | 
147 | and select the `black` formatter the first time you use this command.
148 | 
149 | # Reverting your changes
150 | 
151 | Once the exercise is done, feel free to undo all your changes with:
152 | 
153 | ```
154 | git reset --hard
155 | ```
156 | 
157 | # Testing docstrings
158 | 
159 | `pytest` can also run the examples embedded in docstrings, such as those describing an estimator directly in the source code.
160 | Assuming the `RandomForestClassifier` docstring has been modified, the following command
161 | checks that its docstring examples still produce the documented output:
162 | ```
163 | $ pytest --doctest-modules sklearn/ensemble/_forest.py -k RandomForestClassifier
164 | ```
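
To illustrate what a docstring test looks like, here is a minimal sketch based on the standard library `doctest` module, which defines the `>>>` example format that `pytest --doctest-modules` collects. The `scale` function is hypothetical and unrelated to scikit-learn. Each `>>>` line is executed and its output is compared with the text written on the following line:

```python
import doctest


def scale(values, factor=2):
    """Multiply every element of ``values`` by ``factor``.

    >>> scale([1, 2, 3])
    [2, 4, 6]
    >>> scale([1, 2, 3], factor=0)
    [0, 0, 0]
    """
    return [v * factor for v in values]


# Collect the examples found in the docstring of `scale` and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for docstring_test in finder.find(scale, name="scale", globs={"scale": scale}):
    runner.run(docstring_test)

# `results` holds the number of failed and attempted examples.
results = runner.summarize(verbose=False)
print(f"failed={results.failed}, attempted={results.attempted}")
```

If the documented output is edited without changing the code (or the other way around), the corresponding example is reported as failed.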
165 | 
166 | # Debugging
167 | 
168 | If you see an `ImportError` when running `pytest` you might need to re-build the Cython
169 | extensions. You can do this with the command `pip install -v --no-use-pep517 --no-build-isolation -e .`.
170 | On Unix-like systems you can instead also type `make in` from the top-level folder of the project.
171 | 


--------------------------------------------------------------------------------
/5.workflow.md:
--------------------------------------------------------------------------------
  1 | # Collaborative workflow via github (pull requests and code reviews)
  2 | 
  3 | What does branching mean?
  4 | 
  5 | https://learngitbranching.js.org/
  6 | 
  7 | https://learngitbranching.js.org/?NODEMO
  8 | 
  9 | - Create a new branch locally from your main
 10 | ```
 11 | $ cd scikit-learn
 12 | $ git status
 13 | $ git checkout -b my-awesome-branch
 14 | $ git status
 15 | ```
 16 | - Make your modifications and commit
 17 | 
 18 | - Check that the remote repository named `origin` points to your
 19 |   own github fork:
 20 | ```
 21 | git remote --verbose
 22 | ```
 23 | 
 24 | - Push the new branch to your fork:
 25 | ```
 26 | $ git push origin my-awesome-branch
 27 | ```
 28 | - From the github interface open a Pull Request (PR) to the `main`
 29 |   branch of your own scikit-learn fork. **Please do not open the PR to the
 30 |   scikit-learn/scikit-learn main repository at this point!**
 31 | 
 32 | When a contributor opens a pull request it will show up in the list of
 33 | open pull requests at:
 34 | 
 35 | https://github.com/scikit-learn/scikit-learn/pulls
 36 | 
 37 | This will start a review process where anybody (and hopefully project
 38 | maintainers in particular) can start reviewing the diff of the files
 39 | changed by the PR.
 40 | 
 41 | The original contributors can then take those comments into account
 42 | by making further changes as new commits in the same branch.
 43 | 
 44 | When those commits are pushed to the contributor's fork, the PR on github is
 45 | automatically updated to aggregate those changes.
 46 | 
 47 | The reviewers typically check that:
 48 | 
 49 | - the change is actually implementing what is described in the title and
 50 |   description of the pull request or a related github issue referenced
 51 |   in the PR description or a referenced paper from the scientific
 52 |   literature.
 53 |   
 54 | - if the change implements a new feature, the maintainers first discuss if
 55 |   this falls under the [scope of the scikit-learn project](
 56 |   https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms).
 57 | 
 58 | - the tests are properly updated to demonstrate that the code works as it is
 59 |   supposed to and the tests pass with all supported versions of
 60 |   Python, numpy, scipy... and on all 3 supported Operating Systems
 61 |   (Windows, macOS and Linux). More on this point in the next section on
 62 |   Continuous Integration.
 63 | 
 64 | - the documentation and examples related to the change have been properly
 65 |   updated.
 66 | 
 67 | - the code is correct, easy to understand, efficient enough to execute
 68 |   and consistent with the rest of the code base. To ensure style consistency
 69 |   the project follows the [PEP8 code style conventions](
 70 |   https://www.python.org/dev/peps/pep-0008/).
 71 | 
 72 | Once a consensus emerges among the reviewers and the contributors, and
 73 | no comment is left unaddressed, maintainers can merge the commits from
 74 | the contributor's branch into the main branch of the project
 75 | (the `main` branch).
 76 | 
 77 | ## How to take over stalled PRs
 78 | 
 79 | Sometimes you want to take over stalled PRs.
 80 | In general a lot of work has already been done by previous contributors and it is fair to have their name in the commit history.
 81 | 
 82 | Assuming the stalled PR comes from the `some-modifications` branch of the `some-contributor` github user, the workflow is:
 83 | ```
 84 | $ git remote add other-contributor https://github.com/some-contributor/scikit-learn.git
 85 | $ git fetch other-contributor
 86 | $ git checkout -b my-modifications -t other-contributor/some-modifications
 87 | ```
 88 | Once you have checked out the stalled PR, the new branch should be synchronized with the upstream main branch: you can do that by merging upstream/main into the new branch.
 89 | Assuming your main branch is even with scikit-learn/scikit-learn, run
 90 | ```
 91 | $ git merge main
 92 | ```
 93 | You don't have push rights on other contributors' forks, so you need to push to your own fork, either by specifying your remote at each push
 94 | ```
 95 | git push origin my-modifications
 96 | ```
 97 | or by setting your fork as the upstream for this branch once and for all
 98 | ```
 99 | git push --set-upstream origin my-modifications
100 | ```
101 | 
102 | Now you are ready to open a new Pull Request. Please refer to the old one when describing your PR: using a keyword such as "Resolves" or "Closes" followed by a reference to the old PR will automatically
103 | close it when yours is merged (as with issues).
104 | 


--------------------------------------------------------------------------------
/6.ci.md:
--------------------------------------------------------------------------------
  1 | # Continuous Integration
  2 | 
  3 | [Continuous integration](https://en.wikipedia.org/wiki/Continuous_integration) (CI) is the process of automating code quality checks during the development process.
  4 | 
  5 | The scikit-learn project maintainers use [several CI services](https://scikit-learn.org/stable/developers/contributing.html#continuous-integration-ci) to perform automated quality assurance on the source code and its documentation:
  6 | - [Azure pipelines](https://azure.microsoft.com/en-us/services/devops/pipelines/) are used for building and running the scikit-learn test suite using `pytest` on Linux, macOS and Windows against different versions of the dependencies (Python, numpy, scipy...).
  7 | - [CircleCI](https://circleci.com/) is used to build the online HTML documentation using `sphinx` (and also to run the tests with PyPy on Linux).
  8 | 
  9 | In practice, CI services are notified whenever a commit is pushed to a branch on any Pull Request opened on the main scikit-learn github repo. The results are summarized in the github page of the said Pull Request.
 10 | 
 11 | 
 12 | ## Lint checks
 13 | 
 14 | [Linting](https://en.wikipedia.org/wiki/Lint_(software)) checks are run to analyse whether the lines changed by a given Pull Request introduce any undefined variable, useless import statements or a violation of the PEP8 code style conventions for instance.
 15 | 
 16 | `flake8` checks are fast to execute and if the lint fails, most of the longer checks that require building scikit-learn from source are not performed, in order to spare computing resources.
 17 | 
 18 | Failing checks are visible at the end of the Pull Request page:
 19 | 
 20 | ![Failing lint check](images/linting-crop.png)
 21 | 
 22 | Clicking on the [Details](https://app.circleci.com/pipelines/github/scikit-learn/scikit-learn/jobs/81249) link will expand
 23 | the reasons of the failure.
 24 | 
 25 | <a href="https://app.circleci.com/pipelines/github/scikit-learn/scikit-learn/jobs/81249" target="_blank">
 26 |   <img src="images/cidoclint.png" width="90%" />
 27 | </a>
 28 | 
 29 | The log file tells you where the lint issues are: the list of the flake8 errors is available in the flake8
 30 | [documentation](https://flake8.pycqa.org/en/latest/user/error-codes.html).
 31 | 
 32 | In our case:
 33 | - `sklearn/model_selection/_search.py`:785:61: E251 unexpected spaces around keyword / parameter equals, means that
 34 |   line 785, column 61 of the file `sklearn/model_selection/_search.py`
 35 |   ```
 36 |   .format(array_means), category = UserWarning)
 37 |   ```
 38 |   should become
 39 |   ```
 40 |   .format(array_means), category=UserWarning)
 41 |   ```
 42 | - `sklearn/model_selection/_search.py`:785:76: W291 trailing whitespace, means that at line 785, column 76, there is a trailing
 43 |   space at the end of the line that should be removed.
 44 | - `sklearn/model_selection/tests/test_search.py`:1809:9: F841 local variable 'x_grid' is assigned to but never used, means
 45 |   that the `x_grid` assignment can be removed.
 46 | - `sklearn/model_selection/tests/test_search.py`:1812:17: E222 multiple spaces after operator, means that
 47 |   ```
 48 |   kernel =  'epanechnikov' 
 49 |   ```
 50 |   should become
 51 |   ```
 52 |   kernel = 'epanechnikov'
 53 |   ```
 54 | - `sklearn/model_selection/tests/test_search.py`:1823:24: E128 continuation line under-indented for visual indent, means that
 55 |   the code
 56 |   ```
 57 |   with pytest.warns(UserWarning,
 58 |                  match='Some test scores are not finite\\d+'):
 59 |   ```
 60 |   should become
 61 |   ```
 62 |   with pytest.warns(UserWarning,
 63 |                     match='Some test scores are not finite\\d+'):
 64 |   ```
 65 | - `sklearn/model_selection/tests/test_search.py`:1824:25: E231 missing whitespace after ',', means that
 66 |   ```
 67 |   grid.fit(X[:,np.newaxis])
 68 |   ```
 69 |   should become
 70 |   ```
 71 |   grid.fit(X[:, np.newaxis])
 72 |   ```
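
Each line of the log above follows the pattern `path:line:column: code message`. As a minimal sketch, the following snippet splits such a line into its fields with a regular expression. The names `LintIssue` and `parse_flake8_line` are hypothetical and not part of flake8, and the pattern assumes that the file path contains no `:` character (which would not hold for Windows paths with a drive letter):

```python
import re
from typing import NamedTuple, Optional


class LintIssue(NamedTuple):
    path: str
    line: int
    column: int
    code: str
    message: str


# path:line:column: CODE message
_ISSUE_PATTERN = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<message>.*)$"
)


def parse_flake8_line(text: str) -> Optional[LintIssue]:
    """Parse one line of lint output, or return None if it does not match."""
    match = _ISSUE_PATTERN.match(text.strip())
    if match is None:
        return None
    return LintIssue(
        path=match["path"],
        line=int(match["line"]),
        column=int(match["column"]),
        code=match["code"],
        message=match["message"],
    )


issue = parse_flake8_line(
    "sklearn/model_selection/_search.py:785:61: "
    "E251 unexpected spaces around keyword / parameter equals"
)
print(issue.path, issue.line, issue.column, issue.code)
```

The `line` and `column` fields are the numbers to look up in your editor, and `code` is the entry to search for in the flake8 error code list.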
 73 | 
 74 | 
 75 | To locally check the code you changed, you can run the following command:
 76 | ```
 77 | $ git diff upstream/main -u -- "*.py" | flake8 --diff
 78 | ```
 79 | 
 80 | ## Documentation
 81 | 
 82 | A number of checks are performed during the [build of the documentation](https://scikit-learn.org/stable/developers/contributing.html#documentation)
 83 | 
 84 | ![CircleCI checks](images/circleci.png)
 85 | 
 86 | - After the lint (1)
 87 | - the documentation build (2) process starts (using `sphinx`).
 88 | - The deploy (3) step is meant to deploy the preview of the documentation for visually checking the rendered HTML, including generated png figures.
 89 | - The "doc artifact" (4) link lists the documentation pages that have been modified by the pull request.
 90 | 
 91 |   The last three steps often fail at the same time.
 92 |   - If the artifact links are available, failures are probably due to sphinx warnings introduced by changes included in the pull request: see the details by clicking the "doc artifact" link.
 93 |   - If artifacts are not available, the build itself has failed: this can be due, for example, to exceptions in the
 94 |     [example scripts](https://github.com/scikit-learn/scikit-learn/tree/main/examples): read the output logs to see the error message and try to reproduce the problem locally.
 95 |     Sometimes this is due to CircleCI system failures, typicall a failing network connection (see for example
 96 |     [here](https://app.circleci.com/pipelines/github/maikia/scikit-learn/128/workflows/50aac418-6e87-4f10-98e8-4d5c5b5df460/jobs/328/steps)),
 97 |     in that case it can be useful to re-trigger the build with an empty commit, with the following commands:
 98 |     ```
 99 |     $ git commit --allow-empty -m "Trigger CI build."
100 |     $ git push origin <my_branch>
101 |     ```
102 | - The doc-min-dependencies (5) step builds the documentation with the minimal supported versions of the dependencies.
103 |   A failure at this point means that the modifications are not compatible with older versions of Python or of the dependencies,
104 |   and that functions which are too recent should be replaced (see for example [here](https://circleci.com/gh/scikit-learn/scikit-learn/106882?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link)).
105 | 
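As an illustration of replacing a function that is too recent, here is a minimal sketch of a compatibility fallback. The helper name `prod` is hypothetical and not taken from scikit-learn; the sketch only relies on the fact that `math.prod` was added in Python 3.8.

```python
import math
import operator
from functools import reduce


def prod(values):
    """Product of an iterable that also works where math.prod is missing.

    Hypothetical compatibility helper, for illustration only.
    """
    if hasattr(math, "prod"):  # math.prod exists from Python 3.8 onwards
        return math.prod(values)
    # Fallback for older interpreters: multiply the items one by one.
    return reduce(operator.mul, values, 1)


print(prod([2, 3, 4]))  # 24
```

Code written this way gives the same result on old and new interpreters, which is the property the doc-min-dependencies step is meant to verify.
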
106 | The CircleCI configuration files are stored in the [.circleci/](https://github.com/scikit-learn/scikit-learn/tree/main/.circleci) directory, while the build scripts are available under the [build_tools/circle/](https://github.com/scikit-learn/scikit-learn/tree/main/build_tools/circle) directory.
107 | 
108 | ## scikit-learn building and testing
109 | 
110 | ![Azure checks](images/azure.png)
111 | 
112 | The Azure checks are defined in the [azure-pipelines.yml](https://github.com/scikit-learn/scikit-learn/blob/main/azure-pipelines.yml) file.
113 | The Azure configuration files and the build scripts are available under the [build_tools/azure/](https://github.com/scikit-learn/scikit-learn/tree/main/build_tools/azure) directory.
114 | - After the lint (`flake8`) check, a number of platforms are tested.
115 | - Even when the lint check fails, the [pylatest_conda_mkl](https://github.com/scikit-learn/scikit-learn/blob/442abb10ffb54358a750f0f07d983b67d0c73eab/azure-pipelines.yml#L75) job is executed anyway. It is the most complete check.
116 |   Not only is the library built and the tests run, but the [docstrings](https://github.com/scikit-learn/scikit-learn/blob/442abb10ffb54358a750f0f07d983b67d0c73eab/build_tools/azure/posix.yml#L41) are also tested.
117 | - All other platforms are tested only if the lint check has passed. Depending on the Python version, the library versions,
118 |   or the architecture, some of them might fail while others pass.
119 | 
120 | ## LGTM checks
121 | 
122 | [LGTM](https://lgtm.com/) is a variant analysis platform that checks your code for Common Vulnerabilities and Exposures
123 | (CVEs). The analysis is performed without actually executing the program and checks for bad coding practices that could
124 | expose the program to known buggy behavior.
125 | 
126 | ## CodeCov
127 | 
128 | [CodeCov](https://codecov.io/) is a tool to check the test coverage of your code.
129 | During the test run on Azure, a report is produced and uploaded to codecov.io, containing an analysis of how many lines
130 | of code are not covered by tests.
131 | If the Azure builds have not finished yet, the CodeCov check commonly reports a failure: this is a temporary failure due to the
132 | fact that the uploaded report does not yet cover the entire process.
133 | CodeCov failures are relevant only once all the builds are completed. There are two possible reasons for failures:
134 | - current tests did not pass;
135 | - new tests are needed to cover added lines.
136 | 
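The effect of an incomplete report can be sketched with a toy computation. This is a simplified illustration with made-up line numbers, not CodeCov's actual algorithm; the function `coverage_percent` is hypothetical.

```python
def coverage_percent(executed_lines, executable_lines):
    """Percentage of executable lines that were run by the tests."""
    executable = set(executable_lines)
    if not executable:
        return 100.0  # nothing to cover
    covered = executable & set(executed_lines)
    return 100.0 * len(covered) / len(executable)


# Hypothetical file with 10 executable lines.
executable = range(1, 11)
# Lines hit once every build has uploaded its report.
all_builds = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Lines hit by the only build that has finished so far.
first_build_only = [1, 2, 3, 4]

print(coverage_percent(all_builds, executable))        # 90.0
print(coverage_percent(first_build_only, executable))  # 40.0
```

The partial report shows a much lower percentage even though the code is the same, which is why the check is only meaningful after all builds have completed.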


--------------------------------------------------------------------------------
/7.documentation.md:
--------------------------------------------------------------------------------
1 | # Building the documentation with sphinx and sphinx gallery
2 | 
3 | Check the documentation [here](https://scikit-learn.org/stable/developers/contributing.html#documentation)
4 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [![Crash course in contributing to open source projects](images/EuroSciPydevsprint.jpg)](https://www.euroscipy.org/2022/sprint.html)
 2 | 
 3 | # EuroSciPy 2022 - Sprint - Basel
 4 | Repository with materials and instructions to set up your environment for the EuroSciPy sprint.
 5 | 
 6 | _2022, September 2_
 7 | 
 8 | ## Before the Sprint (if you can)
 9 | 
10 | Please try setting up your environment by following the first chapter of the
11 | Workshop instructions (create a GitHub account and follow
12 | [1.environment.md](1.environment.md)).
13 | 
14 | Feel free to familiarize yourself with git branching concepts by following the
15 | [learngitbranching online tutorial](https://learngitbranching.js.org/) (no
16 | installation required for this tutorial).
17 | 
18 | If you have the time, feel free to do as much as you want of the workshop on
19 | your own. Feel free to ask questions on our [Gitter Sprint Channel](https://gitter.im/scikit-learn/sprint).
20 | 
21 | ## Workshop for first time contributors
22 | 
23 | - Create your [github account](https://github.com/join?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home) 
24 | if you don't have one already.
25 | - **Set up your development environment** following the instructions in the markdown file of this repo:
26 |   - [1.environment.md](1.environment.md)
27 |   - [2.building.md](2.building.md)
28 | - **Get familiar with the scikit-learn development process**
29 |   - [3.example.md](3.example.md)
30 |   - [4.test.md](4.test.md)
31 |   - [5.workflow.md](5.workflow.md)
32 |   - [6.ci.md](6.ci.md)
33 |   - [7.documentation.md](7.documentation.md)
34 | - Have a look at the resources linked below.
35 | - Feel free to ask for help on our [Discord channel](https://discord.gg/D7k9Ez8u) or on the [Gitter Sprint Channel](https://gitter.im/scikit-learn/sprint), at any time.
36 | 
37 | If you already know how to build scikit-learn from source, run the tests of a
38 | specific sub-module and use git to switch between branches and open pull requests,
39 | feel free to start working on an issue (see below) instead.
40 | 
41 | ## List of issues for the sprint
42 | 
43 | We will use this [sprint project board](https://github.com/orgs/scikit-learn-inria-fondation/projects/5) to
44 | track pull requests during the afternoon (and after the sprint).
45 | 
46 | In particular, these "meta-issues" list potential tasks for first-time contributors:
47 | 
48 | - https://github.com/scikit-learn/scikit-learn/issues/21350
49 | - https://github.com/scikit-learn/scikit-learn/issues/23462
50 | - https://github.com/scikit-learn/scikit-learn/issues/5435
51 | 
52 | Another one may interest people with a bit more experience:
53 | 
54 | - https://github.com/scikit-learn/scikit-learn/issues/11000
55 | 
56 | When referencing your PR and while linking it to an existing issue, please insert the following hashtag:
57 | #euroscipy22
58 | 
59 | ## Additional Resources for beginners
60 | 
61 | #### [Scikit-learn contributor's documentation](https://scikit-learn.org/dev/developers/contributing.html)
62 | 
63 | #### [Craft Minimal Bug Reports](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports)
64 | 
65 | #### Crash course in contributing to open source projects
66 | 
67 | by [Andreas Mueller](https://github.com/amueller)
68 | 
69 | [![Crash course in contributing to open source projects](images/amueller.jpg)](https://www.youtube.com/embed/5OL8XoMMOfA)
70 | 
71 | #### Example of Submitting a Pull request to scikit-learn
72 | 
73 | by [Reshama Shaikh](https://github.com/reshamas/)
74 | 
75 | [![Example of Submitting a Pull request to scikit-learn](images/reshamas.jpg)](https://www.youtube.com/embed/PU1WyDPGePI)
76 | 


--------------------------------------------------------------------------------
/images/EuroSciPydevsprint.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/EuroSciPydevsprint.jpg


--------------------------------------------------------------------------------
/images/amueller.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/amueller.jpg


--------------------------------------------------------------------------------
/images/azure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/azure.png


--------------------------------------------------------------------------------
/images/cidoclint.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/cidoclint.png


--------------------------------------------------------------------------------
/images/circleci.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/circleci.png


--------------------------------------------------------------------------------
/images/linting-crop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/linting-crop.png


--------------------------------------------------------------------------------
/images/reshamas.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-inria-fondation/EuroSciPy22/002eb66943f74a51d16b6d392c4a23995689c627/images/reshamas.jpg


--------------------------------------------------------------------------------