├── LICENSE
├── README.md
├── example.sh
└── git-multisect.py


/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Joachim Breitner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
git multisect: Find commits that matter
=======================================

This tool can be thought of as a variant of `git log` that omits those commits that
do not affect the output of a given command. It can also be thought of as a variant
of `git bisect` with more than just “bad” and “good” as states.
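
One way to picture this is to treat the revisions as positions in a list of command outputs and to report only the positions where the output changes, without descending into any range whose two ends agree. The following is a simplified sketch of that idea, not the script's actual code: the helper name `relevant_indices` is hypothetical, and it works on a precomputed list of outputs, whereas a real run would execute the command lazily per revision.

```python
def relevant_indices(outputs):
    """Return the indices i >= 1 whose output differs from outputs[i - 1].

    Ranges whose endpoints have equal output are dropped without being
    inspected further, so changes that cancel out inside such a range
    are not reported (an assumption of this sketch).
    """
    if len(outputs) < 2:
        return []
    relevant = []
    # Stack of inclusive index ranges that still need checking.
    todo = [(0, len(outputs) - 1)]
    while todo:
        i, j = todo.pop()
        if outputs[i] == outputs[j]:
            continue  # same output at both ends: nothing to report here
        if j == i + 1:
            relevant.append(j)  # adjacent and different: j changed the output
        else:
            k = (i + j) // 2  # split the range and check both halves
            todo.append((k, j))
            todo.append((i, k))
    return sorted(relevant)


# Hypothetical outputs for a start revision followed by three commits.
print(relevant_indices(["a", "b", "b", "c"]))  # [1, 3]
```

Index 0 stands for the starting revision, so the result above says that the first and third commits after it changed the output.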

The motivating application is when you have a project that depends on another
project, you are bumping the dependency version, and you want to see the changes
done upstream that affect _your_ project. So you instruct `git multisect` to build
it with all upstream changes, and compare the build output.

Intro
-----

### Very small example

Consider this very small git repository, with four commits:
 * An initial commit that adds the `example.sh` program,
 * one that changes its output,
 * one that just refactors the code but does not change the output, and
 * yet another changing the output.

Look at each change…

```
$ git log --oneline --reverse -p
dcf6dae Initial check-in
diff --git a/example.sh b/example.sh
new file mode 100755
index 0000000..d6954d9
--- /dev/null
+++ b/example.sh
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+
+echo "Hello World!"
48c68e2 Second version
diff --git a/example.sh b/example.sh
index d6954d9..3f29b95 100755
--- a/example.sh
+++ b/example.sh
@@ -1,3 +1,3 @@
 #!/usr/bin/env bash
 
-echo "Hello World!"
+echo "Hello Galaxies!"
d25f474 Refactor
diff --git a/example.sh b/example.sh
index 3f29b95..91bee54 100755
--- a/example.sh
+++ b/example.sh
@@ -1,3 +1,4 @@
 #!/usr/bin/env bash
 
-echo "Hello Galaxies!"
+who=Galaxies
+echo "Hello $who!"
8764b3f (HEAD -> master) Third version
diff --git a/example.sh b/example.sh
index 91bee54..bd704ea 100755
--- a/example.sh
+++ b/example.sh
@@ -1,4 +1,4 @@
 #!/usr/bin/env bash
 
-who=Galaxies
+who=Universe
 echo "Hello $who!"
```


As a user upgrading from `dcf6dae` to the latest version, we might want to check the changelog:
```
$ git log --oneline
8764b3f (HEAD -> master) Third version
d25f474 Refactor
48c68e2 Second version
dcf6dae Initial check-in
```

But as a user we usually do not care about refactorings; we only care about
those changes that really affect us. So we can use `git-multisect` for that:

```
$ git-multisect.py -f dcf6dae -t master -c 'git checkout -q $REV; ./example.sh'
Found 3 commits
[3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing dcf6dae ...
[3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 8764b3f ...
[3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 48c68e2 ...
[3 total, 1 relevant, 0 irrelevant, 0 skipped, 2 unknown] inspecing d25f474 ...
[3 total, 2 relevant, 1 irrelevant, 0 skipped, 0 unknown] done

48c68e2 Second version
8764b3f Third version
```

We tell it the range we are interested in, and what to do for each revision
(namely, to check it out and run the script). And very nicely it lists only
those commits that change the output, and omits the refactoring commit.

### A variant of the very small example

Of course, there are other properties of the repository we may care about. Maybe we want to know which commits have increased the code size? Let's see:

```
$ ../git-multisect.py -f dcf6dae3 -t master --show-output -c 'git cat-file blob $REV:example.sh|wc -c'
Found 3 commits
[3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing dcf6dae ...
[3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 8764b3f ...
[3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 48c68e2 ...
[3 total, 1 relevant, 0 irrelevant, 0 skipped, 2 unknown] inspecing d25f474 ...
[3 total, 2 relevant, 1 irrelevant, 0 skipped, 0 unknown] done

dcf6dae Initial check-in
41
48c68e2 Second version
44
d25f474 Refactor
53
```

In this example, our test command does not have to actually check out the
revision, as it can use `git cat-file`; presumably directly accessing the git
repo this way is faster.

We also use the `--show-output` option to see the output next to the commits.
Note that commit `Third version` is omitted (it did not change the file size),
but now we see the refactoring commit!

### A realistic example

In the following example, I want to know which changes to the nixpkgs package repository affect my server. The command line is a bit long, so here are the relevant bits:

 * The argument to `-f` is the current revision of `nixpkgs` that I am using, which I can query using `nix flake metadata` and a `jq` call to resolve two levels of indirection.
 * I want to upgrade to the latest commit on the `release-22.11` branch.
 * The command that I am running is `nix path-info --derivation`. This prints a
   hash (a store path, actually) of the _recipe_ for building my system. It does
   not actually build the system (which would take longer), but is a good approximation
   for “does this affect my system”.
 * I pass `--no-update-lock-file` to not actually touch my configuration.
 * And, crucially, I use `--override-input` to tell `nix` to use that particular revision.
 * With `--hide-stderr` I make it hide the noise that `nix path-info` prints on stderr.
 * Finally, because most nixpkgs commits are merges and their first line is rather unhelpful, I pass `--log-option=--pretty=medium` to switch to a more elaborate log format.
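
The two levels of indirection mentioned in the first bullet can be sketched in Python. This is an illustration only: the helper name `locked_rev` and the sample data are hypothetical, and it assumes that the parsed output of `nix flake metadata --json` has a `locks.root` entry naming the root node, that the root node's `inputs` map an input name to a plain node name, and that this node carries `locked.rev`.

```python
def locked_rev(metadata, input_name="nixpkgs"):
    """Return the locked revision of one flake input.

    `metadata` is assumed to be the parsed JSON output of
    `nix flake metadata --json`, with the layout described above.
    """
    locks = metadata["locks"]
    nodes = locks["nodes"]
    # First indirection: the root node maps input names to node names.
    node_name = nodes[locks["root"]]["inputs"][input_name]
    # Second indirection: the named node holds the locked revision.
    return nodes[node_name]["locked"]["rev"]


# Hypothetical sample data, not taken from a real lock file.
sample = {
    "locks": {
        "root": "root",
        "nodes": {
            "root": {"inputs": {"nixpkgs": "nixpkgs_2"}},
            "nixpkgs_2": {"locked": {"rev": "0123abc"}},
        },
    }
}
print(locked_rev(sample))
```

In practice the result would be passed as the `-f` argument, for example after loading the JSON with `json.loads`.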

At the time of writing, there are 39 new commits, but after inspecting 10 of
them, it found that only two commits are relevant to me. It does not even
look at commits that are between two commits where the program produces the
same output; in this case this saved looking at 29 commits.

```
$ git-multisect.py \
  -C ~/build/nixpkgs \
  -f $(nix flake metadata ~/nixos --json|jq -r ".locks.nodes[.locks.nodes[.locks.root].inputs.nixpkgs].locked.rev") \
  -t release-22.11 \
  -c 'nix path-info --derivation '\
     '~/nixos#nixosConfigurations.richard.config.system.build.toplevel '\
     '--no-update-lock-file '\
     '--override-input nixpkgs ~/"build/nixpkgs?rev=$REV"' \
  --hide-stderr --log-option=--pretty=medium
Found 39 commits
[39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing 2fb7d74 ...
[39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing 569163e ...
[39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing 5642ce8 ...
[39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing e0c8cf8 ...
[39 total, 0 relevant, 1 irrelevant, 8 skipped, 30 unknown] inspecing 89d0361 ...
[39 total, 0 relevant, 1 irrelevant, 8 skipped, 30 unknown] inspecing d1c7e63 ...
[39 total, 0 relevant, 2 irrelevant, 9 skipped, 28 unknown] inspecing e6d5772 ...
[39 total, 0 relevant, 3 irrelevant, 9 skipped, 27 unknown] inspecing a099526 ...
[39 total, 1 relevant, 4 irrelevant, 9 skipped, 25 unknown] inspecing 854312a ...
[39 total, 1 relevant, 5 irrelevant, 10 skipped, 23 unknown] inspecing 95043dc ...
[39 total, 1 relevant, 6 irrelevant, 10 skipped, 22 unknown] inspecing 0cf4274 ...
[39 total, 2 relevant, 8 irrelevant, 29 skipped, 0 unknown] done

commit a0995268af8ba0336a81344a3bf6a50d6d6481b2
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Sat Feb 18 10:45:11 2023 -0800

    linux_{5_15,6_1}: revert patch to fix Equinix Metal bonded networking with `ice` driver (#216955)
…
commit 0cf4274b5d06325bd16dbf879a30981bc283e58a
Merge: 95043dc713d 532f3aa6052
Author: Pierre Bourdon
Date:   Sun Feb 19 23:37:48 2023 +0900

    Merge pull request #217121 from NixOS/backport-216463-to-release-22.11

    [Backport release-22.11] sudo: 1.9.12p2 -> 1.9.13
```


Usage
-----

```
Usage: git-multisect.py [options]

Options:
  -h, --help            show this help message and exit
  -C DIR, --repo=DIR    Repository (default .)
  -f REV, --from=REV    First revision
  -t REV, --to=REV      Last revision (default: HEAD)
  --hide-stderr         Hide the command stderr. Good if noisy
  --show-output         Include the program output after each log line, and
                        include the first commit in the log
  --log-options=LOG_OPTIONS
                        How to print the git log (default: --oneline)
  -c CMD, --cmd=CMD     Command to run. Will be passed to a shell with REV set
                        to a revision.
```

Issues/Roadmap/Caveats/TODO
---------------------------

* The tool requires the first commit to be an ancestor of the last commit, and
  only looks at the first-parent-line from the last commit. This is good enough
  for the usual single-long-running-branch use case, but could be extended to
  handle more complex DAGs better.

* If a command fails, `git-multisect` aborts with a non-pretty error message.
  This could be improved.

* It is designed for small program output, and stores it all in memory.
  A more efficient way is possible, but probably not worth the bother. If you have large
  output, run it through a hash calculation in the command.

* The tool could be packaged more properly and have a test suite.

* If this turns out to be useful, maybe someone could upstream it with the git project.


Contributions are welcome!

--------------------------------------------------------------------------------
/example.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

who=Universe
echo "Hello $who!"

--------------------------------------------------------------------------------
/git-multisect.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import sys
import subprocess
import os
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-C", "--repo", dest="repo", default=".",
                  help="Repository (default .)", metavar="DIR")
parser.add_option("-f", "--from", dest="start",
                  help="First revision", metavar="REV")
parser.add_option("-t", "--to", dest="to", default="HEAD",
                  help="Last revision (default: HEAD)", metavar="REV")
parser.add_option("--hide-stderr", action = "store_true", dest="hidestderr",
                  help="Hide the command stderr. Good if noisy")
parser.add_option("--show-output", action = "store_true", dest="show_output",
                  help="Include the program output after each log line, and include the first commit in the log")
parser.add_option("--log-options", action = "store", dest="log_options",
                  default="--oneline --no-decorate",
                  help="How to print the git log (default: --oneline)")
parser.add_option("-c", "--cmd", dest="cmd",
                  help="Command to run. Will be passed to a shell with REV set to a revision.", metavar="CMD")
#parser.add_option("-q", "--quiet",
#                  action="store_false", dest="verbose", default=True,
#                  help="don't print status messages to stdout")
(options, args) = parser.parse_args()

if options.start is None or options.cmd is None:
    print("The --from and --cmd options are required\n")
    parser.print_help()
    sys.exit(1)


def err(msg):
    print(msg, file=sys.stderr)
    sys.exit(1)

def info(msg):
    print(msg, file=sys.stderr)

# Check if the first revision is an ancestor of the last revision
# (and that the repo can be read)
ret = subprocess.run(["git", "-C", options.repo, "merge-base", "--is-ancestor",
                      options.start, options.to])
if ret.returncode == 0:
    pass # good
elif ret.returncode == 1:
    err(f"Revision {options.start} is not an ancestor of {options.to}, giving up")
else:
    err(f"Failed to run: {' '.join(ret.args)}")

# Get the list of revisions

commits = subprocess.check_output(["git", "-C", options.repo, "log", "--topo-order", "--reverse", "--first-parent", "--pretty=tformat:%H", f"{options.start}..{options.to}"],text=True).splitlines()

if len(commits) == 0:
    info(f"Found no commits in {options.start}..{options.to}")
    sys.exit(0)

info(f"Found {len(commits)} commits")

# NB! revs indexing is off-by-one compared to commits indexing
start = subprocess.check_output(["git", "-C", options.repo, "rev-parse", options.start], text=True).splitlines()[0]
revs = [start] + commits

# Stats!
# commits found to be relevant
relevant = set()
irrelevant = 0
skipped = 0

# field width for the counters in the status line
width = len(str(len(commits)))
def statinfo(msg):
    unknown = len(commits) - len(relevant) - irrelevant - skipped
    info(f"[{len(relevant):{width}} relevant, {irrelevant:{width}} irrelevant, {skipped:{width}} skipped, {unknown:{width}} unknown] {msg}")

# memoized output processing
outputs = {}
def get_output(i):
    if i not in outputs:
        statinfo(f"🔎 {i:{width}}/{len(commits)} {revs[i][:7]} ...")
        outputs[i] = subprocess.check_output(
            options.cmd,
            shell = True,
            env = dict(os.environ, REV=revs[i]),
            stderr = subprocess.DEVNULL if options.hidestderr else None,
            text = options.show_output
        )
    return outputs[i]

# the stack of ranges (indices, inclusive) yet to check for relevant commits.
# invariants:
#  * the output of endpoints differ, i.e. there will be some relevant commit
#  * the interval is not a singleton
todo = []

# adds an interval to the top (end) of the stack, checking the invariants
def add_interval(i,j):
    global irrelevant, skipped
    if get_output(i) == get_output(j):
        # no changes in this range: drop range, mark end as irrelevant and intermediate as skipped
        irrelevant += 1
        skipped += j-i-1
    elif j == i + 1:
        # j proven to be relevant
        relevant.add(j-1) # NB index shift
    else:
        todo.append((i,j))

# Add initial interval
add_interval(0, len(revs)-1)
# Keep splitting the intervals, until none are left
while len(todo)>0:
    (i,j) = todo.pop()
    k = (i+j)//2
    add_interval(k,j)
    add_interval(i,k)

def git_log(rev):
    subprocess.run([
        "git",
        "--no-pager",
        "-C", options.repo,
        "log", "-n1"] + options.log_options.split() + [rev])

statinfo("done")
info("")
if options.show_output:
    git_log(start)
    sys.stdout.write(get_output(0))
    # flush so our output stays in order with what the git child process prints
    sys.stdout.flush()
for i, commit in enumerate(commits):
    if i in relevant:
        git_log(commit)
        if options.show_output:
            sys.stdout.write(get_output(i+1))
            sys.stdout.flush()

--------------------------------------------------------------------------------