├── LICENSE
├── README.md
├── example.sh
└── git-multisect.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Joachim Breitner
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | git multisect: Find commits that matter
2 | =======================================
3 |
4 | This tool can be thought of as a variant of `git log` that omits those commits that
5 | do not affect the output of a given command. It can also be thought of as a variant
6 | of `git bisect` with more than just “bad” and “good” as states.
7 |
8 | The motivating application: you have a project that depends on another
9 | project, you are bumping the dependency version, and you want to see the
10 | upstream changes that affect _your_ project. So you instruct `git multisect` to build
11 | your project against each upstream revision, and compare the build output.
12 |
13 | Intro
14 | -----
15 |
16 | ### Very small example
17 |
18 | Consider this very small git repository, with four commits:
19 | * An initial commit that adds the `example.sh` program,
20 | * one that changes its output,
21 | * one that just refactors the code,
22 |   but does not change the output, and
23 | * yet another changing the output.
24 |
25 |
26 |
27 | Look at each change…
28 |
29 | ```
30 | $ git log --oneline --reverse -p
31 | dcf6dae Initial check-in
32 | diff --git a/example.sh b/example.sh
33 | new file mode 100755
34 | index 0000000..d6954d9
35 | --- /dev/null
36 | +++ b/example.sh
37 | @@ -0,0 +1,3 @@
38 | +#!/usr/bin/env bash
39 | +
40 | +echo "Hello World!"
41 | 48c68e2 Second version
42 | diff --git a/example.sh b/example.sh
43 | index d6954d9..3f29b95 100755
44 | --- a/example.sh
45 | +++ b/example.sh
46 | @@ -1,3 +1,3 @@
47 | #!/usr/bin/env bash
48 |
49 | -echo "Hello World!"
50 | +echo "Hello Galaxies!"
51 | d25f474 Refactor
52 | diff --git a/example.sh b/example.sh
53 | index 3f29b95..91bee54 100755
54 | --- a/example.sh
55 | +++ b/example.sh
82 | @@ -1,3 +1,4 @@
83 | #!/usr/bin/env bash
84 |
85 | -echo "Hello Galaxies!"
86 | +who=Galaxies
87 | +echo "Hello $who!"
88 | 8764b3f (HEAD -> master) Third version
89 | diff --git a/example.sh b/example.sh
90 | index 91bee54..bd704ea 100755
91 | --- a/example.sh
92 | +++ b/example.sh
93 | @@ -1,4 +1,4 @@
94 | #!/usr/bin/env bash
95 |
96 | -who=Galaxies
97 | +who=Universe
98 | echo "Hello $who!"
99 | ```
100 |
101 |
102 |
103 | As a user upgrading from `dcf6dae` to the latest version, we might want to check the changelog:
104 | ```
105 | $ git log --oneline
106 | 8764b3f (HEAD -> master) Third version
107 | d25f474 Refactor
108 | 48c68e2 Second version
109 | dcf6dae Initial check-in
110 | ```
111 |
112 | But as a user we usually do not care about refactorings; we only care about
113 | those changes that really affect us. So we can use `git-multisect` for that:
114 |
115 | ```
116 | $ git-multisect.py -f dcf6dae -t master -c 'git checkout -q $REV; ./example.sh'
117 | Found 3 commits
118 | [3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing dcf6dae ...
119 | [3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 8764b3f ...
120 | [3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 48c68e2 ...
121 | [3 total, 1 relevant, 0 irrelevant, 0 skipped, 2 unknown] inspecing d25f474 ...
122 | [3 total, 2 relevant, 1 irrelevant, 0 skipped, 0 unknown] done
123 |
124 | 48c68e2 Second version
125 | 8764b3f Third version
126 | ```
127 |
128 | We tell it the range we are interested in, and what to do for each revision
129 | (namely, to check it out and run the script). And very nicely it lists only
130 | those commits that change the output, and omits the refactoring commit.
131 |
132 | ### A variant of the very small example
133 |
134 | Of course, there are other properties of the repository we may care about. Maybe we want to know which commits have increased the code size? Let's see:
135 |
136 | ```
137 | $ ../git-multisect.py -f dcf6dae3 -t master --show-output -c 'git cat-file blob $REV:example.sh|wc -c'
138 | Found 3 commits
139 | [3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing dcf6dae ...
140 | [3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 8764b3f ...
141 | [3 total, 0 relevant, 0 irrelevant, 0 skipped, 3 unknown] inspecing 48c68e2 ...
142 | [3 total, 1 relevant, 0 irrelevant, 0 skipped, 2 unknown] inspecing d25f474 ...
143 | [3 total, 2 relevant, 1 irrelevant, 0 skipped, 0 unknown] done
144 |
145 | dcf6dae Initial check-in
146 | 41
147 | 48c68e2 Second version
148 | 44
149 | d25f474 Refactor
150 | 53
151 | ```
152 |
153 | In this example, our test command does not have to actually check out the
154 | revision, as it can use `git cat-file`; presumably directly accessing the git
155 | repo this way is faster.
156 |
157 | We also use the `--show-output` option to see the output next to the commits.
158 | Note that commit `Third version` is omitted (it did not change the file size),
159 | but now we see the refactoring commit!
160 |
161 | ### A realistic example
162 |
163 | In the following example, I want to know which changes to the nixpkgs package repository affect my server. The command line is a bit long, so here are the relevant bits:
164 |
165 | * The argument to `-f` is the current revision of `nixpkgs` that I am using, which I can query using `nix flake metadata` and a `jq` call to resolve two levels of indirection.
166 | * I want to upgrade to the latest commit on the `release-22.11` branch.
167 | * The command that I am running is `nix path-info --derivation`. This prints a
168 |   hash (a store path, actually) of the _recipe_ for building my system. It does
169 |   not actually build the system (which would take longer), but is a good approximation
170 |   for “does this affect my system”.
171 | * I pass `--no-update-lock-file` to not actually touch my configuration.
172 | * And, crucially, I use `--override-input` to tell `nix` to use that particular revision.
173 | * With `--hide-stderr` I make it hide the noise that `nix path-info` prints to stderr.
174 | * Finally, because most nixpkgs commits are merges and their first line is rather unhelpful, I pass `--log-option=--pretty=medium` to switch to a more elaborate log format.
175 |
176 | At the time of writing, there are 39 new commits, but after inspecting 10 of
177 | them, it found that only two commits are relevant to me. It does not even
178 | look at commits that are between two commits where the program produces the
179 | same output; in this case this saved looking at 29 commits.
180 |
181 | ```
182 | $ git-multisect.py \
183 | -C ~/build/nixpkgs \
184 | -f $(nix flake metadata ~/nixos --json|jq -r ".locks.nodes[.locks.nodes[.locks.root].inputs.nixpkgs].locked.rev") \
185 | -t release-22.11 \
186 | -c 'nix path-info --derivation '\
187 | '~/nixos#nixosConfigurations.richard.config.system.build.toplevel '\
188 | '--no-update-lock-file '\
189 | '--override-input nixpkgs ~/"build/nixpkgs?rev=$REV"' \
190 | --hide-stderr --log-option=--pretty=medium
191 | Found 39 commits
192 | [39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing 2fb7d74 ...
193 | [39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing 569163e ...
194 | [39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing 5642ce8 ...
195 | [39 total, 0 relevant, 0 irrelevant, 0 skipped, 39 unknown] inspecing e0c8cf8 ...
196 | [39 total, 0 relevant, 1 irrelevant, 8 skipped, 30 unknown] inspecing 89d0361 ...
197 | [39 total, 0 relevant, 1 irrelevant, 8 skipped, 30 unknown] inspecing d1c7e63 ...
198 | [39 total, 0 relevant, 2 irrelevant, 9 skipped, 28 unknown] inspecing e6d5772 ...
199 | [39 total, 0 relevant, 3 irrelevant, 9 skipped, 27 unknown] inspecing a099526 ...
200 | [39 total, 1 relevant, 4 irrelevant, 9 skipped, 25 unknown] inspecing 854312a ...
201 | [39 total, 1 relevant, 5 irrelevant, 10 skipped, 23 unknown] inspecing 95043dc ...
202 | [39 total, 1 relevant, 6 irrelevant, 10 skipped, 22 unknown] inspecing 0cf4274 ...
203 | [39 total, 2 relevant, 8 irrelevant, 29 skipped, 0 unknown] done
204 |
205 | commit a0995268af8ba0336a81344a3bf6a50d6d6481b2
206 | Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
207 | Date: Sat Feb 18 10:45:11 2023 -0800
208 |
209 | linux_{5_15,6_1}: revert patch to fix Equinix Metal bonded networking with `ice` driver (#216955)
210 | …
211 | commit 0cf4274b5d06325bd16dbf879a30981bc283e58a
212 | Merge: 95043dc713d 532f3aa6052
213 | Author: Pierre Bourdon
214 | Date: Sun Feb 19 23:37:48 2023 +0900
215 |
216 | Merge pull request #217121 from NixOS/backport-216463-to-release-22.11
217 |
218 | [Backport release-22.11] sudo: 1.9.12p2 -> 1.9.13
219 | ```
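
The skipping behaviour seen in the run above can be sketched in a few lines of Python. This is a simplified illustration, not the tool's code: `find_relevant` and `output_of` are names made up for this sketch, and `output_of(i)` stands in for running the command at the `i`-th revision (index 0 being the `--from` revision). The key assumption is that if two revisions produce the same output, nothing in between is inspected, so a change that is later reverted inside such a range would go unnoticed.

```python
def find_relevant(output_of, n):
    """Return (relevant, inspected) for revisions 0..n.

    relevant:  indices j whose output differs from that of j-1
    inspected: indices for which output_of was actually called
    """
    cache = {}

    def out(i):
        # Run the (possibly expensive) command at most once per revision.
        if i not in cache:
            cache[i] = output_of(i)
        return cache[i]

    relevant = set()
    todo = [(0, n)]  # stack of index ranges that still need checking
    while todo:
        i, j = todo.pop()
        if out(i) == out(j):
            continue  # same output at both ends: skip everything in between
        if j == i + 1:
            relevant.add(j)  # adjacent revisions differ: j matters
            continue
        k = (i + j) // 2  # otherwise split the range and check both halves
        todo.append((i, k))
        todo.append((k, j))
    return sorted(relevant), sorted(cache)


# The four revisions of example.sh from the very small example.
outputs = ["Hello World!", "Hello Galaxies!", "Hello Galaxies!", "Hello Universe!"]
relevant, inspected = find_relevant(lambda i: outputs[i], len(outputs) - 1)
print(relevant)
```

With long stretches of unchanged output, most revisions are never passed to `output_of`, which is where the savings in the nixpkgs run come from.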
220 |
221 |
222 | Usage
223 | -----
224 |
225 | ```
226 | Usage: git-multisect.py [options]
227 |
228 | Options:
229 |   -h, --help            show this help message and exit
230 |   -C DIR, --repo=DIR    Repository (default .)
231 |   -f REV, --from=REV    First revision
232 |   -t REV, --to=REV      Last revision (default: HEAD)
233 |   --hide-stderr         Hide the command stderr. Good if noisy
234 |   --show-output         Include the program output after each log line, and
235 |                         include the first commit in the log
236 |   --log-options=LOG_OPTIONS
237 |                         How to print the git log (default: --oneline)
238 |   -c CMD, --cmd=CMD     Command to run. Will be passed to a shell with REV set
239 |                         to a revision.
240 | ```
241 |
242 | Issues/Roadmap/Caveats/TODO
243 | ---------------------------
244 |
245 | * The tool requires the first commit to be an ancestor of the last commit, and
246 | only looks at the first-parent-line from the last commit. This is good enough
247 | for the usual single-long-running-branch use case, but could be extended to
248 | handle more complex DAGs better.
249 |
250 | * If a command fails, `git-multisect` aborts with a non-pretty error message.
251 | This could be improved.
252 |
253 | * It is designed for small program output, and stores it all in memory. A more
254 | efficient way is possible, but probably not worth the bother. If you have large
255 | output, run it through a hash calculation in the command.
256 |
257 | * The tool could be packaged more properly and have a test suite.
258 |
259 | * If this turns out to be useful, maybe someone could upstream it to the git project.
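
For the large-output caveat above, one possible approach is to pipe the command's output through a small hashing filter, so that only a short digest is kept in memory and compared. The following is a minimal sketch under that assumption; `stream_digest` is a name invented here and is not part of `git-multisect`.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    h = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# Demo on an in-memory stream standing in for a command's large output.
big_output = io.BytesIO(b"line\n" * 100_000)
print(stream_digest(big_output))
```

Saved as a script that prints `stream_digest(sys.stdin.buffer)`, this could be appended to the `-c` command with a pipe; a standard tool such as `sha256sum` would serve the same purpose.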
260 |
261 |
262 | Contributions are welcome!
263 |
264 |
--------------------------------------------------------------------------------
/example.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | who=Universe
4 | echo "Hello $who!"
5 |
--------------------------------------------------------------------------------
/git-multisect.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import sys
4 | import subprocess
5 | import os
6 | from optparse import OptionParser
7 |
8 | parser = OptionParser()
9 | parser.add_option("-C", "--repo", dest="repo", default=".",
10 |                   help="Repository (default .)", metavar="DIR")
11 | parser.add_option("-f", "--from", dest="start",
12 |                   help="First revision", metavar="REV")
13 | parser.add_option("-t", "--to", dest="to", default="HEAD",
14 |                   help="Last revision (default: HEAD)", metavar="REV")
15 | parser.add_option("--hide-stderr", action = "store_true", dest="hidestderr",
16 |                   help="Hide the command stderr. Good if noisy")
17 | parser.add_option("--show-output", action = "store_true", dest="show_output",
18 |                   help="Include the program output after each log line, and include the first commit in the log")
19 | parser.add_option("--log-options", action = "store", dest="log_options",
20 |                   default="--oneline --no-decorate",
21 |                   help="How to print the git log (default: --oneline --no-decorate)")
22 | parser.add_option("-c", "--cmd", dest="cmd",
23 |                   help="Command to run. Will be passed to a shell with REV set to a revision.", metavar="CMD")
24 | #parser.add_option("-q", "--quiet",
25 | # action="store_false", dest="verbose", default=True,
26 | # help="don't print status messages to stdout")
27 | (options, args) = parser.parse_args()
28 |
29 | if options.start is None or options.cmd is None:
30 |     print("The --from and --cmd options are required\n")
31 |     parser.print_help()
32 |     sys.exit(1)
33 |
34 |
35 | def err(msg):
36 |     print(msg, file=sys.stderr)
37 |     sys.exit(1)
38 |
39 | def info(msg):
40 |     print(msg, file=sys.stderr)
41 |
42 | # Check if the first revision is an ancestor of the last revision
43 | # (and that the repo can be read)
44 | ret = subprocess.run(["git", "-C", options.repo, "merge-base", "--is-ancestor", \
45 |     options.start, options.to])
46 | if ret.returncode == 0:
47 |     pass # good
48 | elif ret.returncode == 1:
49 |     err(f"Revision {options.start} is not an ancestor of {options.to}, giving up")
50 | else:
51 |     err(f"Failed to run: {' '.join(ret.args)}")
52 |
53 | # Get the list of revisions
54 |
55 | commits = subprocess.check_output(["git", "-C", options.repo, "log", "--topo-order", "--reverse", "--first-parent", "--pretty=tformat:%H", f"{options.start}..{options.to}"],text=True).splitlines()
56 |
57 | if len(commits) == 0:
58 |     info(f"Found no commits in {options.start}..{options.to}")
59 |     sys.exit(0)
60 |
61 | info(f"Found {len(commits)} commits")
62 |
63 | # NB! revs indexing is off-by-one compared to commits indexing
64 | start = subprocess.check_output(["git", "-C", options.repo, "rev-parse", options.start], text=True).splitlines()[0]
65 | revs = [start] + commits
66 |
67 | # Stats!
68 | # commits found to be relevant
69 | relevant = set()
70 | irrelevant = 0
71 | skipped = 0
72 |
73 | l = len(str(len(commits)))
74 | def statinfo(msg):
75 |     unknown = len(commits) - len(relevant) - irrelevant - skipped
76 |     info(f"[{len(relevant):{l}} relevant, {irrelevant:{l}} irrelevant, {skipped:{l}} skipped, {unknown:{l}} unknown] {msg}")
77 |
78 | # memoized output processing
79 | outputs = {}
80 | def get_output(i):
81 |     if i not in outputs:
82 |         statinfo(f"🔎 {i:{l}}/{len(commits)} {revs[i][:7]} ...")
83 |         outputs[i] = subprocess.check_output(
84 |             options.cmd,
85 |             shell = True,
86 |             env = dict(os.environ, REV=revs[i]),
87 |             stderr = subprocess.DEVNULL if options.hidestderr else None,
88 |             text = options.show_output
89 |         )
90 |     return outputs[i]
91 |
92 | # the stack of ranges (indices, inclusive) yet to check for relevant commits.
93 | # invariants:
94 | # * the output of endpoints differ, i.e. there will be some relevant commit
95 | # * the interval is not a singleton
96 | todo = []
97 |
98 | # adds an interval to the top (end) of the stack, checking the invariants
99 | def add_interval(i,j):
100 |     global irrelevant, skipped
101 |     if get_output(i) == get_output(j):
102 |         # no changes in this range: drop range, mark end as irrelevant and intermediate as skipped
103 |         irrelevant += 1
104 |         skipped += j-i-1
105 |     elif j == i + 1:
106 |         # j proven to be relevant
107 |         relevant.add(j-1) # NB index shift
108 |     else:
109 |         global todo
110 |         todo += [(i,j)]
111 |
112 | # Add initial interval
113 | add_interval(0, len(revs)-1)
114 | # Keep splitting the intervals, until none are left
115 | while len(todo)>0:
116 |     (i,j) = todo.pop()
117 |     k = (i+j)//2
118 |     add_interval(k,j)
119 |     add_interval(i,k)
120 |
121 | def git_log(rev):
122 |     subprocess.run([
123 |         "git",
124 |         "--no-pager",
125 |         "-C", options.repo,
126 |         "log", "-n1"] + options.log_options.split() + [rev])
127 |
128 | statinfo("done")
129 | info("")
130 | if options.show_output:
131 |     git_log(start)
132 |     sys.stdout.write(get_output(0))
133 | for i, commit in enumerate(commits):
134 |     if i in relevant:
135 |         git_log(commit)
136 |         if options.show_output:
137 |             sys.stdout.write(get_output(i+1))
138 |
139 |
--------------------------------------------------------------------------------