├── .github └── workflows │ ├── lint.yml │ ├── release.yml │ └── test.yml ├── .gitignore ├── .golangci.toml ├── Brewfile ├── CODEOWNERS ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── Makefile ├── README.md ├── counts ├── counts.go ├── counts_test.go ├── human.go └── human_test.go ├── docs ├── BUILDING.md └── RELEASING.md ├── git-sizer.go ├── git ├── batch_header.go ├── batch_obj_iter.go ├── commit.go ├── git.go ├── git_bin.go ├── gitconfig.go ├── gitconfig_test.go ├── obj_head_iter.go ├── obj_iter.go ├── obj_resolver.go ├── oid.go ├── ref_filter.go ├── ref_filter_test.go ├── ref_iter.go ├── reference.go ├── tag.go └── tree.go ├── git_sizer_test.go ├── go.mod ├── go.sum ├── internal ├── refopts │ ├── filter_group_value.go │ ├── filter_value.go │ ├── ref_group.go │ ├── ref_group_builder.go │ └── show_ref_grouper.go └── testutils │ └── repoutils.go ├── isatty ├── isatty_disabled.go └── isatty_enabled.go ├── meter └── meter.go ├── negated_bool_value.go ├── script ├── bootstrap ├── build ├── cibuild ├── ensure-go-installed.sh ├── go ├── gofmt └── install-vendored-go └── sizes ├── explicit_root.go ├── footnotes.go ├── graph.go ├── grouper.go ├── output.go ├── path_resolver.go └── sizes.go /.github/workflows/lint.yml: -------------------------------------------------------------------------------- 1 | name: Lint 2 | on: 3 | push: 4 | paths: 5 | - "**.go" 6 | - go.mod 7 | - go.sum 8 | pull_request: 9 | paths: 10 | - "**.go" 11 | - go.mod 12 | - go.sum 13 | 14 | jobs: 15 | lint: 16 | runs-on: ubuntu-latest 17 | 18 | steps: 19 | - name: Set up Go 20 | uses: actions/setup-go@v2 21 | with: 22 | go-version: 1.17 23 | 24 | - name: Check out code 25 | uses: actions/checkout@v2 26 | 27 | - name: Verify dependencies 28 | run: | 29 | go mod verify 30 | go mod download 31 | 32 | LINT_VERSION=1.43.0 33 | curl -fsSL https://github.com/golangci/golangci-lint/releases/download/v${LINT_VERSION}/golangci-lint-${LINT_VERSION}-linux-amd64.tar.gz | \ 34 | tar xz 
--strip-components 1 --wildcards \*/golangci-lint 35 | mkdir -p bin && mv golangci-lint bin/ 36 | 37 | - name: Run checks 38 | run: | 39 | STATUS=0 40 | assert-nothing-changed() { 41 | local diff 42 | "$@" >/dev/null || return 1 43 | if ! diff="$(git diff -U1 --color --exit-code)"; then 44 | printf '\e[31mError: running `\e[1m%s\e[22m` results in modifications that you must check into version control:\e[0m\n%s\n\n' "$*" "$diff" >&2 45 | git checkout -- . 46 | STATUS=1 47 | fi 48 | } 49 | 50 | assert-nothing-changed go fmt ./... 51 | assert-nothing-changed go mod tidy 52 | 53 | bin/golangci-lint run --out-format=github-actions --timeout=3m || STATUS=$? 54 | 55 | exit $STATUS 56 | -------------------------------------------------------------------------------- /.github/workflows/release.yml: -------------------------------------------------------------------------------- 1 | name: Release 2 | 3 | on: 4 | push: 5 | tags: 6 | - "v*" 7 | 8 | permissions: 9 | contents: write 10 | 11 | jobs: 12 | lint: 13 | name: Release 14 | runs-on: ubuntu-latest 15 | steps: 16 | - name: Setup 17 | uses: 18 | actions/setup-go@v4 19 | with: 20 | go-version: 1.21 21 | 22 | - name: Checkout 23 | uses: actions/checkout@v4 24 | 25 | - name: Build releases 26 | run: | 27 | make releases VERSION=$GITHUB_REF_NAME 28 | 29 | - name: Release 30 | uses: softprops/action-gh-release@v1 31 | with: 32 | draft: true 33 | files: | 34 | releases/git-sizer-* 35 | -------------------------------------------------------------------------------- /.github/workflows/test.yml: -------------------------------------------------------------------------------- 1 | on: [push, pull_request] 2 | name: Test 3 | jobs: 4 | test: 5 | strategy: 6 | matrix: 7 | os: [ubuntu-latest, macos-latest, windows-latest] 8 | fail-fast: false 9 | runs-on: ${{ matrix.os }} 10 | steps: 11 | - name: Set up Go 12 | uses: actions/setup-go@v2 13 | with: 14 | go-version: '1.17' 15 | 16 | - name: Check out code 17 | uses: actions/checkout@v2 18 
| 19 | - name: Get full repo history 20 | run: git fetch --prune --unshallow --tags 21 | 22 | - name: Download dependencies 23 | shell: bash 24 | run: go mod download 25 | 26 | - name: Build 27 | shell: bash 28 | run: | 29 | mkdir -p bin 30 | go build -o bin . 31 | ls -la bin 32 | 33 | - name: Test 34 | shell: bash 35 | run: go test -race -timeout 60s ./... 36 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /bin 2 | /releases 3 | /vendor 4 | -------------------------------------------------------------------------------- /Brewfile: -------------------------------------------------------------------------------- 1 | brew "go" 2 | -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | * @github/git-storage-reviewers 2 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | Please note that this project is released with a Contributor Code of Conduct. By 4 | participating in this project you agree to abide by its terms. 5 | 6 | ## Our Pledge 7 | 8 | In the interest of fostering an open and welcoming environment, we as 9 | contributors and maintainers pledge to making participation in our project and 10 | our community a harassment-free experience for everyone, regardless of age, body 11 | size, disability, ethnicity, gender identity and expression, level of experience, 12 | education, socio-economic status, nationality, personal appearance, race, 13 | religion, or sexual identity and orientation. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to creating a positive environment 18 | include: 19 | 20 | * Using welcoming and inclusive language 21 | * Being respectful of differing viewpoints and experiences 22 | * Gracefully accepting constructive criticism 23 | * Focusing on what is best for the community 24 | * Showing empathy towards other community members 25 | 26 | Examples of unacceptable behavior by participants include: 27 | 28 | * The use of sexualized language or imagery and unwelcome sexual attention or 29 | advances 30 | * Trolling, insulting/derogatory comments, and personal or political attacks 31 | * Public or private harassment 32 | * Publishing others' private information, such as a physical or electronic 33 | address, without explicit permission 34 | * Other conduct which could reasonably be considered inappropriate in a 35 | professional setting 36 | 37 | ## Our Responsibilities 38 | 39 | Project maintainers are responsible for clarifying the standards of acceptable 40 | behavior and are expected to take appropriate and fair corrective action in 41 | response to any instances of unacceptable behavior. 42 | 43 | Project maintainers have the right and responsibility to remove, edit, or 44 | reject comments, commits, code, wiki edits, issues, and other contributions 45 | that are not aligned to this Code of Conduct, or to ban temporarily or 46 | permanently any contributor for other behaviors that they deem inappropriate, 47 | threatening, offensive, or harmful. 48 | 49 | ## Scope 50 | 51 | This Code of Conduct applies both within project spaces and in public spaces 52 | when an individual is representing the project or its community. Examples of 53 | representing a project or community include using an official project e-mail 54 | address, posting via an official social media account, or acting as an appointed 55 | representative at an online or offline event. 
Representation of a project may be 56 | further defined and clarified by project maintainers. 57 | 58 | ## Enforcement 59 | 60 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 61 | reported by contacting the project team at opensource@github.com. All 62 | complaints will be reviewed and investigated and will result in a response that 63 | is deemed necessary and appropriate to the circumstances. The project team is 64 | obligated to maintain confidentiality with regard to the reporter of an incident. 65 | Further details of specific enforcement policies may be posted separately. 66 | 67 | Project maintainers who do not follow or enforce the Code of Conduct in good 68 | faith may face temporary or permanent repercussions as determined by other 69 | members of the project's leadership. 70 | 71 | ## Attribution 72 | 73 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 74 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 75 | 76 | [homepage]: https://www.contributor-covenant.org 77 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## Contributing 2 | 3 | [fork]: https://github.com/github/git-sizer/fork 4 | [pr]: https://github.com/github/git-sizer/compare 5 | [code-of-conduct]: CODE_OF_CONDUCT.md 6 | 7 | Hi there! We're thrilled that you'd like to contribute to this project. Your help is essential for keeping it great. 8 | 9 | Please note that this project is released with a [Contributor Code of Conduct][code-of-conduct]. By participating in this project you agree to abide by its terms. 10 | 11 | ## Submitting a pull request 12 | 13 | 1. [Fork][fork] and clone the repository 14 | 2. Configure and install the dependencies: `script/bootstrap` 15 | 3. Make sure the tests pass on your machine: `make test` 16 | 4. 
Create a new branch: `git checkout -b my-branch-name` 17 | 5. Make your change, add tests, and make sure the tests still pass 18 | 6. Push to your fork and [submit a pull request][pr] 19 | 7. Pat yourself on the back and wait for your pull request to be reviewed and merged. 20 | 21 | Here are a few things you can do that will increase the likelihood of your pull request being accepted: 22 | 23 | - Make sure that your code is formatted correctly according to `go fmt`: `go fmt .`. 24 | - Write tests. 25 | - Keep your change as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests. 26 | - Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html). 27 | 28 | ## Resources 29 | 30 | - [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/) 31 | - [Using Pull Requests](https://help.github.com/articles/about-pull-requests/) 32 | - [GitHub Help](https://help.github.com) 33 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 GitHub 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | PACKAGE := github.com/github/git-sizer 2 | GO111MODULES := 1 3 | export GO111MODULES 4 | 5 | GO := $(CURDIR)/script/go 6 | 7 | GO_LDFLAGS := -X main.BuildVersion=$(shell git describe --tags --always --dirty || echo unknown) 8 | GOFLAGS := -mod=readonly -ldflags "$(GO_LDFLAGS)" 9 | 10 | ifdef USE_ISATTY 11 | GOFLAGS := $(GOFLAGS) --tags isatty 12 | endif 13 | 14 | .PHONY: all 15 | all: bin/git-sizer 16 | 17 | .PHONY: bin/git-sizer 18 | bin/git-sizer: 19 | mkdir -p bin 20 | $(GO) build $(GOFLAGS) -o $@ . 21 | 22 | # Cross-compile for a bunch of common platforms. Note that this 23 | # doesn't work with USE_ISATTY: 24 | .PHONY: common-platforms 25 | common-platforms: 26 | 27 | # Create releases for a bunch of common platforms. 
Note that this 28 | # doesn't work with USE_ISATTY, and VERSION must be set on the command 29 | # line; e.g., 30 | # 31 | # make releases VERSION=1.2.3 32 | .PHONY: releases 33 | releases: 34 | 35 | # Define rules for a bunch of common platforms that are supported by go; see 36 | # https://golang.org/doc/install/source#environment 37 | # You can compile for any other platform in that list by running 38 | # make GOOS=foo GOARCH=bar 39 | 40 | define PLATFORM_template = 41 | .PHONY: bin/git-sizer-$(1)-$(2)$(3) 42 | bin/git-sizer-$(1)-$(2)$(3): 43 | mkdir -p bin 44 | GOOS=$(1) GOARCH=$(2) $$(GO) build $$(GOFLAGS) -ldflags "-X main.ReleaseVersion=$$(VERSION)" -o $$@ . 45 | common-platforms: bin/git-sizer-$(1)-$(2)$(3) 46 | 47 | # Note that releases don't include code from vendor (they're only used 48 | # for testing), so no license info is needed from those projects. 49 | .PHONY: releases/git-sizer-$$(VERSION)-$(1)-$(2).zip 50 | releases/git-sizer-$$(VERSION)-$(1)-$(2).zip: bin/git-sizer-$(1)-$(2)$(3) 51 | if test -z "$$(VERSION)"; then echo "Please set VERSION to make releases"; exit 1; fi 52 | mkdir -p releases/tmp-$$(VERSION)-$(1)-$(2) 53 | cp README.md LICENSE.md releases/tmp-$$(VERSION)-$(1)-$(2) 54 | cp bin/git-sizer-$(1)-$(2)$(3) releases/tmp-$$(VERSION)-$(1)-$(2)/git-sizer$(3) 55 | cp $$$$($$(GO) list -f '{{.Dir}}' github.com/spf13/pflag)/LICENSE \ 56 | releases/tmp-$$(VERSION)-$(1)-$(2)/LICENSE-spf13-pflag 57 | rm -f $$@ 58 | zip -j $$@ releases/tmp-$$(VERSION)-$(1)-$(2)/* 59 | rm -rf releases/tmp-$$(VERSION)-$(1)-$(2) 60 | releases: releases/git-sizer-$$(VERSION)-$(1)-$(2).zip 61 | endef 62 | 63 | $(eval $(call PLATFORM_template,linux,amd64)) 64 | $(eval $(call PLATFORM_template,linux,386)) 65 | 66 | $(eval $(call PLATFORM_template,darwin,amd64)) 67 | $(eval $(call PLATFORM_template,darwin,arm64)) 68 | 69 | $(eval $(call PLATFORM_template,windows,amd64,.exe)) 70 | $(eval $(call PLATFORM_template,windows,386,.exe)) 71 | 72 | .PHONY: test 73 | test: 
bin/git-sizer gotest 74 | 75 | .PHONY: gotest 76 | gotest: 77 | $(GO) test -timeout 60s $(GOFLAGS) ./... 78 | 79 | .PHONY: clean 80 | clean: 81 | rm -rf bin 82 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | _Happy Git repositories are all alike; every unhappy Git repository is unhappy in its own way._ —Linus Tolstoy 2 | 3 | # git-sizer 4 | 5 | Is your Git repository bursting at the seams? 6 | 7 | `git-sizer` computes various size metrics for a local Git repository, flagging those that might cause you problems or inconvenience. For example: 8 | 9 | * Is the repository too big overall? Ideally, Git repositories should be under 1 GiB, and (without special handling) they start to get unwieldy over 5 GiB. Big repositories take a long time to clone and repack, and take a lot of disk space. Suggestions: 10 | 11 | * Avoid storing generated files (e.g., compiler output, JAR files) in Git. It would be better to regenerate them when necessary, or store them in a package registry or even a fileserver. 12 | 13 | * Avoid storing large media assets in Git. You might want to look into [Git-LFS](https://git-lfs.github.com/) or [git-annex](http://git-annex.branchable.com/), which allow you to version your media assets in Git while actually storing them outside of your repository. 14 | 15 | * Avoid storing file archives (e.g., ZIP files, tarballs) in Git, especially if compressed. Different versions of such files don't delta well against each other, so Git can't store them efficiently. It would be better to store the individual files in your repository, or store the archive elsewhere. 16 | 17 | * Does the repository have too many references (branches and/or tags)? They all have to be transferred to the client for every fetch, even if your clone is up-to-date. Try to limit them to a few tens of thousands at most. 
Suggestions: 18 | 19 | * Delete unneeded tags and branches. 20 | 21 | * Avoid pushing your "remote-tracking" branches to a shared repository. 22 | 23 | * Consider using ["git notes"](https://git-scm.com/docs/git-notes) rather than tags to attach auxiliary information to commits (for example, CI build results). 24 | 25 | * Perhaps store some of your rarely-needed tags and branches in a separate fork of your repository that is not fetched from by normal developers. 26 | 27 | * Does the repository include too many objects? The more objects, the longer it takes for Git to traverse the repository's history, for example when garbage-collecting. Suggestions: 28 | 29 | * Think about whether you are storing very many tiny files that could easily be collected into a few bigger files. 30 | 31 | * Consider breaking your project up into multiple subprojects. 32 | 33 | * Does the repository include gigantic blobs (files)? Git works best with small- to medium-sized files. It's OK to have a few files in the megabyte range, but they should generally be the exception. Suggestions: 34 | 35 | * Consider using [Git-LFS](https://git-lfs.github.com/) for storing your large files, especially those (e.g., media assets) that don't diff and merge usefully. 36 | 37 | * See also the section "Is the repository too big overall?" 38 | 39 | * Does the repository include many, many versions of large text files, each one slightly changed from the one before? Such files delta very well, so they might not cause your repository to grow alarmingly. But it is expensive for Git to reconstruct the full files and to diff them, which it needs to do internally for many operations. Suggestions: 40 | 41 | * Avoid storing log files and database dumps in Git. 42 | 43 | * Avoid storing giant data files (e.g., enormous XML files) in Git, especially if they are modified frequently. Consider using a database instead. 44 | 45 | * Does the repository include gigantic trees (directories)? 
Every time a file is modified, Git has to create a new copy of every tree (i.e., every directory in the path) leading to the file. Huge trees make this expensive. Moreover, it is very expensive to traverse through history that contains huge trees, for example for `git blame`. Suggestions: 46 | 47 | * Avoid creating directories with more than a couple of thousand entries each. 48 | 49 | * If you must store very many files, it is better to shard them into a hierarchy of multiple, smaller directories. 50 | 51 | * Does the repository have the same (or very similar) files repeated over and over again at different paths in a single commit? If so, the repository might have a reasonable overall size, but when you check it out it balloons into an enormous working copy. (Taken to an extreme, this is called a "git bomb"; see below.) Suggestions: 52 | 53 | * Perhaps you can achieve your goals more effectively by using tags and branches or a build-time configuration system. 54 | 55 | * Does the repository include absurdly long path names? That's probably not going to work well with other tools. One or two hundred characters should be enough, even if you're writing Java. 56 | 57 | * Are there other bizarre and questionable things in the repository? 58 | 59 | * Annotated tags pointing at one another in long chains? 60 | 61 | * Octopus merges with dozens of parents? 62 | 63 | * Commits with gigantic log messages? 64 | 65 | `git-sizer` computes many size-related statistics about your repository that can help reveal all of the problems described above. These practices are not wrong per se, but the more that you stretch Git beyond its sweet spot, the less you will be able to enjoy Git's legendary speed and performance. Especially if your Git repository statistics seem out of proportion to your project size, you might be able to make your life easier by adjusting how you use Git. 66 | 67 | 68 | ## Getting started 69 | 70 | 1. 
Make sure that you have the [Git command-line client](https://git-scm.com/) installed, **version >= 2.6**. NOTE: `git-sizer` invokes `git` commands to examine the contents of your repository, so **it is required that the `git` command be in your `PATH`** when you run `git-sizer`. 71 | 72 | 2. Install `git-sizer`. Either: 73 | 74 | a. Install a released version of `git-sizer` (recommended): 75 | 1. Go to [the releases page](https://github.com/github/git-sizer/releases) and download the ZIP file corresponding to your platform. 76 | 2. Unzip the file. 77 | 3. Move the executable file (`git-sizer` or `git-sizer.exe`) into a directory that is in your `PATH`. 78 | 79 | b. Build and install from source. See the instructions in [`docs/BUILDING.md`](docs/BUILDING.md). 80 | 81 | 3. Change to the directory containing a full, non-shallow clone of the Git repository that you'd like to analyze. Then run 82 | 83 | git-sizer [