
Repository Cleanup

Your Git repository takes ages to clone, is bloated and full of large blobs? Use this walkthrough, with caution, to purge unwanted files and directories from your repository history.

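Why with caution...?

...you may ask yourself. Well, this procedure wipes every trace of the unwanted inhabitants from the repository and rewrites the history of every branch and tag as if they had never existed. Every commit that comes after the removed content receives a new hash, so existing clones and any work based on the old history no longer match the rewritten repository.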

C
|
B (a4c29bd) - PATH/TO/REMOVE/
|
A

↓ BEGONE ↓

C' (rewritten, new hash)
|
A

Only use this procedure after consulting with, and getting approval from, a superior.

Usage

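  1. We'll create a local tracking branch for each remote branch
  • The loop checks out each branch in turn; Git reports an error for any branch that already exists locally, which is harmless
for remote in $(git branch -r | grep -v /HEAD); do git checkout --track "$remote"; done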
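
The loop above switches the working tree to every branch in turn. As a sketch (assuming Bash and Git are installed), the same local branches can be created with `git branch --track` without checking anything out, skipping branches that already exist. The sandbox (a hypothetical seed repository with branches `feature-a` and `feature-b`, a bare "origin" and a fresh clone under a temporary directory) exists only so the example does not touch your repository; in a real clone you would run only the loop.

```shell
#!/usr/bin/env bash
# Sketch: create a local tracking branch for every remote-tracking branch
# without checking each one out. The sandbox is hypothetical demo data.
set -euo pipefail

sandbox=$(mktemp -d)
git -c init.defaultBranch=main init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo
git commit -q --allow-empty -m init
git branch feature-a
git branch feature-b
git clone -q --bare "$sandbox/seed" "$sandbox/origin.git"
git clone -q "$sandbox/origin.git" "$sandbox/work"
cd "$sandbox/work"

# In a real clone, only this loop is needed.
git for-each-ref --format='%(refname)' refs/remotes/ |
while read -r ref; do
  short=${ref#refs/remotes/}   # e.g. origin/feature-a
  branch=${short#*/}           # e.g. feature-a (drops the remote name)
  if [ "$branch" = HEAD ]; then
    continue                   # skip the symbolic origin/HEAD
  fi
  # Only create the branch if no local branch of that name exists yet.
  if ! git show-ref --verify --quiet "refs/heads/$branch"; then
    git branch --quiet --track "$branch" "$ref"
  fi
done

git branch -vv
```

The sketch assumes remote names without slashes; with several remotes carrying the same branch name, the first one listed wins.
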
  2. List all blobs in the history, sorted by size in ascending order (largest last)
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
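
The listing also shows files that were deleted long ago, because a deleted file stays in history. The sketch below rehearses the pipeline above in a throwaway repository (assumptions: Bash, Git and coreutils `numfmt` are installed; `big.bin` and `small.txt` are hypothetical demo files). A 1 MiB file is committed and then deleted, yet it still shows up. The extra `tail -n 10` stage is an addition, not part of the original command, and keeps only the ten largest blobs.

```shell
#!/usr/bin/env bash
# Sketch: the size listing above, run in a throwaway repository.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

# Commit a 1 MiB random file plus a tiny one, then delete the big one again.
head -c 1048576 /dev/urandom > big.bin
echo hello > small.txt
git add big.bin small.txt
git commit -q -m "add files"
git rm -q big.bin
git commit -q -m "remove big.bin"

# Same pipeline as above, limited to the 10 largest blobs.
listing=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort --numeric-sort --key=2 \
  | cut -c 1-12,41- \
  | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest \
  | tail -n 10)

# big.bin is listed last (largest), even though it was deleted.
printf '%s\n' "$listing"
```
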
  3. We'll rewrite every branch and tag, removing the path (or file) you specify
  • Replace PATH/TO/REMOVE/ with the path or file to remove
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch PATH/TO/REMOVE/' --prune-empty --tag-name-filter cat -- --all
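
Before going further, it is worth checking that the path is really gone. The sketch below rehearses the filter-branch command above in a throwaway repository, with the hypothetical directory `assets/` standing in for PATH/TO/REMOVE/. Setting `FILTER_BRANCH_SQUELCH_WARNING=1` is an addition that skips filter-branch's warning and 10-second pause. It then counts commits touching the path twice: once on branches and tags only, and once with `--all`, which also follows the backups in refs/original/.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the filter-branch step and verify the result.
# assets/ is a hypothetical stand-in for PATH/TO/REMOVE/.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and pause

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

mkdir assets
head -c 1048576 /dev/urandom > assets/big.bin
git add assets
git commit -q -m "add assets"
echo hello > README
git add README
git commit -q -m "add README"

git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch assets/' \
  --prune-empty --tag-name-filter cat -- --all

# Branches and tags only: no commit should touch assets/ any more.
rewritten=$(git rev-list --branches --tags -- assets/ | wc -l | tr -d ' ')
# --all also follows refs/original/, which still holds the old history.
with_backup=$(git rev-list --all -- assets/ | wc -l | tr -d ' ')
echo "commits touching assets/: branches+tags=$rewritten, all refs=$with_backup"
```

The second count stays above zero until the backup references in refs/original/ are deleted, which is why they have to go before garbage collection can shrink the repository.
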
  4. Delete the backup references created by filter-branch
  • These references back up your commit history from before the rewrite. If you’re sure the new history is correct and safe, remove them:
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

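Explanation: filter-branch saves the pre-rewrite refs under refs/original/. As long as they exist, they keep the old history, unwanted files included, reachable, so garbage collection cannot remove it. Deleting them also rules out pushing or restoring the old state by accident.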

  5. Remove leftover reflogs and backup references from disk
rm -Rf .git/logs .git/refs/original

Explanation: .git/logs holds the reflogs, which still point to commits from the old history, and anything left under .git/refs/original goes with it. Until the reflogs are gone, garbage collection treats the old objects as reachable and keeps them.
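
Deleting .git/logs works, but Git also has a command for this: the checklist for shrinking a repository in the git-filter-branch manual expires all reflogs with `git reflog expire --expire=now --all` and then runs `git gc --prune=now`. The sketch below (a throwaway repository; the file and commit messages are hypothetical) shows why reflogs matter: an amended commit survives garbage collection until the reflog entry pointing to it expires.

```shell
#!/usr/bin/env bash
# Sketch: a reflog keeps an otherwise unreachable commit alive until it expires.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo

echo one > file.txt
git add file.txt
git commit -q -m "first"
git commit -q --amend -m "first, reworded"   # the old commit is now only in the reflog
old=$(git rev-parse 'HEAD@{1}')

# gc keeps objects that reflogs still reference.
git gc -q --prune=now
git cat-file -e "$old" && echo "still present: the reflog keeps $old alive"

# Expire every reflog entry, then collect garbage again.
git reflog expire --expire=now --all
git gc -q --prune=now
if ! git cat-file -e "$old" 2>/dev/null; then
  echo "gone: $old was removed once no reflog referenced it"
fi
```
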

  6. Run aggressive garbage collection to physically remove unreachable objects
git gc --prune=all --aggressive

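Explanation: This removes unreachable objects from the .git directory and repacks the repository. --prune=all lifts the usual grace period (by default, git gc only prunes loose objects older than two weeks). The --aggressive flag makes git gc optimize the repository more thoroughly, at the cost of a much longer run.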

  7. Show object counts and repository size in human-readable units (optional)
git count-objects -vH
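
The whole procedure can be rehearsed in a throwaway repository before touching a real one. The sketch below (assumptions: Bash and Git with filter-branch available; the repository and `big.bin` are hypothetical) runs the filter-branch, cleanup and gc commands from this walkthrough and compares the `size-pack` value that `git count-objects -v` reports, in KiB, before and after. It uses `--prune=now`, the spelling documented in the git gc manual, in place of `--prune=all`, and sets `FILTER_BRANCH_SQUELCH_WARNING=1` to skip filter-branch's pause.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the cleanup end to end and compare packed sizes.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and pause

# size-pack from `git count-objects -v` (without -H the value is in KiB)
size_pack() { git count-objects -v | sed -n 's/^size-pack: //p'; }

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.email demo@example.com   # hypothetical identity for the demo
git config user.name Demo
head -c 1048576 /dev/urandom > big.bin   # 1 MiB of incompressible data
git add big.bin
git commit -q -m "add big.bin"
git rm -q big.bin
echo hello > README
git add README
git commit -q -m "drop big.bin, add README"

git gc -q                                # pack everything so size-pack is meaningful
before=$(size_pack)

# Remove big.bin from every commit on every ref.
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch big.bin' \
  --prune-empty --tag-name-filter cat -- --all
# Delete the backup refs, then the reflogs and any leftovers.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
rm -Rf .git/logs .git/refs/original
# Drop everything that is no longer reachable.
git gc -q --prune=now --aggressive
after=$(size_pack)

echo "size-pack before: ${before} KiB, after: ${after} KiB"
git count-objects -vH
```

If `after` is not clearly smaller than `before`, something (a branch, tag, stash or reflog) still references the old objects.
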