Git search for duplicate commits (by patch ID)

Question

Git search for duplicate commits (by patch ID)

I need a recipe for finding duplicate changes. The patch identifier is likely to be the same, but the commit attributes may not be.

This is apparently intended to use the patch identifier:

git patch-id --help
IOW, you can use this thing to find possible duplicates.

I assume that adding the lines "git log", "git patch-id" and uniq might work poorly, but if someone has a command that performs well, I would appreciate it.

+8

git commit duplicates

bsb Jul 23 '12 at 2:17

source share

6 answers

robinst · Answer 1 · 2012-07-23T07:31:06+0000

Since the repeated changes are most likely not related to the same branch (except when switching between them), you can use git cherry :

git cherry [-v] [<upstream> [<head> [<limit>]]]

Where upstream will be a branch to check for duplicate changes in head .

Slipp D. Thompson · Answer 2 · 2013-11-03T21:08:50+0000

To search for duplicates of a specific commit, this may work for you.

First determine the ID of the target commit patch:

 $ THE_COMMIT_REF_OR_SHA_YOURE_SEEKING_DUPES_OF='7a3e67c' $ git show $THE_COMMIT_REF_OR_SHA_YOURE_SEEKING_DUPES_OF | git patch-id f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 7a3e67ce38dbef471889d9f706b9161da7dc5cf3

The first SHA is the patch identifier. Then list the patch identifiers for each commit and filter out everything that matches:

 $ for c in $(git rev-list --all); do git show $c | git patch-id; done | grep 'f6ea51cd6acd30cd627ce1a56e2733c1777d5b52' f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 5028e2b5500bd5f4637531e337e17b73f5d0c0b1 f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 7a3e67ce38dbef471889d9f706b9161da7dc5cf3 f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 929c66b5783a0127a7689020d70d398f095b9e00

All together with several additional flags and in the form of a utility script :

 test ! -z "$1" && TARGET_COMMIT_SHA="$1" || TARGET_COMMIT_SHA="HEAD" TARGET_COMMIT_PATCHID=$( git show --patch-with-raw "$TARGET_COMMIT_SHA" | git patch-id | cut -d' ' -f1 ) MATCHING_COMMIT_SHAS=$( for c in $(git rev-list --all); do git show --patch-with-raw "$c" | git patch-id done | fgrep "$TARGET_COMMIT_PATCHID" | cut -d' ' -f2 ) echo "$MATCHING_COMMIT_SHAS"

Using:

 $ git list-dupe-commits 7a3e67c 5028e2b5500bd5f4637531e337e17b73f5d0c0b1 7a3e67ce38dbef471889d9f706b9161da7dc5cf3 929c66b5783a0127a7689020d70d398f095b9e00

It's not very fast, but for most repositories you need to complete the task (just 36 seconds for a repo with 826 commits and 158 MB .git for 2.4 GHz Core 2 Duo).

bsb · Answer 3 · 2012-07-25T04:30:30+0000

I have a project that works on a repo game, but since it holds patch-> commit map in memory, there may be a problem with large repositories:

 # print commit pairs with the same patch-id for c in $(git rev-list HEAD); do \ git show $c | git patch-id; done \ | perl -anle '($p,$c) =@F ;print "$c $s{$p}" if $s{$p};$s{$p}=$c'

The output should be a pair of commits with the same patch identifier (3 duplicate ABCs come out as "AB", then "BC").

Modify the git rev-list command to limit marked commits:

 git log --format=%H HEAD somefile

Add "| xargs git show -s -oneline" to see the commits in detail, or "| xargs git show -s -oneline" for a summary:

 0569473 add 6-8 5e56314 add 6-8 again bece3c3 comment e037ed6 add comment again

It turns out that the patch identifier did not work in my original case, as further changes were made in a subsequent commit. "git log -S" was more useful.

Gidfiddle · Answer 4 · 2017-07-28T12:26:11+0000

The excellent team offered by bsb requires a few small tweaks:

(1) Instead of git show , which runs git diff-tree --cc , the command should use

  git diff-tree -p

Otherwise, git patch-id generates false null SHA1 hashes.

(2) When the xargs pipe is used, xargs must have the -L 1 argument. Otherwise, triple fixation will not be associated with equivalent fixation.

There is an alias ~/.gitconfig :

 dup = "!f() { for c in $(git rev-list HEAD); do git diff-tree -p $c | git patch-id; done | perl -anle '($p,$c) =@F ;print \"$c $s{$p}\" if $s{$p};$s{$p}=$c' | xargs -L 1 git show -s --oneline; }; f" # "git dup" lists duplicate commits

unagi · Answer 5 · 2017-08-30T01:49:31+0000

To search for duplicate commit $hash excluding merge transactions:

 git rev-list --no-merges --all | xargs -r git show | git patch-id \ | grep ^$(git show $hash|git patch-id|cut -c1-40) | cut -c42-80 \ | xargs -r git show -s --oneline

To find a duplicate merge commit $mergehash replace $(git show $hash|git patch-id|cut -c1-40) above with one of the two patch identifiers (1st column) given by git diff-tree -m -p $mergehash | git patch-id git diff-tree -m -p $mergehash | git patch-id . They correspond to differences in association with each of his two parents.

To find duplicates of all commits excluding merge transactions:

 git rev-list --no-merges --all | xargs -r git show | git patch-id \ | sort | uniq -w40 -D | cut -c42-80 \ | xargs -r git log --no-walk --pretty=format:"%h %ad %an (%cn) %s" --date-order --date=iso

The search for duplicate commits can be expanded or limited by changing the arguments to git rev-list , which accepts many options. For example, to limit the search to a specific branch, specify its name instead of the --all option; or to search in the last 100 commits, arguments HEAD ^HEAD~100 are passed.

Note that these commands are fast because they do not use a shell loop, and the batch process commits.

To enable compilation, remove the --no-merges and replace xargs -r git show with xargs -r -L1 git diff-tree -m -p . This is much slower because git diff-tree runs once to commit.

Explanation:

The first line generates a patch identifier map with commit hashes (data from two columns of 40 characters each).
The second row stores commit hashes (2nd column) corresponding to duplicate patch identifiers (1st column).
The last line prints user information about duplicate commits.

James close · Answer 6 · 2019-06-17T16:07:35+0000

For those who want to do this in Windows PowerShell, the equivalent unagi command answer is:

 git rev-list --no-merges --all | %{&git.exe show $_} | git patch-id | ConvertFrom-String -PropertyNames PatchId, Commit | Group-Object PatchId | Where-Object count -gt 1 | %{$_.group.Commit + " "}

Gives an output like:

 1605e0e1e13d7b3f456c20432d8edec664ca7117 1e8efa8f2f01962a2c08fd25caf687d330383428 b45b6db084b27ae420ac8e9cf6511110ebb46513 4a2e1e3ba5a9a1d5db1d00343813e1404f6124e2

With duplicate commits, hashed together.

ATTENTION: In my repo, it was a slow command, so be sure to filter the call in the rev-list accordingly!

Git search for duplicate commits (by patch ID)

More articles: