Git search for duplicate commits (by patch ID)

I need a recipe for finding duplicate changes. The patch identifier is likely to be the same, but the commit attributes may not be.

This appears to be an intended use of patch IDs. From:

git patch-id --help

"IOW, you can use this thing to look for likely duplicate commits."

I imagine that stringing together "git log", "git patch-id" and uniq could do the job, but poorly. If someone has a command that does it well, I would appreciate it.

+8
6 answers

Since the duplicate changes are most likely not on the same branch (except when there are reverts between them), you can use git cherry:

git cherry [-v] [<upstream> [<head> [<limit>]]] 

Here upstream is the branch to check for duplicates of the changes in head.
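As a rough sketch of how that output can be used: git cherry prefixes each commit on head with "-" when an equivalent change already exists in upstream, and with "+" when it does not. The helper name already_upstream and the branch names in the usage comment are made up for illustration.

```shell
# Read `git cherry` output on stdin and print only the commits marked "-",
# i.e. commits whose change already has an equivalent in <upstream>.
already_upstream() {
    awk '$1 == "-" { print $2 }'
}

# Hypothetical usage (branch names are placeholders):
#   git cherry master topic | already_upstream
```

Note that this only compares two branches; it does not search the whole repository.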

+11

To search for duplicates of a specific commit, this may work for you.

First determine the patch ID of the target commit:

 $ THE_COMMIT_REF_OR_SHA_YOURE_SEEKING_DUPES_OF='7a3e67c'
 $ git show $THE_COMMIT_REF_OR_SHA_YOURE_SEEKING_DUPES_OF | git patch-id
 f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 7a3e67ce38dbef471889d9f706b9161da7dc5cf3

The first SHA is the patch ID. Then list the patch IDs of every commit and keep only the lines that match:

 $ for c in $(git rev-list --all); do git show $c | git patch-id; done | grep 'f6ea51cd6acd30cd627ce1a56e2733c1777d5b52'
 f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 5028e2b5500bd5f4637531e337e17b73f5d0c0b1
 f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 7a3e67ce38dbef471889d9f706b9161da7dc5cf3
 f6ea51cd6acd30cd627ce1a56e2733c1777d5b52 929c66b5783a0127a7689020d70d398f095b9e00

Put together, with a few extra flags, as a utility script:

 # Default to HEAD when no commit is given.
 test ! -z "$1" && TARGET_COMMIT_SHA="$1" || TARGET_COMMIT_SHA="HEAD"

 # Patch ID of the target commit (first column of git patch-id output).
 TARGET_COMMIT_PATCHID=$(
     git show --patch-with-raw "$TARGET_COMMIT_SHA" |
         git patch-id |
         cut -d' ' -f1
 )

 # Commits (second column) whose patch ID matches the target's.
 MATCHING_COMMIT_SHAS=$(
     for c in $(git rev-list --all); do
         git show --patch-with-raw "$c" |
             git patch-id
     done |
         fgrep "$TARGET_COMMIT_PATCHID" |
         cut -d' ' -f2
 )

 echo "$MATCHING_COMMIT_SHAS"

Usage:

 $ git list-dupe-commits 7a3e67c
 5028e2b5500bd5f4637531e337e17b73f5d0c0b1
 7a3e67ce38dbef471889d9f706b9161da7dc5cf3
 929c66b5783a0127a7689020d70d398f095b9e00

It's not very fast, but for most repositories it should get the job done (it took just 36 seconds on a repo with 826 commits and a 158 MB .git, on a 2.4 GHz Core 2 Duo).

+9

I have a draft that works on a toy repo, but since it keeps the patch -> commit map in memory, it may have problems on large repositories:

 # print commit pairs with the same patch-id
 for c in $(git rev-list HEAD); do \
     git show $c | git patch-id; done \
 | perl -anle '($p,$c)=@F; print "$c $s{$p}" if $s{$p}; $s{$p}=$c'

The output should be pairs of commits with the same patch ID (three duplicates A, B and C come out as "A B", then "B C").
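If the pairwise output is awkward, a variation along the following lines prints every group of duplicates on a single line instead. This is only a sketch: the function name group_by_patch_id is made up here, and like the Perl version it keeps the whole patch ID -> commits map in memory.

```shell
# Read "<patch-id> <commit>" lines on stdin (the output format of
# git patch-id) and print one line per patch ID that occurs more than
# once, listing all of its commits in the order they were seen.
group_by_patch_id() {
    awk '
        NF == 2 {
            if ($1 in commits) {
                commits[$1] = commits[$1] " " $2   # another duplicate
                dup[$1] = 1
            } else {
                commits[$1] = $2                   # first sighting
                order[++n] = $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in dup) print commits[order[i]]
        }
    '
}

# Hypothetical usage:
#   for c in $(git rev-list HEAD); do git show $c | git patch-id; done \
#       | group_by_patch_id
```

With three duplicates A, B and C, this prints a single line "A B C" rather than two pairs.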

Change the git rev-list command to restrict which commits are checked:

 git log --format=%H HEAD somefile 

Add "| xargs git show" to see the commits in detail, or "| xargs git show -s --oneline" for a summary:

 0569473 add 6-8
 5e56314 add 6-8 again
 bece3c3 comment
 e037ed6 add comment again

It turns out that patch IDs did not work in my original case, because the later commit contained additional changes. "git log -S" was more useful.

+3

The nifty command proposed by bsb needs a couple of small tweaks:

(1) Instead of git show, which runs git diff-tree --cc, the command should use

  git diff-tree -p

Otherwise, git patch-id produces spurious all-zero SHA-1 hashes.

(2) When the xargs pipe is used, xargs must be given the -L 1 argument. Otherwise, a commit that occurs three times will not be paired with its equivalent commit.

Here is an alias for ~/.gitconfig:

 dup = "!f() { for c in $(git rev-list HEAD); do git diff-tree -p $c | git patch-id; done | perl -anle '($p,$c) =@F ;print \"$c $s{$p}\" if $s{$p};$s{$p}=$c' | xargs -L 1 git show -s --oneline; }; f" # "git dup" lists duplicate commits 
+2

To find duplicates of the commit $hash, excluding merge commits:

 git rev-list --no-merges --all | xargs -r git show | git patch-id \
     | grep ^$(git show $hash|git patch-id|cut -c1-40) | cut -c42-80 \
     | xargs -r git show -s --oneline

To find duplicates of a merge commit $mergehash, replace $(git show $hash|git patch-id|cut -c1-40) above with one of the two patch IDs (1st column) printed by git diff-tree -m -p $mergehash | git patch-id. They correspond to the diffs of the merge commit against each of its two parents.

To find duplicates of all commits, excluding merge commits:

 git rev-list --no-merges --all | xargs -r git show | git patch-id \
     | sort | uniq -w40 -D | cut -c42-80 \
     | xargs -r git log --no-walk --pretty=format:"%h %ad %an (%cn) %s" --date-order --date=iso

The search for duplicate commits can be widened or narrowed by changing the arguments to git rev-list, which accepts many options. For example, to limit the search to a specific branch, give its name instead of the --all option; to search only the last 100 commits, pass the arguments HEAD ^HEAD~100.

Note that these commands are fast because they do not use a shell loop and process commits in batches.

To include merge commits, remove the --no-merges option and replace xargs -r git show with xargs -r -L1 git diff-tree -m -p. This is much slower because git diff-tree runs once per commit.

Explanation:

  • The first line produces a map from patch IDs to commit hashes (two columns of 40 characters each).

  • The second line keeps only the commit hashes (2nd column) that correspond to duplicated patch IDs (1st column).

  • The last line prints readable information about the duplicate commits.

+1

For those who want to do this in Windows PowerShell, the equivalent of the command in unagi's answer is:

 git rev-list --no-merges --all | %{&git.exe show $_} | git patch-id | ConvertFrom-String -PropertyNames PatchId, Commit | Group-Object PatchId | Where-Object count -gt 1 | %{$_.group.Commit + " "} 

Gives an output like:

 1605e0e1e13d7b3f456c20432d8edec664ca7117 1e8efa8f2f01962a2c08fd25caf687d330383428 b45b6db084b27ae420ac8e9cf6511110ebb46513 4a2e1e3ba5a9a1d5db1d00343813e1404f6124e2 

with the hashes of duplicate commits grouped together.

NOTE: On my repo this command was slow, so be sure to restrict the rev-list call appropriately!

0

Source: https://habr.com/ru/post/921096/

