Linux - Find duplicate videos

By xngo on October 27, 2020

Prerequisites

  • mpv
  • findimagedupes
  • xmpv
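Before starting, it can help to confirm that the command-line tools are installed. The sketch below is a hypothetical helper, `missing_tools`, written for this post. It prints every name it cannot find on `PATH` and returns non-zero if any is missing. It only checks `mpv` and `findimagedupes`, on the assumption that xmpv is the Lua script loaded by mpv rather than a program on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each tool that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `missing_tools mpv findimagedupes || echo "Install the tools listed above first."`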

Process

The main idea is to extract one frame from each video with mpv and then use findimagedupes to find duplicate images. findimagedupes outputs each set of duplicate images on one line, with the paths separated by spaces. Reformat that list into a playlist, one video per line, and feed it back to mpv. I wrote a Lua script for mpv so that I can mark the video files that I want to delete.
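The reformatting step can be sketched as a small filter. `dupes_to_playlist` is a hypothetical name; the sketch assumes, as the commands below do, that findimagedupes prints absolute paths separated by spaces and that each image is named `<video>.jpg`. It relies on GNU sed, since `\n` in the replacement text is a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read findimagedupes output (one group of duplicate
# images per line, absolute paths separated by spaces) and print an mpv
# playlist with one video path per line.
# Assumes GNU sed and thumbnails named "<video>.jpg".
dupes_to_playlist() {
  # 1. Break the line before every " /" so each absolute path gets its own line.
  # 2. Strip the ".jpg" suffix (and any trailing spaces) to get back to the video.
  sed 's| /|\n/|g' | sed 's/\.jpg *$//'
}
```

Usage: `dupes_to_playlist < img-dupes.txt > playlist.txt`, then `mpv --playlist=playlist.txt`. Splitting on " /" rather than on every space keeps file names that contain spaces intact; it would still mis-split a path whose directory name ends with a space.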

# Clear *.xmp files.
    #find . -type f -name '*.xmp' -print0 | xargs -0 rm -f
 
# Generate an image (one frame at 00:03:17) from each video.
    find ph-* -type f -name '*.mp4' -exec mpv "{}" --no-audio --vo=image --start=00:03:17 --frames=1 -o "{}".jpg \;
 
# Find duplicate images and keep a backup of the result.
    cd /media/seagate-jtl/video/
    find ph-* -name '*.jpg' -print0 | findimagedupes -0 -f=img-fingerprint.db - > img-dupes.txt
    # The leading backslash bypasses a possible "cp -i" alias.
    \cp img-dupes.txt img-dupes.bak
 
# Put each file in img-dupes.txt on its own line.
    sed -i 's|/media/|\n/media/|g' img-dupes.txt
 
# Remove the .jpg extension (and any trailing space) so each line points to the video.
    sed -i 's/\.jpg *$//' img-dupes.txt
 
# View duplicate videos.
    mpv --start=00:03:17 --osd-level=3 --volume=50 --playlist=img-dupes.txt
    # Mark a video for deletion: Alt+M, Alt+Shift+E
 
# Delete videos that have a matching *.xmp marker file.
    # <video>.xmp -> remove <video>
    find . -type f -name '*.xmp' | sed 's/\.xmp$//' | tr '\n' '\0' | xargs -0 rm -f
    # <video>.xmp -> remove the <video>.jpg image
    find . -type f -name '*.xmp' | sed 's/\.xmp$/.jpg/' | tr '\n' '\0' | xargs -0 rm -f
    # Remove the *.xmp marker files themselves.
    find . -type f -name '*.xmp' -print0 | xargs -0 rm -f
    # Or, xmp-del.sh
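The deletion step can also be written as one loop that handles each marker once. This is a hypothetical helper, `delete_marked`, not the author's xmp-del.sh. It assumes the marker for `<video>` is named `<video>.xmp` (which is what stripping `.xmp` in the step above implies) and that the image is `<video>.jpg`. Setting `DRY_RUN` to any non-empty value only prints what would be removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every "<video>.xmp" marker under a directory,
# remove <video>, its "<video>.jpg" image and the marker itself.
# Assumed naming: marker = "<video>.xmp", image = "<video>.jpg".
delete_marked() {
  local dir=${1:-.} marker video f
  # -print0 / read -d '' keep names with spaces or newlines intact.
  find "$dir" -type f -name '*.xmp' -print0 |
    while IFS= read -r -d '' marker; do
      video=${marker%.xmp}   # drop the ".xmp" suffix to get the video path
      for f in "$video" "$video.jpg" "$marker"; do
        if [ -n "${DRY_RUN:-}" ]; then
          printf 'would remove: %s\n' "$f"
        else
          rm -f -- "$f"
        fi
      done
    done
}
```

Usage: run `DRY_RUN=1 delete_marked /media/seagate-jtl/video` to review the list, then `delete_marked /media/seagate-jtl/video` to delete. Reading NUL-separated names straight from `find` avoids the `tr '\n' '\0'` conversion, which cannot handle file names that contain newlines.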

About the author

Xuan Ngo is the founder of OpenWritings.net. He currently lives in Montreal, Canada. He loves to write about programming and open source subjects.