How can I find and clean up duplicate photos on Ubuntu in a large collection?

Asked 8/8/2012


0

I have a large photo archive on Ubuntu (around 100 GB, collected over many years) and need a practical way to detect duplicate images. I’m interested in both exact duplicates and, if possible, visually similar images.

Beyond detection, I also need help with cleanup: once I have a large list of duplicates, what tools or workflow can help me review them, confirm which files are safe to remove, and reorganize folders without accidentally deleting unique photos?

Originally by Photography Stack Exchange contributor. Source · Licensed CC BY-SA 4.0


14y ago

2 Answers

8

ImageMagick to the rescue. I think the first step in any solution is to reduce the amount of data you have to compare. If you want to compare photos by their content, especially when some are slightly modified versions of one another, a very good start is to reduce them to thumbnails and then compare the thumbnails. This is particularly helpful when you want to find almost-alike photos and "ignore" unimportant differences during comparison.
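To illustrate why shrinking helps, here is a minimal pure-Python sketch, not ImageMagick itself. It assumes a photo is already available as a grid of grayscale values; the `downscale` and `similarity` helpers and the 4x4 sample data are hypothetical names and values made up for the illustration.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale values, 0-255


def downscale(pixels: Grid, factor: int) -> Grid:
    """Shrink a grayscale grid by averaging each factor x factor block."""
    height = len(pixels) // factor
    width = len(pixels[0]) // factor
    small = []
    for by in range(height):
        row = []
        for bx in range(width):
            block = [
                pixels[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def similarity(a: Grid, b: Grid) -> float:
    """Return 1.0 for identical grids, lower values as pixels diverge."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return 1.0 - sum(diffs) / (255 * len(diffs))


# Hypothetical 4x4 "photo" and a copy with pixel-level noise in one corner.
original = [[100, 100, 200, 200], [100, 100, 200, 200],
            [50, 50, 90, 90], [50, 50, 90, 90]]
edited = [row[:] for row in original]
edited[0][0] = 120  # one pixel brighter
edited[0][1] = 80   # its neighbour darker by the same amount

print(similarity(original, edited))                              # below 1.0
print(similarity(downscale(original, 2), downscale(edited, 2)))  # noise averaged away
```

Noise that cancels out within a block disappears in the thumbnail, while a real change in content (a different subject, a crop) still shows up as a lower score.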

My suggestion is, at a high level, that you:
1- Use ImageMagick's mogrify tool to reduce the photos to thumbnails. mogrify overwrites files in place by default, so write the thumbnails to a separate directory (its -path option) rather than touching the originals. This will take some time, but it will make the actual comparison steps much faster and more tolerant of small differences.
2- Use ImageMagick's compare tool, which lets you set a threshold for comparison, i.e. it allows you to find photos that are, say, 85% alike. You will want to run a controlled experiment to find the threshold value that works best for you.
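The two steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: it assumes the ImageMagick 6 command names `mogrify` and `compare` are on the PATH, that the thumbnail directory already exists, and that RMSE is an acceptable metric. The helper names and the 0.15 cut-off are hypothetical placeholders; the cut-off in particular is not equivalent to "85% alike" and should be tuned on a sample of your own photos.

```python
import re
import subprocess
from pathlib import Path
from typing import List


def thumbnail_command(photos: List[Path], thumb_dir: Path, size: int = 64) -> List[str]:
    """Build a mogrify call that writes fixed-size thumbnails into thumb_dir.

    -path keeps mogrify from overwriting the originals; the trailing "!" in
    the geometry forces every thumbnail to the same dimensions so that any
    two of them can be compared.
    """
    return ["mogrify", "-path", str(thumb_dir), "-thumbnail", f"{size}x{size}!",
            *[str(p) for p in photos]]


def compare_command(thumb_a: Path, thumb_b: Path) -> List[str]:
    """Build a compare call that reports RMSE and writes no difference image."""
    return ["compare", "-metric", "RMSE", str(thumb_a), str(thumb_b), "null:"]


def parse_normalized_rmse(compare_output: str) -> float:
    """Extract the normalized value from output shaped like '1234.5 (0.0188)'."""
    match = re.search(r"\(([-+0-9.eE]+)\)", compare_output)
    if match is None:
        raise ValueError(f"unexpected compare output: {compare_output!r}")
    return float(match.group(1))


def thumbnail_distance(thumb_a: Path, thumb_b: Path) -> float:
    """Run compare; it prints the metric on stderr and exits 1 when images differ."""
    result = subprocess.run(compare_command(thumb_a, thumb_b),
                            capture_output=True, text=True)
    return parse_normalized_rmse(result.stderr)


def is_candidate(distance: float, max_distance: float = 0.15) -> bool:
    """Treat a pair as a likely duplicate when the distance is small enough."""
    # max_distance is an assumed starting point, to be tuned experimentally.
    return distance <= max_distance
```

Run the `thumbnail_command` result once with `subprocess.run`, then call `thumbnail_distance` on pairs of thumbnails. Comparing every pair grows quadratically with the number of photos, so it is worth narrowing the candidates first, for example by folder or by date.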

Originally by user11005. Source · Licensed CC BY-SA 4.0


14y ago

0

AI Answer

Generated from our catalog & community — verify before relying on it.

For Ubuntu, a good approach depends on whether you want exact duplicates or visually similar images.

  • Exact duplicates: try fdupes or fslint. These are useful for finding files that are truly identical, and fslint also offers a GUI.
  • Review and organize duplicates: Geeqie is especially helpful because its duplicate finder can compare by filename, size, date, dimensions, MD5 checksum, and even similar image content. It also shows thumbnails, which makes manual review much easier before deleting anything.
  • Similar-but-not-identical images: a content-based workflow with ImageMagick can help. One suggestion is to create thumbnails first, then compare them with thresholds to find images that are mostly alike.
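For the exact-duplicate case, the underlying idea can be sketched in a few lines of Python. This is an illustration of size-then-checksum grouping in general, not how fdupes or fslint are actually implemented; the function names are hypothetical, and SHA-256 is used here simply as one reasonable checksum choice.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: str) -> List[List[str]]:
    """Return groups of paths under root that share size and SHA-256 digest."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash: Dict[str, List[str]] = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Grouping by size first means that only files with a matching size are ever hashed, which matters on an archive of around 100 GB.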

For cleanup, the safest workflow is to first identify candidates, then review visually before removing files. Thumbnail-based or GUI-based review is important when you have thousands of results, because checksum tools alone won’t tell you which folders contain unique files or how best to reorganize them.
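One way to make that review step safer is to move suspected duplicates into a quarantine folder instead of deleting them. The sketch below assumes you already have groups of duplicate paths from some detection tool; `plan_quarantine`, `apply_plan`, and the rule of keeping the alphabetically first path in each group are hypothetical choices made for illustration, and the quarantine folder is assumed to live outside the photo root.

```python
import os
import shutil
from typing import List, Tuple


def plan_quarantine(groups: List[List[str]], root: str,
                    quarantine: str) -> List[Tuple[str, str]]:
    """Keep the first path of each group; plan to move the rest into quarantine.

    The path relative to root is preserved so a file can be restored later.
    """
    moves = []
    for group in groups:
        for path in sorted(group)[1:]:
            target = os.path.join(quarantine, os.path.relpath(path, root))
            moves.append((path, target))
    return moves


def apply_plan(moves: List[Tuple[str, str]], dry_run: bool = True) -> None:
    """Print the plan by default; move files only when dry_run is False."""
    for source, target in moves:
        if dry_run:
            print(f"would move {source} -> {target}")
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(source, target)
```

Running with `dry_run=True` first gives a list to check against the thumbnails. Nothing is deleted at any point, so a wrongly flagged photo can be moved back from the quarantine folder to its original relative path.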

So: use fdupes/fslint for exact matches, and Geeqie if you want a more practical review process and support for similar-image detection.

UniqueBot

AI

14y ago
