Software to find and remove cropped or downsized derivative images
Asked 6/28/2015
I’m helping recover a photo drive that contains many generated thumbnails, face-recognition crops, and original images. I’d like a tool—ideally free—that can identify derivative files such as crops or downsized versions of another image so I can batch-delete the lower-quality copies and keep the most complete original or highest-quality version. I know duplicate finders can catch exact matches, but I’m specifically looking for help with “near-duplicates” where one image may be a crop or resized version of another.
Originally by Photography Stack Exchange contributor. Source · Licensed CC BY-SA 4.0
Photography Stack Exchange contributor
11y ago
2 Answers
Unless someone first sorts visually similar files into groups by hand, what you are asking for is an extremely computationally intensive process.
If you are not averse to some programming (even scripting, such as Python), take a look at the ImageMagick command-line tools, especially the compare command. It has a -subimage-search option to determine whether one image is contained within another. Note that for -subimage-search, the subimage must appear at the same scale in the larger image; the two cannot differ by scaling.
So subimage-search will help with determining if one photo is a crop from another photo, as long as the crop was not also resized.
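A minimal Python sketch of that crop check, assuming ImageMagick's compare is on the PATH (ImageMagick 7 installs may need "magick compare" instead). The function names, the RMSE metric, the 0.02 threshold, and the assumed "raw (normalized) @ x,y" report format are illustrative assumptions, so check them against your installed version before relying on the result.

```python
import re
import subprocess

def build_subimage_cmd(large, small, metric="RMSE"):
    """Command line asking compare whether `small` occurs inside `large`."""
    # "null:" discards the difference image; only the metric text is wanted.
    return ["compare", "-metric", metric, "-subimage-search", large, small, "null:"]

# Assumed report format on stderr: "<raw> (<normalized>) @ <x>,<y>"
_REPORT = re.compile(r"\((\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\)\s*@\s*(\d+),(\d+)")

def parse_report(text):
    """Return (normalized_score, x, y), or None if the text does not match."""
    m = _REPORT.search(text)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)), int(m.group(3))

def find_crop(large, small, threshold=0.02):
    """Return the (x, y) offset of `small` in `large`, or None if no good match.

    threshold is a hypothetical cut-off on the normalized score; tune it.
    """
    proc = subprocess.run(build_subimage_cmd(large, small),
                          capture_output=True, text=True)
    report = parse_report(proc.stderr)  # compare prints the metric on stderr
    if report is None:
        return None
    score, x, y = report
    return (x, y) if score <= threshold else None
```

Expect each call to be slow on full-resolution photos, since the search tries the small image at every offset in the large one.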
Now, if the photos are of the same scene but just resized, then to compare them with ImageMagick you'd have to use the convert command to make them the same size, then use the compare command to determine their relative difference (the compare command offers many different metric options).
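The resize-then-compare step could be scripted along these lines. This is a sketch under assumptions, not a tested pipeline: the function names, the temporary file name, the choice to scale the larger image down to the smaller one's pixel dimensions, and the RMSE metric are all illustrative.

```python
import subprocess

def build_resize_cmd(src, width, height, dst):
    """convert command that scales `src` to exactly width x height pixels."""
    # The trailing "!" tells ImageMagick to ignore the aspect ratio and hit
    # the requested size exactly, so both images end up the same size.
    return ["convert", src, "-resize", f"{width}x{height}!", dst]

def build_compare_cmd(a, b, metric="RMSE"):
    """compare command that reports how different two same-sized images are."""
    return ["compare", "-metric", metric, a, b, "null:"]

def resized_difference(original, thumb, thumb_w, thumb_h, tmp="scaled_tmp.png"):
    """Scale `original` down to the thumbnail's size and return compare's report."""
    subprocess.run(build_resize_cmd(original, thumb_w, thumb_h, tmp), check=True)
    proc = subprocess.run(build_compare_cmd(tmp, thumb),
                          capture_output=True, text=True)
    return proc.stderr.strip()  # the metric is printed on stderr
```

Scaling the original down rather than the thumbnail up avoids comparing against invented detail, but a small nonzero difference is still expected from resampling and recompression.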
Why is this computationally intensive?
Assume you have n photos to check against one another. The first photo is compared with the n-1 remaining photos, the second with the n-2 remaining photos (it has already been compared with the first), and so on up to the (n-1)th photo being compared with the nth photo. This is the sum of the first n-1 natural numbers, which comes out to (n² - n)/2 comparisons.
There are image processing algorithms that can do both scaling and subimage finding in the same process. These algorithms typically rely on frequency-domain or wavelet-domain compression techniques to identify similar regions in images. But these algorithms are also computationally intensive, with running time roughly proportional to the square of the size of the files being compared (i.e., about αk² seconds per comparison, where k is the file size in KiB and α is some proportionality constant for your algorithm and computer system).
But since there are O(n²) comparisons (this is a computer science notation called "Big-O notation", meaning "on the order of", or "roughly"), each being quadratic in time in the file size s, you have an algorithm that is, in computer terms, of "O(n²)×O(s²)" complexity. That means this can take a long while, and is very sensitive to increases in the number of files to compare and the average file size.
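As a quick check of that count, the short Python sketch below enumerates every unordered pair and compares the total with the closed form. The file names are hypothetical.

```python
from itertools import combinations

def comparison_pairs(files):
    """Every unordered pair of distinct files, each pair listed once."""
    return list(combinations(files, 2))

def pair_count(n):
    """Closed form for the number of pairs: (n^2 - n) / 2."""
    return (n * n - n) // 2

files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]  # hypothetical file names
pairs = comparison_pairs(files)
print(len(pairs), pair_count(len(files)))  # 6 6
print(pair_count(10_000))                  # 49995000
```

So a drive with ten thousand images already implies about fifty million pairwise comparisons.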
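To get a feel for that sensitivity, here is a back-of-the-envelope estimator in Python. The value of α below is made up purely for illustration; a real figure would have to be measured for your own algorithm and machine.

```python
def estimated_seconds(n_files, size_kib, alpha):
    """Rough total time: (number of pairs) x (alpha * size^2 per comparison)."""
    pairs = n_files * (n_files - 1) // 2
    return pairs * alpha * size_kib ** 2

ALPHA = 1e-9  # hypothetical seconds per KiB^2, not a measured value

base = estimated_seconds(1_000, 500, ALPHA)
more_files = estimated_seconds(2_000, 500, ALPHA)
bigger_files = estimated_seconds(1_000, 1_000, ALPHA)

print(f"{more_files / base:.2f}x")    # roughly 4x for twice as many files
print(f"{bigger_files / base:.2f}x")  # roughly 4x for files twice as large
```

Doubling both the file count and the average file size therefore multiplies the estimate by about sixteen.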
Originally by user11924. Source · Licensed CC BY-SA 4.0
user11924
10y ago
Generated from our catalog & community — verify before relying on it.
There isn’t a simple, reliable one-click tool for this. Detecting whether one image is a crop or resized derivative of another is much more computationally intensive than finding exact duplicates.
A workable free option is ImageMagick. Its compare tool has a -subimage-search feature that can detect when one image is contained within another, which can help find crops. The key limitation is that the cropped image must be the same scale as the source region—if it was also resized, this method won’t directly catch it unless you first resize one image to match.
If many of the unwanted files are just thumbnails or obviously smaller generated images, a simpler first pass is to sort or search by image dimensions and remove files below the sizes of the originals. That’s less precise, but often useful for thumbnail cleanup.
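That size-based first pass might look like the following Python sketch. It assumes the pixel dimensions have already been collected into a dictionary (for example with an image library or ImageMagick's identify command); the function name, the 1000-pixel cut-off, and the file names are hypothetical.

```python
def likely_thumbnails(dimensions, min_long_edge=1000):
    """Return paths whose longer side is below min_long_edge pixels.

    dimensions: dict mapping path -> (width, height) in pixels.
    min_long_edge: hypothetical cut-off; pick one below your smallest original.
    """
    return sorted(path for path, (w, h) in dimensions.items()
                  if max(w, h) < min_long_edge)

# Hypothetical sizes, gathered beforehand by some other tool.
sizes = {
    "IMG_0001.jpg": (5184, 3456),
    "IMG_0001_thumb.jpg": (160, 120),
    "face_0001.jpg": (240, 240),
}
print(likely_thumbnails(sizes))  # ['IMG_0001_thumb.jpg', 'face_0001.jpg']
```

Treat the returned list as candidates to move into a review folder, not as files to delete outright, since a small image can also be a legitimate original.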
In practice, the safest approach is a combination of size filtering, grouping similar files, and some manual review before batch deletion.
UniqueBot (AI)
11y ago