How can I tell whether two JPEG files contain the same image data if their metadata differs?
Asked 5/13/2019
I’m trying to automate photo deduplication when copying images from my phone to my PC. Two JPEGs may have different EXIF/IPTC tags or other metadata, so a normal file hash won’t match even if the underlying image is identical.
Is there a reliable way to compare only the actual image data, without being misled by metadata differences? Ideally I want to detect files that contain the exact same JPEG image data, not just visually similar photos or recompressed versions.
Originally by Photography Stack Exchange contributor. Source · Licensed CC BY-SA 4.0
Photography Stack Exchange contributor
7y ago
2 Answers
There are different levels of "the same image".
Some programs mentioned in the possible duplicate, "How can I identify duplicate image files?", analyze a fingerprint of each image. As I understand it, they compute a kind of low-resolution version of each image and compare those.
This means they only have to compare a small matrix of data, which saves time.
Some of them flag "similar images", for example photos from the same photoshoot.
Others identify the same image quite reliably even when, for example, the copies have different resolutions. Those files show the same picture, but they are not really the same image.
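The fingerprint idea can be sketched as an "average hash". Shrink the image to a tiny grayscale grid, mark each cell as brighter or darker than the mean, and compare the resulting bit strings by Hamming distance. This is a minimal stdlib-only sketch. It assumes the pixels are already decoded into a 2D list of grayscale values, for example by an imaging library. The names `average_hash` and `hamming` and the 8x8 grid size are illustrative choices, not what any particular duplicate finder uses.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed already decoded)


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Nearest-neighbour downscale to size x size (crude, but enough for a sketch)."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: a bit is set where a cell is above the mean."""
    small = shrink(pixels, size)
    cells = [v for row in small for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small value suggests the same picture."""
    return bin(a ^ b).count("1")
```

Because only the tiny grid is compared, an image and an upscaled copy of it can get identical fingerprints. That is exactly why such tools report "the same picture" rather than "the same file". A threshold on `hamming` (for example "at most a few bits") would decide what counts as a match.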
Another level of "the same image" is the same resolution but different compression.
Detecting this requires examining and comparing all of the image data.
Some programs only compare the byte count and the date, for example. If two files have exactly the same size in bytes and exactly the same date, they are probably duplicates.
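A minimal sketch of that cheap heuristic follows. It assumes "date" means the file's modification time as reported by the operating system; a real program might read the EXIF capture date instead. `candidate_groups` is a hypothetical helper name. Files that share a size and timestamp are only probable duplicates and still need a content check.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def candidate_groups(paths: Iterable[str]) -> List[List[str]]:
    """Group files by (size in bytes, modification time); return groups of 2 or more."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        # st_mtime_ns avoids float rounding when comparing timestamps
        groups[(st.st_size, st.st_mtime_ns)].append(path)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Two files with different content but the same length and timestamp land in the same group. That false positive is the reason this check is only a first filter.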
But if you want to check whether they are exactly the same image, as a photographer you can stack one on top of the other in an image editor and use the Difference blending mode. If the result is completely black, the two images have no pixel differences.
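The Difference-blend check can also be sketched in code. Subtract the decoded pixels of the two images and look at the largest difference; a result of 0 corresponds to an all-black difference layer. This sketch assumes both images are already decoded into same-sized lists of RGB tuples, for instance by an imaging library. `pixel_difference` and `same_pixels` are hypothetical names. A small non-zero `tolerance` is one possible way to accept recompressed copies, at the cost of also accepting near-matches.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) values, assumed already decoded


def pixel_difference(a: Image, b: Image) -> int:
    """Largest per-channel absolute difference, like the brightest spot of a
    Difference blend. Raises ValueError if the images differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return max(
        (abs(ca - cb)
         for ra, rb in zip(a, b)
         for pa, pb in zip(ra, rb)
         for ca, cb in zip(pa, pb)),
        default=0,
    )


def same_pixels(a: Image, b: Image, tolerance: int = 0) -> bool:
    """True if no channel differs by more than `tolerance` (0 means identical pixels)."""
    return pixel_difference(a, b) <= tolerance
```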
Another way to compare them is to open each image as binary or hex data. For example, in IrfanView you can open an image and press F3 to see both views.
There are also PHP libraries that convert an image to hex text, so you could probably use one of those.
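A rough equivalent of eyeballing the hex view, shown in Python rather than PHP, is to report the first byte offset where two files differ and print a few bytes around it in hex. `first_difference` and `hex_context` are hypothetical helpers. A whole-file comparison like this will still flag files that differ only in metadata. However, if the first difference falls near the start of the file, it is likely inside a metadata segment.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if identical (similar to `cmp`)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one file is a prefix of the other
    return None


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Space-separated hex bytes from `width` before `offset` to `width` after it."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")
```

Typical use would be reading both files with `open(path, "rb").read()` and printing `hex_context` for each file at the reported offset.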
Originally by user37321. Source · Licensed CC BY-SA 4.0
user37321
7y ago
Generated from our catalog & community — verify before relying on it.
Yes. If the only differences are metadata, strip the metadata from both files and then compare the results.
A simple workflow is:
- Make temporary copies of both files.
- Remove metadata from each copy, for example with exiftool -all=.
- Compare the stripped files with a binary compare tool such as cmp, or hash the stripped files.
Example:
cp img1.jpg tmp1.jpg
cp img2.jpg tmp2.jpg
exiftool -all= tmp1.jpg
exiftool -all= tmp2.jpg
cmp tmp1.jpg tmp2.jpg
If cmp reports no differences, the files contain the same remaining JPEG data after metadata removal.
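If installing exiftool is not an option, the same strip-then-compare idea can be sketched directly on the JPEG marker structure. The approach is to copy every segment except metadata segments, stop at the EOI marker, and hash what is left. This is a minimal sketch, not what exiftool does internally. It makes three assumptions. First, APP0-APP13, APP15 and COM segments count as metadata. Second, APP14 ("Adobe") is kept because it can influence how colors are decoded. Third, any data after EOI is ignored, since some phones append extra content there. `image_data`, `image_digest` and `find_duplicates` are hypothetical names.

```python
import hashlib
import struct
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Markers with no length field: TEM, RST0-RST7, SOI, EOI.
_STANDALONE = {0x01, *range(0xD0, 0xD8), 0xD8, 0xD9}
# Inside compressed scan data, 0xFF followed by one of these is not a new marker.
_IN_SCAN = {0x00, *range(0xD0, 0xD8)}


def _is_metadata(marker: int) -> bool:
    """Assumption: APPn (except APP14 "Adobe") and COM segments are metadata."""
    return (0xE0 <= marker <= 0xEF and marker != 0xEE) or marker == 0xFE


def image_data(jpeg: bytes) -> bytes:
    """Return the JPEG bytes with metadata segments and post-EOI data removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i < len(jpeg):
        if jpeg[i] != 0xFF or i + 1 >= len(jpeg):
            raise ValueError(f"malformed JPEG near offset {i}")
        marker = jpeg[i + 1]
        if marker == 0xFF:              # fill byte: skip it and re-read the marker
            i += 1
            continue
        if marker == 0xD9:              # EOI: keep it, ignore anything after it
            out += b"\xff\xd9"
            break
        if marker in _STANDALONE:
            out += jpeg[i:i + 2]
            i += 2
            continue
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])  # length includes itself
        end = i + 2 + length
        if not _is_metadata(marker):
            out += jpeg[i:end]
        i = end
        if marker == 0xDA:              # SOS: copy the compressed scan data that follows
            start = i
            while i + 1 < len(jpeg) and not (jpeg[i] == 0xFF and jpeg[i + 1] not in _IN_SCAN):
                i += 1
            out += jpeg[start:i]
    return bytes(out)


def image_digest(path: Path) -> str:
    """SHA-256 of the stripped image data; metadata differences do not change it."""
    return hashlib.sha256(image_data(path.read_bytes())).hexdigest()


def find_duplicates(folder: str) -> List[List[Path]]:
    """Group .jpg/.jpeg files in `folder` whose stripped image data is identical."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in (".jpg", ".jpeg"):
            groups[image_digest(path)].append(path)
    return [g for g in groups.values() if len(g) > 1]
```

Running `find_duplicates` on the folder you copy photos into would return groups of files that differ only in those metadata segments. Like the exiftool workflow, it will not match the same picture encoded differently.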
This is useful when duplicate files differ only by EXIF/IPTC/orientation-style metadata. It is not the same as finding “similar” images, resized copies, or recompressed JPEGs. Those require image fingerprinting or full decode-and-compare methods, and may treat near-matches as duplicates.
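Also note that the same picture can be encoded into different bytes (for example, as a baseline or a progressive JPEG), and stripped files only match if the encoded data itself is identical. That is why this approach suits the case you described: exactly the same encoded image content with different metadata.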
UniqueBot
AI · 7y ago
Related Questions
How can I verify whether two RAW files contain the same sensor data but different metadata?
Can metadata crop an image without re-encoding the JPEG?
Does editing EXIF metadata change JPEG image quality?
Why do RAW+JPEG photos from my Canon 1100D show up as two JPEGs on my PC?
Why can photos with the same resolution and format have very different file sizes?
