What do "scene-referred," "linear," and 8-bit mean in lossy DNG compared with JPEG?

Question

I understand what lossy compression means, but I’m confused about the claims that a lossy DNG can be 8-bit yet still be "scene-referred" and "linear," and that it preserves the original dynamic range better than a normal JPEG. What do "scene-referred" and "output-referred" mean in practice? Is 0,0,0 black / 255,255,255 white in an 8-bit lossy DNG fundamentally different from 0,0,0 / 255,255,255 in a standard 8-bit JPEG? Also, what’s the visual difference between clipping a 12-bit capture down to a smaller range versus compressing the full range into 8 bits, and where does linearity matter?

dng jpeg raw dynamic-range color

Originally by Photography Stack Exchange contributor. Source · Licensed CC BY-SA 4.0

Photography Stack Exchange contributor

12y ago

user11392 · Answer

When a camera outputs a standard JPEG, it adjusts the image for what it thinks will look good on the output. That output-referred rendering may not use the full real-world dynamic range the sensor can read, so values can be truncated at both ends: tones that were distinct but close to white or black all become full white or full black. Scene-referred encoding prevents this by keeping the darkest and brightest points the sensor recorded as the endpoints, at the cost of a bigger step between adjacent values. The black and white points stay proportional to everything else the sensor read, but in both cases the 8-bit file can represent fewer distinct colors than the original capture.

In many cases, the output-referred version may actually produce a better-looking image, since it spends its limited values on the color distinctions that matter to the output instead of spreading them evenly over the whole range.

To illustrate the difference, let's say we have a sequence of numbers:

1,4,5,6,5,7,6,10

If we compress them using output-referred compression, the camera might decide that 1 and 10 are outliers and clip them, so the final output after reducing to four values ends up being something like:

1,1,2,3,2,4,3,4

As you can see, the extremes are lost: the old value of 1 now shares the new value of 1 with the old 4, and the old value of 10 shares the new value of 4 with the old 7, so the brightest point becomes significantly dimmer relative to the rest. The differences among the majority of the image are preserved well, though. If we used scene-referred linear instead, we would have to preserve the values at the upper and lower ends, so we get something like the following:

1,2,2,2,2,3,2,4

We have preserved the fact that the darkest and brightest parts of the image were significantly darker and brighter than the rest, but now almost the entire middle of the image is the same color, because we didn't have enough resolution to distinguish colors that were close together.

When they say linear, they are describing how the values get mapped: every step in the output covers the same amount of input, across the whole range. If you had an input like:
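The two mappings above can be reproduced in a few lines of Python. This is only a sketch of the toy example, not of what any camera or DNG converter actually does. The function names, the clip range of 4 to 7, and the use of truncating integer division are assumptions chosen so that the results match the sequences shown above.

```python
def output_referred(values, lo=4, hi=7):
    """Clip everything outside [lo, hi], then shift so that lo becomes 1.

    The clip range is an assumption: it treats 1 and 10 as outliers.
    """
    return [min(max(v, lo), hi) - lo + 1 for v in values]


def scene_referred_linear(values, in_min=1, in_max=10, out_min=1, out_max=4):
    """Scale the full input range evenly onto the output range.

    Integer (floor) division stands in for the loss of precision.
    """
    span_in = in_max - in_min
    span_out = out_max - out_min
    return [out_min + (v - in_min) * span_out // span_in for v in values]


samples = [1, 4, 5, 6, 5, 7, 6, 10]
clipped = output_referred(samples)
linear = scene_referred_linear(samples)

print(clipped)  # [1, 1, 2, 3, 2, 4, 3, 4]
print(linear)   # [1, 2, 2, 2, 2, 3, 2, 4]
```

The clipped version keeps 5, 6, and 7 distinct but can no longer tell 1 from 4 or 7 from 10. The linear version keeps the endpoints apart but maps 4, 5, and 6 to the same value.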

1,1,2,2,2,1,2,3,3,4,1,3,10

and you wanted to capture it as well as possible, a non-linear conversion could preserve much of the detail in the dark parts while still capturing the bright part, since there are no moderately bright values. That means the loss of color information is uneven across the image, though. Again, this is good if you know what is important about the image, but if you don't, it may discard important information. Using a linear curve minimizes the chance of an outlying value getting lost, but it means that detail is lost in areas where many values are clustered close together.
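As a rough illustration of that trade-off, the sketch below encodes the sequence above into eight output codes (0 to 7) twice, once linearly and once through a curve. The logarithmic curve, the eight-level output, and the function names are all assumptions picked for the example; real JPEG and DNG pipelines use their own tone curves.

```python
import math


def linear_codes(values, in_min=1, in_max=10, levels=8):
    """Even steps across the whole input range (truncating division)."""
    return [(v - in_min) * (levels - 1) // (in_max - in_min) for v in values]


def log_codes(values, in_max=10, levels=8):
    """Illustrative non-linear curve: finer steps for dark values.

    Assumes every input is at least 1, so log10 is never negative.
    """
    scale = (levels - 1) / math.log10(in_max)
    return [int(scale * math.log10(v)) for v in values]


samples = [1, 1, 2, 2, 2, 1, 2, 3, 3, 4, 1, 3, 10]
linear = linear_codes(samples)
curved = log_codes(samples)

print(linear)  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 1, 7]
print(curved)  # [0, 0, 2, 2, 2, 0, 2, 3, 3, 4, 0, 3, 7]
```

With the linear mapping, the input values 1 and 2 collapse into the same code. With the curve, 1, 2, 3, and 4 all stay distinct and 10 still reaches the top code, but the bright end is coarser: inputs of 8 and 9 would both land on code 6.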
