How are YCbCr components stored in a JPEG image?

Question

I understand that an uncompressed 24-bit RGB image typically uses 8 bits each for red, green, and blue. In a JPEG, images are often converted to YCbCr before compression. For a JPEG derived from a 24-bit image, how are the Y, Cb, and Cr components actually represented? Are bits still divided evenly among the channels, or does JPEG store them differently?

compression ycbcr color-space jpeg chroma-subsampling

Originally by user3720. Source · Licensed CC BY-SA 4.0

15y ago

user3422 · Answer

An image may start out with 8 bits per R, G and B channel, but inside the JPEG file it is stored very differently. There is no real "bit depth" per channel; instead, the values are stored as frequency coefficients, each with its own precision.

In JPEG, what matters more is the amount of quantization, which determines how much information is thrown away during the quantization stage of compression and therefore how precise each coefficient is. The amount of quantization is set by the "quality" setting when you save a JPEG in Photoshop. It is not related to bit depth in the raster-image sense, and you could even say that a JPEG has no bit depth while it is in JPEG form, even though JPEG encoders and decoders typically start with, and end with, a 24-bit raster image.
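
How a "quality" setting is turned into quantization factors is encoder-specific. As one concrete, assumed example (not Photoshop's mapping, which is not shown here), the sketch below follows the scaling rule used by the IJG libjpeg library behind `cjpeg -quality`, applied to the example luminance table from Annex K of the JPEG standard. `STD_LUMA_Q` and `scale_quant_table` are hypothetical names chosen for illustration.

```python
# Example luminance quantization table from Annex K of the JPEG standard (ITU-T T.81).
STD_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scale_quant_table(base, quality):
    """Scale a base table by a 1..100 quality value, following IJG libjpeg's rule (assumed)."""
    quality = max(1, min(100, quality))
    # Below 50 the divisors grow quickly; at 50 the table is unchanged; at 100 they all become 1.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Clamp to 1..255 so every entry fits a baseline (8-bit) quantization table.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row] for row in base]

for q in (10, 50, 75, 100):
    print(q, scale_quant_table(STD_LUMA_Q, q)[0])  # first row of the scaled table
```

Larger entries mean coarser coefficients, which is why a lower quality setting gives a smaller file: no per-channel bit depth changes, only how finely each frequency is quantized.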

The other major factor when saving a JPEG is the chroma subsampling mode. JPEG lets you halve the horizontal resolution (4:2:2), or both the horizontal and vertical resolution (4:2:0), of the color (Cb and Cr) channels relative to the luminance (lightness) channel. When decompressing, the color channels are upsampled (interpolated) back to full size, and for most photographic subject matter this makes little visible difference.
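
A minimal sketch of what subsampling does to a single chroma plane (a list of rows of 0..255 values). It assumes the encoder averages neighbouring samples and the decoder simply repeats them; real encoders and decoders may use other filters and smoother interpolation. The function names are hypothetical. With 4:2:0, each Cb and Cr plane holds a quarter as many samples as Y, so the image carries 1.5 samples per pixel instead of 3.

```python
def subsample_422(plane):
    """4:2:2: halve the width by averaging horizontal pairs (last column repeated if odd)."""
    w = len(plane[0])
    return [[(row[x] + row[min(x + 1, w - 1)] + 1) // 2 for x in range(0, w, 2)]
            for row in plane]

def subsample_420(plane):
    """4:2:0: halve width and height by averaging each 2x2 block (edges repeated if odd)."""
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(0, h, 2):
        y2 = min(y + 1, h - 1)
        out.append([(plane[y][x] + plane[y][min(x + 1, w - 1)]
                     + plane[y2][x] + plane[y2][min(x + 1, w - 1)] + 2) // 4
                    for x in range(0, w, 2)])
    return out

def upsample_nearest(plane, h, w):
    """Decoder side: stretch a subsampled plane back to h x w by repeating samples."""
    sy = -(-h // len(plane))       # ceil(h / rows): 1 or 2
    sx = -(-w // len(plane[0]))    # ceil(w / cols): 1 or 2
    return [[plane[y // sy][x // sx] for x in range(w)] for y in range(h)]

cb = [[100, 110, 120, 130],
      [100, 110, 120, 130]]
print(subsample_422(cb))   # [[105, 125], [105, 125]]
print(subsample_420(cb))   # [[105, 125]]
print(upsample_nearest(subsample_420(cb), 2, 4))  # [[105, 105, 125, 125], [105, 105, 125, 125]]
```

The detail lost in the averaging cannot be recovered by the upsampling, which is the permanent loss of color resolution the answer refers to.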

Here's a rough summary of how an image gets turned into a JPEG.

RGB values are converted to Y, Cb, Cr values. The YCbCr color space is better suited to efficient compression because it concentrates the luminance information, which carries most of the visible detail, in a single channel. This conversion is a simple arithmetic operation that is reversible apart from rounding error.
If chroma subsampling is used (in other words, any mode other than 4:4:4), the vertical and/or horizontal resolution of the Cb and Cr channels only is halved, so these channels end up with different pixel dimensions from the luminance channel. This causes a permanent loss of resolution in the color channels.
For each channel, the image is divided into blocks of 8x8 pixels, giving 64 sample values per block in each channel. If a channel's width or height is not a multiple of 8 pixels, the edge pixels are repeated to fill out the last blocks (and are discarded again when decompressing). JPEG compression is therefore more efficient with dimensions that are multiples of 8 pixels, or 16 if you factor in chroma subsampling.
The 64 values in each block are transformed from the spatial domain into the frequency domain using a discrete cosine transform (DCT). This yields 64 coefficients, each giving the amplitude of one particular two-dimensional cosine pattern (frequency) over the area of that block. The first coefficient (the DC coefficient) has zero frequency and is proportional to the average value of all the pixels in the block; the following ones, read in zigzag order, describe progressively higher frequencies, up to the highest-frequency component of the block. The early coefficients typically vary much more in magnitude and matter more to the look of the final image than the later ones. This step is perfectly reversible as long as you use enough precision.
Next comes the quantization step: each of the 64 coefficients from the previous step is divided by its own number from a quantization table (the quantization factor, usually larger for higher frequencies) and rounded to the nearest integer, discarding the fractional part. This is where sample precision is lost the most, but it is also where JPEG gets its huge space savings over lossless compression. Because the data is now in the frequency domain, this loss of precision preserves perceived image quality far better than simply reducing the bit depth of the pixels before the transform would. The reverse of this step is simply to multiply each coefficient by the same number it was divided by, but since the fractional part was thrown away, the coefficients come back with less precision. This results in a permanent loss of quality, not on a pixel-by-pixel basis but across the 8x8 block as a whole, according to the frequency patterns of those coefficients.
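After quantization, many of the later, less significant coefficients are typically zero. Reading the coefficients in zigzag order groups these zeros into long runs, which are stored compactly as run lengths, with a single end-of-block code standing in for all the trailing zeros. Then a lossless variable-length coding stage (usually Huffman coding) encodes the remaining coefficients efficiently, with each one potentially using a different number of bits.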
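
The color-conversion step described above can be sketched as follows. The JPEG standard itself does not fix a color space; these are the full-range formulas from the JFIF specification that most .jpg files follow, and the function names are hypothetical. Rounding each component back to 8 bits is why the round trip is "reversible apart from rounding error".

```python
def _clamp(v):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(v)))

def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF conversion from 8-bit RGB to 8-bit Y, Cb, Cr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp(y), _clamp(cb), _clamp(cr)

def ycbcr_to_rgb(y, cb, cr):
    """Inverse JFIF conversion; because of rounding, results can be off by a unit or so."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp(r), _clamp(g), _clamp(b)

for rgb in [(255, 255, 255), (255, 0, 0), (12, 200, 90)]:
    ycc = rgb_to_ycbcr(*rgb)
    print(rgb, "->", ycc, "->", ycbcr_to_rgb(*ycc))
```

Note that each component is still an 8-bit value at this point; the change in "bit depth" only becomes meaningless after the DCT and quantization steps.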
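
The block-splitting step, including the edge repetition for dimensions that are not a multiple of 8, might look like this minimal sketch. It assumes a plane stored as a list of rows and pads by repeating the last row and column, as the answer describes; `pad_to_multiple` and `split_blocks` are hypothetical names.

```python
def pad_to_multiple(plane, n=8):
    """Pad a 2-D plane to multiples of n by repeating its last column and last row."""
    h, w = len(plane), len(plane[0])
    H = -(-h // n) * n   # round height up to a multiple of n
    W = -(-w // n) * n   # round width up to a multiple of n
    rows = [row + [row[-1]] * (W - w) for row in plane]
    rows += [rows[-1][:] for _ in range(H - h)]
    return rows

def split_blocks(plane, n=8):
    """Yield n x n blocks in raster order from a plane whose size is a multiple of n."""
    for by in range(0, len(plane), n):
        for bx in range(0, len(plane[0]), n):
            yield [row[bx:bx + n] for row in plane[by:by + n]]

plane = [[x + 10 * y for x in range(13)] for y in range(10)]  # 10 rows x 13 columns
padded = pad_to_multiple(plane)
print(len(padded), len(padded[0]))  # 16 16
print(len(list(split_blocks(padded))))  # 4
```

Here a 10x13 plane is stored as four full blocks, so over 40% of the encoded samples are padding, which illustrates why dimensions that are multiples of 8 (or 16 with 4:2:0 subsampling) compress more efficiently.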
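
The DCT and quantization steps can be sketched with a deliberately naive implementation of the 8x8 DCT in the form given in the JPEG standard (ITU-T T.81). It includes the level shift of subtracting 128 from each 8-bit sample before the transform, a detail the answer doesn't mention; real codecs use fast factorised DCTs instead. Quantization divides each coefficient by its table entry and rounds to the nearest integer. `fdct_8x8`, `idct_8x8`, `quantize`, `dequantize` and `STD_LUMA_Q` are hypothetical names, and the table is the example luminance table from Annex K.

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard (ITU-T T.81).
STD_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61], [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56], [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77], [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101], [72, 92, 95, 98, 112, 100, 103, 99],
]

def _c(k):
    """Normalisation factor C(k) used by the JPEG DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def fdct_8x8(block):
    """Naive 2-D forward DCT of an 8x8 block (block[y][x]) of 0..255 samples, level-shifted by -128."""
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):            # vertical frequency
        for u in range(8):        # horizontal frequency
            s = 0.0
            for y in range(8):
                for x in range(8):
                    s += ((block[y][x] - 128)
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * _c(u) * _c(v) * s
    return out

def idct_8x8(coeffs):
    """Inverse of fdct_8x8: returns samples in the 0..255 range as floats."""
    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            s = 0.0
            for v in range(8):
                for u in range(8):
                    s += (_c(u) * _c(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = 0.25 * s + 128
    return out

def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to the nearest integer."""
    return [[round(coeffs[v][u] / table[v][u]) for u in range(8)] for v in range(8)]

def dequantize(q, table):
    """Decoder side: multiply back; the rounding error cannot be recovered."""
    return [[q[v][u] * table[v][u] for u in range(8)] for v in range(8)]

flat = [[200] * 8 for _ in range(8)]
q = quantize(fdct_8x8(flat), STD_LUMA_Q)
print(q[0][0])  # 36: DC = 8 * (200 - 128) = 576, and 576 / 16 = 36
```

Without quantization, `idct_8x8(fdct_8x8(block))` reproduces the block up to floating-point error; after `quantize` and `dequantize`, each coefficient can be off by up to half its table entry, and that error is spread over the whole 8x8 block by the inverse transform.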
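
The zigzag ordering and zero-run encoding described above can be sketched as follows. This is a simplification under stated assumptions: it produces (zero_run, value) pairs for the 63 AC coefficients, using (15, 0) for a run of sixteen zeros and (0, 0) as the end-of-block marker, but it stops before the Huffman stage and ignores the separate coding of the DC coefficient. `zigzag_order` and `run_length_ac` are hypothetical names.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col (one anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                   # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_length_ac(zz):
    """Encode AC coefficients zz[1:] as (zero_run, value) pairs, ending with EOB if needed."""
    pairs, run = [], 0
    for v in zz[1:]:
        if v == 0:
            run += 1
            continue
        while run > 15:
            pairs.append((15, 0))    # ZRL: a run of sixteen zeros
            run -= 16
        pairs.append((run, v))
        run = 0
    if run > 0:
        pairs.append((0, 0))         # EOB: all remaining coefficients are zero
    return pairs

order = zigzag_order()
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

q = [[0] * 8 for _ in range(8)]
q[0][0], q[0][1], q[1][0] = 36, -3, 2
zz = [q[r][c] for r, c in order]
print(run_length_ac(zz))  # [(0, -3), (0, 2), (0, 0)]
```

A heavily quantized block like this one collapses from 64 values to a DC value and three short pairs, before the entropy coder shrinks it further.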

It's impossible to say that a certain quantization factor is equivalent to a certain bit depth, because quantization does not cause banding the way reducing bit depth does. Instead, it causes an overall perceptual loss of detail, starting with the components you would notice least, namely those whose amplitude is low for their frequency.
