Why are image formats different if an image is just pixel values?

Question

I work with images mostly as data arrays, so I’m trying to understand what actually distinguishes file formats. If an image is eventually displayed as RGB pixel values, why do formats like RAW, TIFF, JPEG, GIF, or grayscale images differ so much? Why don’t all images with the same dimensions have the same file size?

I’m also confused about bit depth. What is the practical difference between 8-bit, 16-bit, and 32-bit images if I think of an image as numeric pixel values?

And for lossy compression such as JPEG: if the compression changes some pixel values slightly, how does that reduce file size, and why might that affect visual quality more or less visibly?

In short, what assumptions in “an image is just a 3-channel array of integers from 0–255” are incorrect?

user72870 · Answer

Sorry, but your basic premise is wrong: an image can be encoded as an array of RGB pixels with 8 bits per value, but there are a lot of other ways:

one channel with one bit/channel (pure black and white),
one channel with x bit/channel (grayscale formats, x will usually be 8 or 16, giving 256 or 65536 values),
various palette-based formats (cf. GIF),
full-colour with (at least in theory) as many channels as you wish with any required bit depth.

And that's for the image as stored in the computer's RAM during editing/viewing. I'm ignoring the various RAW image formats that exist (here and in the rest of this post).

For photography, the most common layouts are 3 channels with 8, 16 or 32 bits/channel (usually integer, but at least some programs work internally with 32-bit floating point numbers). Often there's a 4th channel (alpha), especially when the program allows the use of layers. And somewhere, the dimensions of the image array need to be stored.
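To get a feel for how much these choices matter, here is a minimal sketch that computes the size of an uncompressed, tightly packed pixel buffer for a few of the layouts mentioned above. The 6000 x 4000 image size and the function name are just assumptions for illustration; real programs may also pad rows or add alignment.

```python
def buffer_size_bytes(width, height, channels, bits_per_channel):
    """Bytes needed for an uncompressed, tightly packed pixel buffer."""
    total_bits = width * height * channels * bits_per_channel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 24-megapixel image, chosen only for illustration.
WIDTH, HEIGHT = 6000, 4000

# name -> (number of channels, bits per channel)
layouts = {
    "1-bit black and white": (1, 1),
    "8-bit grayscale": (1, 8),
    "16-bit grayscale": (1, 16),
    "RGB, 8 bits/channel": (3, 8),
    "RGB, 16 bits/channel": (3, 16),
    "RGBA, 32-bit float": (4, 32),
}

for name, (channels, bits) in layouts.items():
    size = buffer_size_bytes(WIDTH, HEIGHT, channels, bits)
    print(f"{name:24s} {size / 1e6:8.1f} MB")
```

Same dimensions, yet the buffer ranges from 3 MB (1 bit per pixel) to 384 MB (four 32-bit channels), before any file-level compression enters the picture.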

There are various reasons for these different formats. For the in-memory format, important considerations used to be the size of the data and the speed (it is much faster to manipulate one 8-bit channel than four 32-bit channels). Those are less important nowadays, but we have gained full colour management with various colour spaces. Some of those (e.g. ProPhoto RGB) need at least 16 bits/channel to keep differences between neighbouring colours small enough to avoid visible banding. And as treatments get more complicated, there are advantages to using 32-bit floating point numbers (where colours are encoded with values between 0.0 and 1.0, and the treatment allows intermediate values outside this range).
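The banding argument can be illustrated with a small sketch. It quantizes a very subtle gradient (1% of the full range, an arbitrary choice for this example) at 8 and at 16 bits and counts how many distinct integer codes survive. It says nothing about a particular colour space; it only shows how few steps 8 bits leave for a smooth transition.

```python
def quantize(value, bits):
    """Map a float in [0.0, 1.0] to the nearest integer code at this bit depth."""
    max_code = (1 << bits) - 1
    return round(value * max_code)


def distinct_levels(samples, bits):
    """Number of different integer codes the samples collapse to."""
    return len({quantize(v, bits) for v in samples})


# A subtle gradient covering 1% of the full range, sampled at 1000 points.
samples = [0.50 + 0.01 * i / 999 for i in range(1000)]

for bits in (8, 16):
    print(f"{bits}-bit: {distinct_levels(samples, bits)} distinct levels")
```

At 8 bits the whole gradient collapses to a handful of codes, which shows up as visible steps; at 16 bits there are hundreds of codes for the same transition.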

If you want to be able to store the image to file, and reload it to the same in-memory data, you'll need to use at least as many bits per channel as the in-memory format, and you must store information about image dimensions, bit depth and colour space.
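As a rough sketch of what that means in practice, here is a made-up, uncompressed toy file layout: a fixed header carrying the dimensions, channel count, bit depth and a colour space name, followed by the raw pixel bytes. The "TOYI" signature and the field order are invented for this example and do not correspond to any real format.

```python
import struct

MAGIC = b"TOYI"  # hypothetical 4-byte signature, not a real format
# Little-endian header: magic, width, height, channels, bits/channel, colour space name
HEADER = struct.Struct("<4sIIBB16s")


def encode(width, height, channels, bits, colour_space, pixel_bytes):
    """Prefix raw pixel bytes with the metadata needed to interpret them."""
    expected = width * height * channels * bits // 8
    if len(pixel_bytes) != expected:
        raise ValueError(f"expected {expected} pixel bytes, got {len(pixel_bytes)}")
    name = colour_space.encode("ascii")  # struct pads this to 16 bytes with NULs
    return HEADER.pack(MAGIC, width, height, channels, bits, name) + pixel_bytes


def decode(blob):
    """Recover the metadata and the raw pixel bytes from an encoded blob."""
    magic, width, height, channels, bits, name = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a toy image file")
    colour_space = name.rstrip(b"\0").decode("ascii")
    return width, height, channels, bits, colour_space, blob[HEADER.size:]


# Two RGB pixels (red, green) at 8 bits/channel.
pixels = bytes([255, 0, 0, 0, 255, 0])
blob = encode(2, 1, 3, 8, "sRGB", pixels)
print(decode(blob))
```

Without the header, the six pixel bytes could just as well be a 6 x 1 grayscale image or a 3 x 1 image with 16 bits per pixel, which is why every format has to pin these things down.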

Users of those images also like to store some additional information about the image (caption, title, who took the image, etc.). Again, there are various ways to store this information.

Then there are different ways of compressing the image data for file storage. One of the simpler ones is RLE (Run Length Encoding), where a run of identical pixel values is stored as a count plus a single pixel value. Others, like JPEG, are a lot more complicated, but also give a lot more compression. E.g. JPEG uses a cosine transform, and throws away the (less visible) high-frequency information, giving high compression ratios at the cost of information loss (there's more to it, but this is getting too long as it is).
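A minimal sketch of the RLE idea, working on a flat list of pixel values. This is only the bare principle: real formats that use RLE typically cap the run length and treat runs of non-repeating pixels specially, which this sketch does not attempt.

```python
def rle_encode(pixels):
    """Collapse consecutive equal values into (count, value) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1  # extend the current run
        else:
            runs.append([1, value])  # start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original pixel list."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


row = [0, 0, 0, 255, 255, 0]
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

This is lossless, and it pays off only when the image has long runs of identical values (line art, screenshots). On noisy photographic data nearly every run has length 1, and the "compressed" form ends up larger than the original.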

This already gives a lot of ways to store the information on disk, but whatever way you pick, the format must be well specified to allow correct interpretation on loading the image.

Compression techniques (lossless ones, for example) also keep developing, and existing formats can't always accommodate the new methods.

So we end up with a variety of file formats, with various trade-offs between fidelity of the stored information, disk space occupied and speed of reading, writing and transmitting (compare the size of an uncompressed TIFF and a decent quality JPEG).
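The fidelity versus size trade-off of a lossy, transform-based scheme can be sketched in one dimension. The code below applies an orthonormal cosine transform (DCT-II) to one row of 8 hypothetical pixel values, keeps only the first few low-frequency coefficients, and reconstructs the row. This is a deliberately crude stand-in: actual JPEG works on 2-D blocks, quantizes coefficients rather than simply dropping them, and then entropy-codes the result.

```python
import math


def _scale(k, n):
    """Normalisation factor that makes the DCT-II orthonormal."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(signal):
    """Orthonormal DCT-II of a 1-D list of numbers."""
    n = len(signal)
    return [
        _scale(k, n)
        * sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi / n * (i + 0.5) * k)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def lossy_round_trip(signal, keep):
    """Zero all but the first `keep` (lowest-frequency) coefficients, then invert."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)


def rms_error(a, b):
    """Root-mean-square difference between two equally long lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
for keep in (8, 4, 2):
    restored = lossy_round_trip(row, keep)
    print(f"keep {keep} of 8 coefficients: RMS error {rms_error(row, restored):.3f}")
```

Keeping all 8 coefficients reproduces the row (up to floating-point rounding); keeping fewer means fewer numbers to store, at the price of a growing reconstruction error.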

After seeing the edited question, some additional aspects:

If you are handed an in-memory image, it will be in the form of one or more arrays. At that point, the original file format shouldn't play a role anymore. I'll assume you are handed your data with 8 bits/channel.

But you will have to know if you have a processed image or a raw image, as there are two important differences between those:

raw images typically have 1 colour per pixel, and the pixels are usually arranged in a Bayer array with 2 green, 1 red and 1 blue pixel per square of 4 pixels. The values are proportional to the scene intensity (except for very low and very high values).
processed images can be arranged as a 2D array of records containing 3 numerical values, or as colour planes (three 2D arrays, one for each of R, G, B). In addition, the values usually are not proportional to the scene intensities. Worse, the exact relation between pixel values and scene intensities depends on the processing the image has had. And the balance between the colours has been adjusted to correspond to the response of the human eye (white balance: red and blue are amplified relative to green).
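The two arrangements for processed images hold exactly the same numbers, only ordered differently. A minimal sketch with plain Python lists (function names are made up for this example) that converts between a 2D array of (R, G, B) records and three colour planes:

```python
def interleaved_to_planar(pixels):
    """Rows of (r, g, b) records -> three separate 2-D planes."""
    red = [[p[0] for p in row] for row in pixels]
    green = [[p[1] for p in row] for row in pixels]
    blue = [[p[2] for p in row] for row in pixels]
    return red, green, blue


def planar_to_interleaved(red, green, blue):
    """Three 2-D planes -> rows of (r, g, b) records."""
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]


# A 2 x 2 image: red, green / blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

red, green, blue = interleaved_to_planar(image)
print(red)
print(planar_to_interleaved(red, green, blue) == image)
```

Which layout you are handed matters for indexing, so it is worth checking the array's shape before assuming one or the other.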

So if you get a raw image with 3 colour values per pixel, that raw image has had some treatment already (at least either demosaicing, or simple binning of 4 raw pixels to 1 image pixel). Whether that's acceptable will depend on your application.
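The simple binning mentioned above can be sketched as follows. The code assumes an RGGB tile order (red top-left, blue bottom-right), which is only one of several orders used by real sensors, and it averages the two green samples; both are choices made for this illustration. Proper demosaicing, which keeps the full resolution by interpolating the missing colours, is considerably more involved.

```python
def bin_rggb(raw):
    """Turn each 2x2 RGGB tile of a raw mosaic into one (r, g, b) pixel.

    `raw` is a 2-D list with even width and height. The tile layout is
    assumed to be:  R G
                    G B
    """
    image = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[y]), 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2  # average the two greens
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        image.append(row)
    return image


# A 2 x 4 mosaic, i.e. two RGGB tiles side by side.
raw = [
    [10, 20, 30, 40],
    [22, 5, 44, 6],
]
print(bin_rggb(raw))
```

The result has half the width and half the height of the mosaic, and its values are still proportional to scene intensity: no white balance or tone curve has been applied.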
