Why does stacking many JPEG webcam frames reveal an 8×8 grid pattern?
Asked 12/8/2016
I averaged about 200,000 frames from a webcam pointed at a very dark, uniform scene. The camera and scene were static, and individual frames are extremely underexposed and noisy. After auto-equalising the stacked result, I can see the lens mount shading as expected, but I also see a fine grid across the whole image that looks aligned to 8×8 JPEG blocks.
I expected stacking so many frames to smooth out random noise. Why does the 8×8 grid remain so visible instead of averaging away?
Originally by Photography Stack Exchange contributor. Source · Licensed CC BY-SA 4.0
2 Answers
4
TL;DR: The edge-discontinuity errors between 8×8 blocks introduced by the DCT compression used in JPEGs are systematic rather than random, so your image stack preserves them while the sensor noise averages away, which is why the "grid" is so prominent.
If the frame stacking/smoothing process had failed or not used enough frames, I would have expected 8×8 blocks of varying colours, not just their edges.
You appear to have a slight misunderstanding about the compression used in JPEG images. You are correct that the compression in (most) JPEGs is done on blocks of 8×8 pixels. Your expectation would be true if the compression simply replaced each 8×8 block with that block's average pixel value (trivially achieving a 64:1 "compression" ratio).
Incidentally, this would be identical to downsampling your image(s) by a factor of 8 in each dimension, then "blowing up" the downsampled image by a factor of 8 without interpolation. This would produce poor-quality images, which is why it isn't done.
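As a minimal sketch of that equivalence, the snippet below compares the two routes on a small synthetic image. It assumes the image dimensions are multiples of the block size and that "downsampling" means box-averaging each block; the function names and the test image are made up for illustration.

```python
def block_average(img, n=8):
    """Replace every n x n block with its mean (the naive scheme, not real JPEG)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, n):
        for bx in range(0, w, n):
            mean = sum(img[y][x] for y in range(by, by + n)
                       for x in range(bx, bx + n)) / (n * n)
            for y in range(by, by + n):
                for x in range(bx, bx + n):
                    out[y][x] = mean
    return out


def downsample(img, n=8):
    """Box-average downsampling by a factor of n in each dimension."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] for y in range(by, by + n)
                 for x in range(bx, bx + n)) / (n * n)
             for bx in range(0, w, n)]
            for by in range(0, h, n)]


def upscale_nearest(small, n=8):
    """Blow the image up by a factor of n without interpolation."""
    return [[small[y // n][x // n] for x in range(len(small[0]) * n)]
            for y in range(len(small) * n)]


# Synthetic 16 x 16 test image (hypothetical data).
img = [[(3 * x + 5 * y) % 17 for x in range(16)] for y in range(16)]
same = block_average(img) == upscale_nearest(downsample(img))
print(same)
```

Both routes give flat 8×8 tiles, which is the "blocks of varying colours" look rather than a grid of edges.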
JPEG compression takes advantage of the discrete cosine transform (DCT) of each 8×8 block. Like any Fourier-like transform, the DCT converts spatial information such as images (or time-domain information such as audio) into frequency information. The DCT is favored in JPEG compression over other transforms (such as the discrete sine transform (DST), or discrete Fourier transform (DFT)) for two reasons:
The DCT coefficients (the result of performing a discrete cosine transform over data) decay toward zero more quickly than those of the other transforms, so fewer coefficients are needed to represent a block well; and
The DCT "behaves well" at the edges of the data sample. Qualitatively, this means that the DCT introduces the least edge-discontinuity between neighboring pixel blocks. However, while the edge-discontinuity is small (compared to other transforms), the slope of signal change at either side of a block boundary is not continuous.
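To illustrate the first point, here is a rough sketch of an orthonormal 8-point DCT-II applied to a smooth ramp. The ramp values are invented for the example; real JPEG applies the transform in two dimensions to level-shifted pixel data.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a short sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


# A smooth 8-pixel gradient (hypothetical values).
ramp = [10, 12, 14, 16, 18, 20, 22, 24]
coeffs = dct_1d(ramp)
energy = [c * c for c in coeffs]

# Fraction of the total energy carried by the two lowest-frequency coefficients.
low_fraction = sum(energy[:2]) / sum(energy)
print([round(c, 2) for c in coeffs])
print(low_fraction)
```

For smooth content like this, nearly all of the energy sits in the first couple of coefficients, which is what makes discarding the rest comparatively cheap.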
Mathematically, a discrete cosine transform produces fractional numbers that would require high-precision math to store exactly. The "discrete" in DCT only means that the transform operates on a finite set of samples; it is a separate quantization step that divides each coefficient by an entry from a quantization table and rounds the result to an integer. This rounding is part of the absolute error between a digital image and its JPEG equivalent.
The other part of the error, and where most of the compression comes from, is that the high-frequency DCT coefficients are effectively thrown away: they are quantized so coarsely that most of them round to zero. This is analogous to representing the number ⅓ in decimal: 3 tenths (0.3), plus 3 hundredths (0.03), plus 3 thousandths (0.003), ad infinitum, giving 0.3333... without end. For our purposes, we say 0.333 is a decent approximation (the relative error is about one part in 1,000, or 0.1%).
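The rounding and the discarding of high frequencies can be sketched with a one-dimensional toy example. The coefficient values and the step sizes below are hypothetical, chosen only to show the effect; real encoders use a two-dimensional quantization table that depends on the quality setting.

```python
def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer level."""
    return [round(c / q) for c, q in zip(coeffs, steps)]


def dequantize(levels, steps):
    """What a decoder can reconstruct: the level times the step size."""
    return [lv * q for lv, q in zip(levels, steps)]


# Hypothetical DCT coefficients of a smooth block, lowest frequency first.
coeffs = [48.1, -12.9, 0.0, -1.3, 0.0, -0.4, 0.0, -0.1]
# Hypothetical step sizes: coarser for higher frequencies.
steps = [4, 4, 6, 8, 12, 16, 24, 32]

levels = quantize(coeffs, steps)
restored = dequantize(levels, steps)
errors = [r - c for r, c in zip(restored, coeffs)]
print(levels)
print(restored)
```

Every small high-frequency coefficient comes back as exactly zero, and the surviving ones are off by a fraction of their step size. Both effects are deterministic for a given input, which matters for the stacking argument.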
The error is small, but it is there. Specifically, even though the DCT is better at the edges than other Fourier-like transforms, the error is most visible at the edges. This is what you're seeing in your composite image.
Stacking/averaging more or less eliminates random (stochastic) noise (i.e., unbiased sensor noise), because the noise is as likely to be positive as negative and so averages toward zero as the number of frames grows. When you throw 200,000 fair dice, you will see that statistically, all numbers come up roughly equally. This is why you don't see magnified sensor noise in your composite image.
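A quick simulation of a single pixel shows how fast unbiased noise shrinks under averaging. This is only a sketch: it assumes Gaussian, zero-mean noise, and the signal level, noise level, and function names are invented for the example.

```python
import random
import statistics


def stacked_pixel(n_frames, rng, signal=2.0, sigma=5.0):
    """Average n_frames noisy readings of one pixel with a constant true value."""
    return sum(signal + rng.gauss(0, sigma) for _ in range(n_frames)) / n_frames


def residual_noise(n_frames, trials=200, seed=0):
    """Spread of the stacked value over many independent stacks."""
    rng = random.Random(seed)
    return statistics.pstdev([stacked_pixel(n_frames, rng) for _ in range(trials)])


single = residual_noise(1)
stacked = residual_noise(100)
print(single, stacked)
```

The residual noise falls roughly with the square root of the number of frames, so with 200,000 frames very little random noise is left to hide any fixed pattern.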
However, biased data, whether there by its very nature (i.e., the image of your stepped lens mount) or introduced externally, is not averaged away, so it stands out once the noise is gone. Because all of your frames were JPEGs, each one carries the same kind of DCT-compression error, and that error is preserved in your stack.
The reason you are seeing a pronounced grid is that DCT compression, through quantization error and low-pass filtering of spatial frequencies one block at a time, leaves slight edge-discontinuity errors between 8×8 blocks. These fall in the same places in every frame, so they remain in the stack while the noise disappears.
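The whole argument can be tried out on a toy one-dimensional "scanline": a gentle gradient plus fresh noise in every frame, pushed through a blockwise DCT with coarse quantization, then averaged. This is a simplified sketch, not a model of any particular camera; the step sizes, gradient, and noise level are hypothetical.

```python
import math
import random

N = 8  # JPEG block size

# Orthonormal DCT-II basis, BASIS[k][i] = weight of sample i in coefficient k.
BASIS = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
          * math.cos(math.pi * (i + 0.5) * k / N) for i in range(N)]
         for k in range(N)]

# Hypothetical step sizes: fine for the block average, coarse for everything else.
STEPS = [1] + [8] * (N - 1)


def dct(block):
    return [sum(b * x for b, x in zip(BASIS[k], block)) for k in range(N)]


def idct(coeffs):
    return [sum(coeffs[k] * BASIS[k][i] for k in range(N)) for i in range(N)]


def jpeg_like(line):
    """Blockwise DCT, quantize, dequantize, inverse DCT."""
    out = []
    for start in range(0, len(line), N):
        coeffs = dct(line[start:start + N])
        quantized = [round(c / q) * q for c, q in zip(coeffs, STEPS)]
        out.extend(idct(quantized))
    return out


def stack(n_frames, width=32, slope=0.25, sigma=1.0, seed=0):
    """Average n_frames compressed frames of a noisy, smoothly shaded scene."""
    rng = random.Random(seed)
    total = [0.0] * width
    for _ in range(n_frames):
        frame = [slope * i + rng.gauss(0, sigma) for i in range(width)]
        for i, v in enumerate(jpeg_like(frame)):
            total[i] += v
    return [t / n_frames for t in total]


mean_line = stack(500)
jumps = [abs(b - a) for a, b in zip(mean_line, mean_line[1:])]
boundary = [jumps[i] for i in range(N - 1, len(jumps), N)]       # across block edges
interior = [j for i, j in enumerate(jumps) if i % N != N - 1]    # inside blocks
mean_boundary = sum(boundary) / len(boundary)
mean_interior = sum(interior) / len(interior)
print(mean_boundary, mean_interior)
```

In this toy setup the gradient inside each block is almost always rounded away, so the averaged line turns into a staircase whose only steps sit on the block boundaries, even though the input gradient was perfectly even. Real JPEG output is two-dimensional and the exact mechanism in a given webcam may differ, but it shows how a systematic per-block error can survive averaging that removes the noise.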
Originally by user11924. Source · Licensed CC BY-SA 4.0
0
Generated from our catalog & community — verify before relying on it.
The grid is most likely a JPEG compression artifact, not random sensor noise.
JPEG encodes images in 8×8 pixel blocks using the discrete cosine transform (DCT). This process can introduce small discontinuities at block boundaries. In a single dark, noisy frame those artifacts may be buried in the noise, but when you average 200,000 frames, the random noise cancels out while any repeatable structure stays.
Because the camera, scene, and JPEG processing are all fixed from frame to frame, the block-boundary errors occur in the same places every time. Stacking therefore reinforces the 8×8 pattern instead of removing it. That’s why you see the grid lines so clearly after equalisation.
You wouldn’t necessarily get whole blocks of different flat colors; JPEG isn’t just replacing each 8×8 block with one average value. It preserves more detail than that, but can still leave subtle boundary errors that become obvious after extreme averaging and contrast stretching.
If you want this to disappear, use uncompressed or raw output if possible, or capture before JPEG compression is applied.
UniqueBot
AI, 9y ago