OpenGL Bloom Tutorial
Quick-n-Dirty Box Filter
For a crude but easy effect, draw the bright portions of the scene (e.g.,
light sources) into an FBO, then downsample it several times using
GL_LINEAR minification. For the final render, simply combine the
original scene with the downsampled FBOs.
In the following example, the original scene is 128 x
128 and is downsampled 3 times. This requires 4 framebuffer objects.
The original scene is shown in the upper-left and the final rendering
in the upper-right.
(Figure generated with cheap.c)
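To make the FBO count concrete, here is a minimal sketch that lists the surface sizes in a downsample chain. It assumes each level is half the size of the previous one, which the text doesn't actually state, and chain_sizes() is a made-up helper name rather than something from the demo source.

```c
#include <assert.h>

/* Hypothetical helper: fill w[] and h[] with the sizes of a downsample chain.
   Assumes each level halves the previous one (an assumption; the demo source
   may pick its sizes differently). Returns the number of surfaces written,
   which is the original plus one per downsample. */
int chain_sizes(int width, int height, int downsamples, int *w, int *h)
{
    assert(width > 0 && height > 0 && downsamples >= 0);
    int count = 0;
    for (int i = 0; i <= downsamples; i++) {
        w[count] = width;
        h[count] = height;
        count++;
        /* never shrink below one texel */
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return count;
}
```

Under that halving assumption, a 128 x 128 scene downsampled 3 times gives 128, 64, 32 and 16 texels per side: four surfaces, matching the 4 framebuffer objects mentioned above.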
Note that the upper-left FBO requires a depth attachment but all the
others don't. To save graphics memory, I recommend paying attention to
which FBOs are used for 3D rendering and which ones are used only for
2D image-processing. Here's how to create an FBO using
GL_EXT_framebuffer_object:
void phCreateSurface(PHsurface *surface, GLboolean depth)
{
    GLenum internalFormat = GL_RGBA;
    GLenum type = GL_UNSIGNED_BYTE;
    GLenum filter = GL_LINEAR;

    // create a color texture
    glGenTextures(1, &surface->texture);
    glBindTexture(GL_TEXTURE_2D, surface->texture);
    glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, surface->width, surface->height, 0, GL_RGBA, type, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, filter);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, filter);
    glBindTexture(GL_TEXTURE_2D, 0);
    phCheckError("Creation of the color texture for the FBO");

    // create depth renderbuffer
    if (depth) {
        glGenRenderbuffersEXT(1, &surface->depth);
        glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, surface->depth);
        glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, surface->width, surface->height);
        phCheckError("Creation of the depth renderbuffer for the FBO");
    } else {
        surface->depth = 0;
    }

    // create FBO
    glGenFramebuffersEXT(1, &surface->fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, surface->fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, surface->texture, 0);
    if (depth)
        glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, surface->depth);
    phCheckFBO();
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    phCheckError("Creation of FBO");
}
Notice the calls to two error-checking routines, phCheckError() and phCheckFBO(). It's always good practice to check your GL error state,
but it's even more critical for FBOs due to the wide variation in hardware support for various formats.
phCheckError() uses glGetError() while phCheckFBO() uses glCheckFramebufferStatusEXT().
Gaussian Filter
The Gaussian filter produces a much more pleasing result than the box
filter. It works by sampling a local neighborhood of texels and
producing a weighted average. The size of the neighborhood is called
the kernel size. The weights are based on the bell curve:
(Figure generated with gauss.c; uses cairo)
As a mathematical aside, the bell curve has some cool properties:
- The area under the curve is 1.
- If you flip a huge number of coins, then the apex of the curve represents
the likelihood that 50% of your coins will land on heads.
- As you compute more and more rows of Pascal's triangle, you approach the curve.
That last bullet gives us an easy way to compute our weights. To
generate Pascal's triangle, write it out as a sequence of rows. Each
number is the sum of two other numbers: the number directly above, and
the number above and to the left. It's like this:
(Figure generated with pascal.c; uses cairo)
Incidentally, odd numbers are highlighted to show how this is related to the Sierpinski fractal. Is that cool or what?
But I digress...
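To determine the weights of an n x n kernel, select the row that has n numbers in it, then take the outer product of that row with itself. This basically means writing the row as a column vector and multiplying it by its own transpose.
For example, the 5 x 5 kernel weights come from the row 1 4 6 4 1:
 1  4  6  4  1
 4 16 24 16  4
 6 24 36 24  6
 4 16 24 16  4
 1  4  6  4  1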
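If you'd rather not do this by hand, here is a rough sketch of the whole recipe in C: build the Pascal row, take the outer product, and divide by the sum of all weights. The names pascal_row(), gaussian_kernel() and MAX_TAPS are made up for this illustration and are not taken from the demo source.

```c
#include <assert.h>

#define MAX_TAPS 16  /* arbitrary cap chosen for this sketch */

/* Fill row[0..n-1] with the row of Pascal's triangle that has n numbers. */
void pascal_row(int n, unsigned long row[])
{
    assert(n >= 1 && n <= MAX_TAPS);
    row[0] = 1;
    for (int i = 1; i < n; i++) {
        /* the row gains one entry; update right to left so each entry
           still sees the values of the previous row */
        row[i] = 1;
        for (int j = i - 1; j > 0; j--)
            row[j] += row[j - 1];
    }
}

/* Build the normalized n x n kernel (row-major) as the outer product of the
   Pascal row with itself, divided by the sum of all n*n weights. */
void gaussian_kernel(int n, float kernel[])
{
    unsigned long row[MAX_TAPS];
    unsigned long sum = 0;

    pascal_row(n, row);
    for (int i = 0; i < n; i++)
        sum += row[i];

    /* the sum of all outer-product entries is the row sum squared */
    float total = (float)sum * (float)sum;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            kernel[y * n + x] = (float)(row[y] * row[x]) / total;
}
```

For n = 5 the row sums to 16, so every weight ends up divided by 256. The resulting 25 floats could then be handed to the shader as a uniform array, for instance with glUniform1fv.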
You can supply these values to your shader as uniforms. Be sure to
normalize the values on the CPU by dividing each weight by the sum of
all weights. In the following example, we use a kernel size of 5 x 5.
The original scene is 128 x 128 and is downsampled 3 times. This
requires 8 framebuffer objects.
(Figure generated with naive.c)
Exploit Separability
The previous example used 25 texture lookups per fragment. That's
crazy! Better performance can be achieved if we reduce the number of
texture lookups. To accomplish this, we can split the 5 x 5 filter
back into the original 5 x 1 and 1 x 5 filters. We'll use two passes:
first a horizontal pass, then a vertical pass. That's 10 lookups per
fragment instead of 25.
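To convince yourself that the two passes really reproduce the full 2D filter, here is a small CPU-side sketch. It blurs a tiny image with the 1 4 6 4 1 weights, once per axis. SIZE, texel() and blur_pass() are names invented for this example, and clamping at the image edge is an assumption; the actual demo does this work in a fragment shader.

```c
#include <assert.h>

#define SIZE 9   /* small SIZE x SIZE test image, row-major */
#define TAPS 5

static const float weights[TAPS] = {
    1 / 16.0f, 4 / 16.0f, 6 / 16.0f, 4 / 16.0f, 1 / 16.0f
};

/* Read a texel, clamping the coordinates to the image edge. */
static float texel(const float *img, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= SIZE) x = SIZE - 1;
    if (y < 0) y = 0;
    if (y >= SIZE) y = SIZE - 1;
    return img[y * SIZE + x];
}

/* One 1D pass: (dx,dy) = (1,0) blurs horizontally, (0,1) vertically. */
void blur_pass(const float *src, float *dst, int dx, int dy)
{
    assert((dx == 1 && dy == 0) || (dx == 0 && dy == 1));
    for (int y = 0; y < SIZE; y++) {
        for (int x = 0; x < SIZE; x++) {
            float sum = 0.0f;
            for (int t = 0; t < TAPS; t++) {
                int offset = t - TAPS / 2;   /* -2 .. +2 */
                sum += weights[t] * texel(src, x + offset * dx, y + offset * dy);
            }
            dst[y * SIZE + x] = sum;
        }
    }
}
```

Feed it an image that is black except for a single texel of value 1 in the middle, run the horizontal pass into a scratch buffer and the vertical pass into the output, and the output holds the 5 x 5 kernel: 36/256 at the center, 1/256 at the corners.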
The following example appears to use 12 FBOs, but actually uses only 8; two sets of FBOs are "ping-ponged".
(Figure generated with separable.c)
Exploit Hardware Filtering
Believe it or not, we can compute a 5 x 5 filter with fewer than 5 texture lookups per pass. That's crazy talk, right?
Nope, it's actually quite simple! First, visualize the filtering of a yellow texel during the horizontal pass like so:
(Figure generated with filter.c; uses cairo)
The problem with the above representation is that it assumes we're using GL_NEAREST filtering.
If we use GL_LINEAR, then it's more accurate to visualize each texel as a gradient, kinda like this:
(Figure generated with filter.c; uses cairo)
Now comes the sneaky part: we can sample between the texel centers! If we choose the offset carefully, we can combine 2 samples into 1 sample, like this:
(Figure generated with filter.c; uses cairo)
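Here is one way to work out "carefully": a sketch, not the demo's actual code. A linear sample taken a fraction f of the way from texel A to texel B returns (1-f)*A + f*B, so two taps with weights wA and wB collapse into one sample of weight wA+wB placed at f = wB/(wA+wB). The helper names merge_taps(), sample_linear() and blur3() are made up for this illustration.

```c
#include <assert.h>

/* Merge two neighboring taps into one linearly filtered sample.
   inner_offset: texel offset of the tap closer to the center (e.g. 1)
   w_inner, w_outer: weights of that tap and of the next one out */
void merge_taps(float inner_offset, float w_inner, float w_outer,
                float *offset, float *weight)
{
    assert(w_inner + w_outer > 0.0f);
    *weight = w_inner + w_outer;
    *offset = inner_offset + w_outer / (*weight);
}

/* Emulate 1D linear filtering on the CPU. For simplicity texel centers sit
   at integer positions here; real GL puts them at half-texel coordinates. */
float sample_linear(const float *texels, int count, float pos)
{
    assert(count > 0 && pos >= 0.0f && pos <= (float)(count - 1));
    int i = (int)pos;
    if (i >= count - 1)
        return texels[count - 1];
    float f = pos - (float)i;
    return (1.0f - f) * texels[i] + f * texels[i + 1];
}

/* Three-lookup version of the 1 4 6 4 1 filter centered on texel c. */
float blur3(const float *texels, int count, int c)
{
    float offset, weight;
    merge_taps(1.0f, 4.0f / 16.0f, 1.0f / 16.0f, &offset, &weight);
    return (6.0f / 16.0f) * texels[c]
         + weight * sample_linear(texels, count, (float)c - offset)
         + weight * sample_linear(texels, count, (float)c + offset);
}
```

For the 5-tap row this gives a center weight of 6/16 plus two samples of weight 5/16 at offsets of 1.2 texels on either side. In a shader the offset would also need to be scaled by one over the texture size.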
Whoa! Is that cool or what? Only three samples! Let's jazz up our demo scene a little before applying this technique:
(Figure generated with sneaky.c)
HDR Bloom
High-dynamic range imaging is a sweet feature of newer graphics hardware.
Color values no longer need to be clamped to [0,1]. Unfortunately most displays are still LDR, but
bloom is one way of "faking" super-brightness by simulating the bounce effect that happens inside a camera (or your eye).
In our example, we apply unclamped lighting on a sphere to generate an
HDR image. The specular highlight is super-bright, so that's the
portion that we want to bloom. The first step is removing low image
intensities. That's how we get from the upper-left FBO to the FBO on its
right.
(Figure generated with hdr.c)
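The text above doesn't spell out how the low intensities get removed, so take the following as one plausible sketch rather than what hdr.c does: subtract a threshold from each channel and clamp at zero. The Color struct, the bright_pass() name and the threshold parameter are all assumptions made for this illustration; in the demo this step would live in a fragment shader.

```c
#include <assert.h>

typedef struct { float r, g, b; } Color;

/* Hypothetical bright pass: subtract a threshold from each channel and
   clamp at zero, so anything at or below the threshold turns black and
   only the super-bright part of the HDR image survives. */
Color bright_pass(Color hdr, float threshold)
{
    assert(threshold >= 0.0f);
    Color out;
    out.r = hdr.r > threshold ? hdr.r - threshold : 0.0f;
    out.g = hdr.g > threshold ? hdr.g - threshold : 0.0f;
    out.b = hdr.b > threshold ? hdr.b - threshold : 0.0f;
    return out;
}
```

With a threshold of 1.0, everything an LDR display could already show goes to black, and a specular value such as 3.5 is left with 2.5 to feed into the blur.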
The upper-left FBO uses a half-float internal format (16 bits per
component) but the other 8 FBOs can use a plain ol' integer-based
format. (Again, it's 8 rather than 12 because we can ping-pong between
two sets of FBOs.)
Here's how to create a half-float FBO. The GL_HALF_FLOAT_ARB type comes from GL_ARB_half_float_pixel, and the GL_RGBA16F_ARB internal format comes from GL_ARB_texture_float:
void phCreateFloatSurface(PHsurface *surface, GLboolean depth)
{
    GLenum internalFormat = GL_RGBA16F_ARB;
    GLenum type = GL_HALF_FLOAT_ARB;
    GLenum filter = GL_NEAREST;

    // create a color texture
    glGenTextures(1, &surface->texture);
    glBindTexture(GL_TEXTURE_2D, surface->texture);
    glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, surface->width, surface->height, 0, GL_RGBA, type, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, filter);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, filter);
    glBindTexture(GL_TEXTURE_2D, 0);
    phCheckError("Creation of the color texture for the FBO");

    // create depth renderbuffer
    if (depth) {
        glGenRenderbuffersEXT(1, &surface->depth);
        glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, surface->depth);
        glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, surface->width, surface->height);
        phCheckError("Creation of the depth renderbuffer for the FBO");
    } else {
        surface->depth = 0;
    }

    // create FBO
    glGenFramebuffersEXT(1, &surface->fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, surface->fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, surface->texture, 0);
    if (depth)
        glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, surface->depth);
    phCheckFBO();
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    phCheckError("Creation of FBO");
}
Note that we use GL_NEAREST instead of GL_LINEAR. Linear filtering of a
float texture is often detrimental to performance (or unsupported
outright). However, our example still uses linear filtering for all the
non-float FBOs.
Well, that about wraps it up. I invite you to check out the source code below.
I used plain old C and OpenGL 2.0, so it should be fairly portable.
You can copy it, mutilate it, or use it however you want. Happy blooming!
Demo Code
bloom.tar.gz | 164k | Unix EOL chars; also includes cairo source for 2D diagrams
bloom.zip | 641k | Windows EOL chars; also includes win32 binaries