Philip Rideout

OpenGL Bloom Tutorial

  1. Quick-n-Dirty Box Filter
  2. Gaussian Filter
  3. Exploit Separability
  4. Exploit Hardware Filtering
  5. HDR Bloom
  6. Demo Code

Quick-n-Dirty Box Filter

For a crude but easy effect, draw the bright portions of the scene (e.g., light sources) into an FBO, then downsample it several times using GL_LINEAR minification. For the final render, simply combine the original scene with the downsampled FBOs.

In the following example, the original scene is 128 x 128 and is downsampled 3 times. This requires 4 framebuffer objects. The original scene is shown in the upper-left and the final rendering in the upper-right.

OpenGL screenshot
generated with cheap.c
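
To get a feel for what each downsampling step does, here is a minimal CPU sketch. It assumes a single-channel float image with even dimensions, and it assumes the half-size render lines up exactly with the source texels, in which case each GL_LINEAR sample lands on the shared corner of a 2 x 2 block and weights the four texels equally. The name downsample_half is hypothetical and does not come from cheap.c.

```c
#include <stddef.h>

/* Average each 2x2 block of a single-channel image into one output texel.
 * src is w x h (both assumed even); dst is (w/2) x (h/2); both row-major.
 * This mimics a half-size GL_LINEAR minification under the assumptions
 * stated above; it is not the code used by the demo. */
void downsample_half(const float *src, int w, int h, float *dst)
{
    int dw = w / 2;
    int dh = h / 2;
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            /* top-left texel of the 2x2 block feeding this output texel */
            const float *p = src + (size_t)(2 * y) * w + 2 * x;
            dst[(size_t)y * dw + x] = 0.25f * (p[0] + p[1] + p[w] + p[w + 1]);
        }
    }
}
```

Applied three times to a 128 x 128 image, this yields 64 x 64, 32 x 32, and 16 x 16 images, which together with the original account for the 4 surfaces mentioned above.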

Note that the upper-left FBO requires a depth attachment but all the others don't. To save graphics memory, I recommend paying attention to which FBOs are used for 3D rendering and which ones are used only for 2D image-processing. Here's how to create an FBO using GL_EXT_framebuffer_object:
void phCreateSurface(PHsurface *surface, GLboolean depth)
{
    GLenum internalFormat = GL_RGBA;
    GLenum type = GL_UNSIGNED_BYTE;
    GLenum filter = GL_LINEAR;
 
    // create a color texture
    glGenTextures(1, &surface->texture);
    glBindTexture(GL_TEXTURE_2D, surface->texture);
    glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, surface->width, surface->height, 0, GL_RGBA, type, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, filter);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, filter);
    glBindTexture(GL_TEXTURE_2D, 0);
    phCheckError("Creation of the color texture for the FBO");
 
    // create depth renderbuffer
    if (depth) {
        glGenRenderbuffersEXT(1, &surface->depth);
        glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, surface->depth);
        glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, surface->width, surface->height);
        phCheckError("Creation of the depth renderbuffer for the FBO");
    } else {
        surface->depth = 0;
    }
 
    // create FBO
    glGenFramebuffersEXT(1, &surface->fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, surface->fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, surface->texture, 0);
    if (depth)
        glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, surface->depth);
    phCheckFBO();
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    phCheckError("Creation of FBO");
}
Notice the calls to two error-checking routines, phCheckError() and phCheckFBO(). It's always good practice to check your GL error state, but it's even more critical for FBOs due to the wide variation in hardware support for various formats. phCheckError() uses glGetError() while phCheckFBO() uses glCheckFramebufferStatusEXT().
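
The bodies of those two routines aren't shown here, so the following is only a sketch of the kind of helper phCheckFBO() might lean on: a function that turns a framebuffer status code into a readable message. The name ph_fbo_status_name is hypothetical. The constants are the ones defined by GL_EXT_framebuffer_object; they are redefined under #ifndef purely so the sketch compiles without the GL headers.

```c
/* Status values from GL_EXT_framebuffer_object, guarded so that the real
 * definitions win whenever the GL extension headers are included first. */
#ifndef GL_FRAMEBUFFER_COMPLETE_EXT
#define GL_FRAMEBUFFER_COMPLETE_EXT                      0x8CD5
#endif
#ifndef GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT_EXT
#define GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT_EXT         0x8CD6
#endif
#ifndef GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT_EXT
#define GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT_EXT 0x8CD7
#endif
#ifndef GL_FRAMEBUFFER_UNSUPPORTED_EXT
#define GL_FRAMEBUFFER_UNSUPPORTED_EXT                   0x8CDD
#endif

/* Hypothetical helper: map a framebuffer status code to a short message. */
const char *ph_fbo_status_name(unsigned int status)
{
    switch (status) {
    case GL_FRAMEBUFFER_COMPLETE_EXT:
        return "complete";
    case GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT_EXT:
        return "incomplete attachment";
    case GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT_EXT:
        return "missing attachment";
    case GL_FRAMEBUFFER_UNSUPPORTED_EXT:
        return "unsupported format combination";
    default:
        return "unknown status";
    }
}
```

In a real program you would pass it the result of glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) while the FBO is still bound, and bail out on anything other than "complete".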

Gaussian Filter

The Gaussian filter produces a much more pleasing result than the box filter. It works by sampling a local neighborhood of texels and producing a weighted average. The size of the neighborhood is called the kernel size. The weights are based on the bell curve:

cairo-generated image
generated with gauss.c (uses cairo)


As a mathematical aside, the bell curve has some cool properties:
  • The area under the curve is 1.
  • If you flip a huge number of coins, then the apex of the curve represents the likelihood that 50% of your coins will land on heads.
  • As you compute more and more rows of Pascal's triangle, you approach the curve.
That last bullet gives us an easy way to compute our weights. To generate Pascal's triangle, write it out as a sequence of rows. Each number is the sum of two other numbers: the number directly above, and the number above and to the left. It's like this:

cairo-generated image
generated with pascal.c (uses cairo)

Incidentally, odd numbers are highlighted to show how this is related to the Sierpinski fractal. Is that cool or what? But I digress...
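
If you'd rather let the computer write out the triangle, here is a small sketch that builds a single row in place using the sum rule described above. The function name pascal_row is my own invention for illustration; it isn't taken from pascal.c.

```c
/* Fill out[0..n-1] with the row of Pascal's triangle that has n entries.
 * The row is grown one entry at a time; interior entries are updated from
 * right to left so each sum still sees the previous row's values. */
void pascal_row(int n, unsigned long *out)
{
    for (int i = 0; i < n; ++i) {
        out[i] = 1;                       /* every row ends with a 1 */
        for (int j = i - 1; j > 0; --j)
            out[j] += out[j - 1];         /* above + above-left */
    }
}
```

Calling pascal_row(5, row) fills row with 1, 4, 6, 4, 1.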

To determine the weights of an n x n kernel, select the row that has n numbers in it, then take the outer product of that row with itself. In other words, treat the row as a column vector and multiply it by its own transpose. For example, the row with 5 numbers is 1 4 6 4 1, so the 5 x 5 kernel weights are:

    1   4   6   4   1
    4  16  24  16   4
    6  24  36  24   6
    4  16  24  16   4
    1   4   6   4   1

You can supply these values to your shader as uniforms. Be sure to normalize the values on the CPU by dividing each weight by the sum of all weights. In the following example, we use a kernel size of 5 x 5. The original scene is 128 x 128 and is downsampled 3 times. This requires 8 framebuffer objects.

OpenGL screenshot
generated with naive.c
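
The weight computation described above can be sketched on the CPU as follows. This is an illustration rather than the code from naive.c; KERNEL_SIZE and build_kernel are names made up for the example, and the Pascal row is hard-coded for the 5 x 5 case.

```c
#define KERNEL_SIZE 5

/* Build a normalized 5x5 kernel as the outer product of the 5-entry
 * Pascal row with itself, then divide by the sum of all weights. */
void build_kernel(float kernel[KERNEL_SIZE][KERNEL_SIZE])
{
    static const float row[KERNEL_SIZE] = { 1.0f, 4.0f, 6.0f, 4.0f, 1.0f };
    float sum = 0.0f;

    for (int i = 0; i < KERNEL_SIZE; ++i) {
        for (int j = 0; j < KERNEL_SIZE; ++j) {
            kernel[i][j] = row[i] * row[j];
            sum += kernel[i][j];
        }
    }

    /* Normalize so the weights add up to 1 and the blur preserves brightness. */
    for (int i = 0; i < KERNEL_SIZE; ++i)
        for (int j = 0; j < KERNEL_SIZE; ++j)
            kernel[i][j] /= sum;
}
```

The unnormalized weights sum to 256, so the center weight ends up as 36/256 and each corner as 1/256. One way to hand the result to a shader would be a single glUniform1fv call with a count of 25, though how the demo actually passes its uniforms may differ.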


Exploit Separability

The previous example used 25 texture lookups per fragment. That's crazy! Better performance can be achieved if we reduce the number of texture lookups. To accomplish this, we can split the 5 x 5 filter back into the original 5 x 1 and 1 x 5 filters. We'll use two passes: first a horizontal pass, then a vertical pass. That brings the cost down from 25 lookups to 10 (5 per pass).
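
To convince yourself that the two passes really add up to the full 2D filter, here is a CPU sketch on a single-channel float image. It is not the shader code from separable.c; blur_pass and blur_separable are hypothetical names, and texels outside the image are simply treated as zero, which is an assumption made to keep the example short.

```c
/* Normalized 5-tap weights: the Pascal row 1 4 6 4 1 divided by 16. */
static const float taps[5] = {
    1.0f / 16, 4.0f / 16, 6.0f / 16, 4.0f / 16, 1.0f / 16
};

/* One 5-tap pass over a w x h row-major image.
 * (dx, dy) = (1, 0) filters horizontally, (0, 1) filters vertically.
 * Texels outside the image contribute zero (an assumption of this sketch). */
void blur_pass(const float *src, float *dst, int w, int h, int dx, int dy)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int k = -2; k <= 2; ++k) {
                int sx = x + k * dx;
                int sy = y + k * dy;
                if (sx >= 0 && sx < w && sy >= 0 && sy < h)
                    acc += taps[k + 2] * src[sy * w + sx];
            }
            dst[y * w + x] = acc;
        }
    }
}

/* Horizontal pass into tmp, then vertical pass into dst: 10 lookups per
 * texel instead of 25. */
void blur_separable(const float *src, float *tmp, float *dst, int w, int h)
{
    blur_pass(src, tmp, w, h, 1, 0);
    blur_pass(tmp, dst, w, h, 0, 1);
}
```

Feeding in an image that is zero everywhere except for a single 1 in the middle reproduces the normalized 5 x 5 kernel around that texel, with 36/256 at the center.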

The following example appears to use 12 FBOs, but actually uses only 8; two sets of FBOs are "ping-ponged".

OpenGL screenshot
generated with separable.c


Exploit Hardware Filtering

Believe it or not, we can compute each 5-tap pass of the 5 x 5 filter with fewer than 5 texture lookups. That's crazy talk, right? Nope, it's actually quite simple! First, visualize the filtering of a yellow texel during the horizontal pass like so:
cairo-generated image
generated with filter.c (uses cairo)

The problem with the above representation is that it assumes we're using GL_NEAREST filtering. If we use GL_LINEAR, then it's more accurate to visualize each texel as a gradient, kinda like this:
cairo-generated image
generated with filter.c (uses cairo)

Now comes the sneaky part: we can sample between the texel centers! If we choose the offset carefully, we can combine 2 samples into 1 sample, like this:

cairo-generated image
generated with filter.c (uses cairo)

Whoa! Is that cool or what? Only three samples! Let's jazz up our demo scene a little before applying this technique:

OpenGL screenshot
generated with sneaky.c
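
Here is a sketch of how the "carefully chosen" offset can be worked out. When GL_LINEAR samples between two neighboring texels, it blends them in proportion to how close the sample point is to each center, so two taps can be folded into one by sampling at their weighted average position and using their combined weight. The helper name merge_taps is hypothetical, offsets are measured in texels from the center texel, and sneaky.c may arrive at its numbers differently.

```c
/* Merge two adjacent taps into one linearly filtered sample.
 * o1, o2: offsets of the two texel centers, in texels (o2 = o1 + 1).
 * w1, w2: their filter weights.
 * On return, *offset is where to sample and *weight is what to multiply
 * the fetched value by. */
void merge_taps(float o1, float w1, float o2, float w2,
                float *offset, float *weight)
{
    *weight = w1 + w2;
    *offset = (o1 * w1 + o2 * w2) / (w1 + w2);
}
```

For the 5-tap row with weights 1/16, 4/16, 6/16, 4/16, 1/16, merging the taps at offsets 1 and 2 gives a weight of 5/16 at an offset of 1.2 texels. The same happens mirrored on the other side, and the center tap keeps its 6/16, which leaves three samples per pass. Remember to divide the offset by the texture width (or height) to turn it into a texture coordinate.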


HDR Bloom

High dynamic range (HDR) imaging is a sweet feature of newer graphics hardware. Color values no longer need to be clamped to [0,1]. Unfortunately, most displays are still low dynamic range (LDR), but bloom is one way of "faking" super-brightness by simulating the bounce effect that happens inside a camera (or your eye).
In our example, we apply unclamped lighting on a sphere to generate an HDR image. The specular highlight is super-bright, so that's the portion that we want to bloom. The first step is removing low image intensities. That's how we get from the upper-left FBO to the FBO on its right.

OpenGL screenshot
generated with hdr.c
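
The text doesn't spell out how the low intensities are removed, so the following is just one common way to do it, not necessarily what hdr.c does: subtract a threshold from each color component and clamp the result at zero. Both the function name bright_pass and the choice of threshold are assumptions for illustration.

```c
/* Keep only the part of an HDR value that exceeds the threshold.
 * With a threshold of 1.0, anything an LDR display could already show
 * becomes black and only the "super-bright" excess survives. */
float bright_pass(float value, float threshold)
{
    float excess = value - threshold;
    return excess > 0.0f ? excess : 0.0f;
}
```

For example, with a threshold of 1.0 an unclamped highlight of 3.0 becomes 2.0, while a mid-gray of 0.5 drops to 0. In a fragment shader the same idea would be applied to each of the red, green, and blue components.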

The upper-left FBO uses a half-float internal format (16 bits per component) but the other 8 FBOs can use a plain ol' integer-based format. (Again, it's 8 rather than 12 because we can ping-pong between two sets of FBOs.)

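Here's how to create a half-float FBO using GL_ARB_half_float_pixel (which provides GL_HALF_FLOAT_ARB) together with GL_ARB_texture_float (which provides GL_RGBA16F_ARB):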
void phCreateFloatSurface(PHsurface *surface, GLboolean depth)
{
    GLenum internalFormat = GL_RGBA16F_ARB;
    GLenum type = GL_HALF_FLOAT_ARB;
    GLenum filter = GL_NEAREST;
 
    // create a color texture
    glGenTextures(1, &surface->texture);
    glBindTexture(GL_TEXTURE_2D, surface->texture);
    glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, surface->width, surface->height, 0, GL_RGBA, type, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, filter);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, filter);
    glBindTexture(GL_TEXTURE_2D, 0);
    phCheckError("Creation of the color texture for the FBO");
 
    // create depth renderbuffer
    if (depth) {
        glGenRenderbuffersEXT(1, &surface->depth);
        glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, surface->depth);
        glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, surface->width, surface->height);
        phCheckError("Creation of the depth renderbuffer for the FBO");
    } else {
        surface->depth = 0;
    }
 
    // create FBO
    glGenFramebuffersEXT(1, &surface->fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, surface->fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, surface->texture, 0);
    if (depth)
        glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_RENDERBUFFER_EXT, surface->depth);
    phCheckFBO();
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    phCheckError("Creation of FBO");
}
Note that we use GL_NEAREST instead of GL_LINEAR. Using linear filtering with a float buffer is often detrimental to performance (or unsupported). However, our example still uses linear filtering for all the non-float FBOs.

Well, that about wraps it up. I invite you to check out the source code below. I used plain old C and OpenGL 2.0, so it should be fairly portable. You can copy it, mutilate it, or use it however you want. Happy blooming!

Demo Code

bloom.tar.gz 164k Unix EOL chars; also includes cairo source for 2D diagrams
bloom.zip 641k Windows EOL chars; also includes win32 binaries