Split-free occlusion sorting

Overview
Polygon-aligned binary space partition
Split-free alternative to BSP trees
Conclusion

Overview

Visibility sorting on the CPU is an interesting thing to do in some situations, such as:

When you don’t have a Z buffer. (see my svg3d library)
When you’re rasterizing a scene with complex transparency.
When you are using a Z buffer, but want to use an optimal draw order.

In its initial implementation, my svg3d library sorts polygons according to the Z value of each centroid. This is not at all robust. Here’s an example of a case where it breaks down. The long thin triangles in the avocado should all be rendered before the small triangles in the pit, but this failed to occur in a few places.

Of course, in some situations, no valid ordering exists:

Sometimes this is called an occlusion cycle, which refers to the occlusion state being represented as a directed graph whose topology changes as the camera moves around. Each node in the graph is a polygon, and if A→B then A obscures B. You can also annotate edges with a flag that denotes complete occlusion, which is useful for occlusion culling, which is outside the scope of this article.

Note that scenes can have occlusion cycles when any two polygons intersect each other at an angle. This is perhaps more common than what’s depicted in the above figure. Another thing to notice is that occlusion is neither commutative nor transitive, although complete occlusion is transitive.

The above figure (and intersecting polygons) are possible to render with the painter’s algorithm only by first splitting up polygons, thus breaking the occlusion cycles. It turns out that splitting polygons is useful for another reason: BSP trees.

Polygon-aligned binary space partition

The most well-established way of finding a visibility ordering is to build a polygon-aligned BSP. That’s exactly what games like Doom did before the era of Z buffering.

Polygon-aligned BSP trees are great for the visibility problem: you build the tree only once, and as the camera moves you simply change the traversal. To walk the tree, start at the root plane and do the following:

Determine the side of the dividing plane that the camera is in.
Recurse on the opposite side of the camera.
Emit the polygon(s) in the current node.
Recurse on the camera side.

The above traversal will emit a perfect back-to-front ordering.

Some of the polygons in the scene are chosen to be dividing planes (which can be done at random in an initial implementation). Note that a single BSP node might contain a group of polygons if they happen to be coplanar. For example, your scene might have a tessellated floor mesh. Since the floor polygons are coplanar, they might all belong to a single BSP node. Every polygon in the scene has one of the following relationships to a particular dividing plane:

Completely in the plane.
On the positive side of the plane.
On the negative side of the plane.
Crossing the plane.

If a polygon crosses the plane, then your BSP builder must execute a Solomonic Judgment by splitting the polygon in twain, and each half must be rendered separately. Even when the scene has no occlusion cycles, you still need to draw the two halves of split polygons separately, otherwise the scene will look incorrect. (This was not obvious to me at first.)

Perfect BSP trees are hierarchies that do not require splitting polygons. It turns out these are super difficult to find [1]. In fact, a perfect BSP is impossible for many scenes, even if the scene has acyclic visibility. Consider the following scene. This is a situation where it is possible to build a valid occlusion ordering, but is not possible to build a perfect BSP.

Split-free alternative to BSP trees

My svg3d library cannot perform polygon splitting because it needs to honor the user’s expectations for stroking and filling. Since it emits SVG, the only way it could perform seamless splitting would be to simulate stroke using thin secondary paths. I want to avoid simulating stroke since it would make the library much more complex.

I was not able to find much research about split-free visibility determination [2] so I wanted to come up with my own solution that would be easy to implement and would at least avoid the quadratic time complexity that a brute force solution would have. I want the ordering to be as correct as possible; if it has cycles, so be it, but it should at least be able to handle scenes like the avocado robustly.

The idea I had is to borrow from physics engines and basically perform collision detection between polygons and their occlusion shadows, which are truncated prisms that extend along the viewing axis. By leveraging temporal coherence, I think reasonable performance could be achieved in the average case, much like how physics engines can use the sweep-and-prune algorithm efficiently when there are only a few changes from one frame to the next.

As an example, let’s study a scene that has 10 semitransparent quads at different orientations:

Looking at the scene from the side is a bit more useful. Here’s a side view that includes a depiction of the occlusion shadows.

To make life easier, we can perform visibility sorting in clip space instead of NDC or window space. In other words, we can examine the polygons after they’ve been transformed by the projection matrix and ignore the W component of the homogeneous coordinate. Now it looks like this:

The first thing to notice is that the prisms fit nicely into axis-aligned bounding boxes. (in 2D the prisms are actually trapezoids)

At a high level our algorithm will look like:

Build an occlusion graph represented by an array of ordered pairs.
Emit a valid ordering by traversing the ordered pairs in some way.

Building the occlusion graph

For step 1, we need to think about the rather huge collection of AABB’s. Two key insights arise:

We should consider only one axis at a time. e.g. Only think about the projections of the AABB’s onto the Z axis.
For a particular axis, we should try throwing all the interval points into a sorted container, mixing together the min points and max points.

For the latter bullet, it helps to define a tiny struct for the interval points:

struct Endpoint {
    unsigned int polygon_index : 31;
    bool min_or_max : 1;
    float value;
};

By sweeping over the bag of endpoints in order of increasing Z, we have the humble beginnings of an algorithm:

Z_Endpoints = Gather_and_Sort_Z_Endpoints()
Active_Prism_Z_Endpoints = empty set
Active_Polygon_Z_Endpoints = empty set
Occlusion_Pairs = empty array

for A in Z_Endpoints:

    # When the sweep hits the right edge of A, remove it from the active set.
    if A.min_or_max == max:
        remove A from Active_Polygon_Z_Endpoints
        continue

    # Otherwise, the sweep line is on the left edge of A...

    # First, determine if A is occluded by any active polygons.
    T = getPolygon(A.polygon_index)
    for B in Active_Prism_Z_Endpoints:
        P = getPrism(B.polygon_index)
        if aabb_overlap(P, T) and sat_overlap(P, T):
            insert (B→A) into Occlusion_Pairs

    # Next, determine if A occludes any active polygons.
    P = getPrism(A.polygon_index)
    for B in Active_Polygon_Z_Endpoints:
        T = getPolygon(B.polygon_index)
        if aabb_overlap(P, T) and sat_overlap(P, T):
            insert (A→B) into Occlusion_Pairs

    insert A into Active_Prism_Z_Endpoints
    insert A into Active_Polygon_Z_Endpoints

One issue with the above algorithm is that it approaches quadratic complexity as the active set of prisms grows without bound. One way to mitigate this is to store two additional active sets sorted by X rather than Z. This allows the inner loops to iterate over a constrained range.

Separating axis theorem

Note the two prism-polygon intersection functions in the above pseudocode: aabb_overlap and sat_overlap. The former’s implementation is obvious, but the latter uses the separating axis theorem. In 2D, the test can be stated as:

If n-sided convex polygon A and m-sided convex polygon B do not intersect, then they must have at least one separating axis that is parallel to one of their sides. The separating axis is one of (n + m) infinite lines that can be projected to, such that vertices projected from A and vertices projected from B do not have overlapping intervals.

The above also works if one of the shapes is a line segment, which is a degenerate polygon whose sole “side” is parallel to the line direction.

The 3D case is a bit more interesting:

Two nonintersecting convex polyhedra A and B can be separated by a plane that is parallel to one of their faces, or to one of their cross-product edge combinations. If A has F0 faces + E0 edges, and B has F1 faces + E1 edges, then the number of axes is (F0 + F1) + E0 * E1. This can be reduced when certain faces and edges are known to be parallel. For example, cuboids have 15 separating axes because F0=F1=3 and E0=E1=3.

The above also works if one of the shapes is a triangle, which is a degenerate polyhedron that has one face and three edges.

The projection operation is just a dot product, so the pseudocode looks like:

def sat_overlap(polyhedron A, polyhedron B):
    for axis in separating_axes:

        mina = inf, maxa = -inf
        for vertex in A:
            d = dot(vertex, axis)
            mina = min(mina, d)
            maxa = max(maxa, d)

        minb = inf, maxb = -inf
        for vertex in B:
            d = dot(vertex, axis)
            minb = min(minb, d)
            maxb = max(maxb, d)

        if maxa < minb return False
        if mina > maxb return False

    return True

As a side note, temporal coherence can be exploited by stashing the separating axis and ensuring that it is the first axis that gets tested in subsequent frames.

Traversing the occlusion graph

Given the list of occlusion pairs that was generated in the previous step, the task of generating a back-to-front ordering is fairly simple. The following algorithm proposes one way of doing it. It’s not at all clever but it is O(n) if the scene is fairly coherent with the previous frame.

First compute a number_of_occludees field for each polygon by iterating once over the pairs and incrementing the number_of_occludees for the appropriate polygon along the way. Note that the lowest number_of_occludees is not necessarily zero (consider a scene consisting solely of intersecting triangles).
Next, sort the primitives according to their number_of_occludees, or use the ordering from the previous frame to leverage temporal coherence. Call this the source list.
Emit the first occluder in the source list, pop it off, and decrement the number_of_occludees of its occluders.
Ensure that the source list stays sorted by swapping the occluders into new positions if necessary. This could be an insertion sort. Alternatively, you could store the source list in a min-heap (aka priority_queue in C++).
Repeat steps 3 and 4 until the source list becomes empty.

Conclusion

To find a valid occlusion ordering, the simplest solution is to use a classic polygon-aligned BSP, even if splitting is difficult to live with. Splitting is often something you’ll need to support anyway, since it’s otherwise impossible to create a valid occlusion ordering for scenes that have occlusion cycles.

If you absolutely cannot support splitting, then you might consider the approach that I’ve outlined on this page, or check out the method described in [2].

Bibliography

[1] De Berg, Mark, Marko M. de Groot, and Mark H. Overmars. “Perfect binary space partitions.” Computational geometry 7, no. 1-2 (1997): 81-91.

[2] Snyder, John, and Jed Lengyel. “Visibility sorting and compositing without splitting for image layer decompositions.” In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pp. 219-230. ACM, 1998.

Discuss

Tried to come up with an algorithm that can generate a “fairly correct” occlusion order without splits… turns out it's hard.
— Philip Rideout (@prideout) August 1, 2019