This week I want to dive into a rather technical – but very important – aspect of rendering in Privacy: visibility culling. Visibility culling is the term game developers use for the collection of techniques that together decide which elements of the game world need to be drawn to render the current frame. Since everything we draw has a cost, by far the biggest gains in rendering performance come from deciding, quickly and cleverly, what we can leave out.
Read on to find out which techniques I use in Privacy to ensure the game is able to render at 60 frames per second.
The first technique is called frustum culling and thanks to VPRO’s excellent documentary on the making of Horizon Zero Dawn*, you probably already know how this part works:
*The video is blocked in the Netherlands, but you can watch it on NPO Start instead.
The camera frustum is a volume that describes everything the camera can see. It's shaped like a pyramid with a rectangular base and the top chopped off, which is exactly what the word "frustum" means. For every frame rendered, the engine compares the bounding box of every model in the world against the current camera frustum and only renders the models whose bounding box overlaps it. Without frustum culling, the engine can't maintain a decent frame rate.
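To make the bounding box test concrete, here is a minimal sketch of a common way to do it: the frustum is stored as six planes, and a box is rejected as soon as it lies entirely behind any one of them. This is an illustration in Python, not Privacy's engine code; the `Plane`/`AABB` types, the function names and the example camera (looking down -Z with a 90 degree field of view) are all assumptions made for the example.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]

@dataclass
class Plane:
    # Points p inside the frustum satisfy dot(normal, p) + d >= 0.
    # The normal doesn't need to be unit length for this inside/outside test.
    normal: Vec3
    d: float

@dataclass
class AABB:
    lo: Vec3  # minimum corner
    hi: Vec3  # maximum corner

def outside_plane(box: AABB, plane: Plane) -> bool:
    # Take the corner furthest along the plane normal (the "positive vertex").
    # If even that corner is behind the plane, the whole box is outside.
    p = [box.hi[i] if plane.normal[i] >= 0 else box.lo[i] for i in range(3)]
    return sum(plane.normal[i] * p[i] for i in range(3)) + plane.d < 0

def frustum_cull(boxes: list[AABB], frustum: list[Plane]) -> list[int]:
    # Keep the indices of boxes that aren't completely outside any plane.
    # The test is conservative: a box near a frustum corner may pass without
    # being visible, but a visible box is never rejected.
    return [i for i, box in enumerate(boxes)
            if not any(outside_plane(box, plane) for plane in frustum)]

# Hypothetical camera at the origin looking down -Z, 90 degree field of view,
# near plane at z = -1 and far plane at z = -100.
FRUSTUM = [
    Plane((1, 0, -1), 0),    # left:   x >= z
    Plane((-1, 0, -1), 0),   # right:  x <= -z
    Plane((0, 1, -1), 0),    # bottom: y >= z
    Plane((0, -1, -1), 0),   # top:    y <= -z
    Plane((0, 0, -1), -1),   # near:   z <= -1
    Plane((0, 0, 1), 100),   # far:    z >= -100
]

boxes = [
    AABB((-1, -1, -6), (1, 1, -4)),   # straight ahead
    AABB((-1, -1, 4), (1, 1, 6)),     # behind the camera
    AABB((50, -1, -6), (52, 1, -4)),  # far off to the right
    AABB((3, -1, -6), (8, 1, -4)),    # straddling the right edge
]
print(frustum_cull(boxes, FRUSTUM))  # [0, 3]
```

Testing only one corner per plane keeps the cost at six dot products per model, which matters when the test runs for every model in the world every frame.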
Instancing is not technically part of visibility culling, but I want to mention it here because it is relevant for the next step. In the most basic terms, instancing means bundling together draw commands for models that appear more than once in a frame. Instead of sending out separate draw commands to draw the same rock five times, I send a single draw command that tells the graphics card to draw 'Rock04' in five places. Enabling instancing in Privacy improved the frame rate by about 10%.
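The bundling step itself boils down to grouping the visible instances by model. The sketch below shows only that CPU-side grouping, under assumed names (`Instance`, `DrawCommand`, `build_instanced_draws`); in a real renderer each `DrawCommand` would become one instanced draw call (for example OpenGL's `glDrawElementsInstanced`) with the positions uploaded as per-instance data. It is not a description of how Privacy's renderer is structured.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Instance:
    model: str                            # which mesh to draw, e.g. "Rock04"
    position: tuple[float, float, float]

@dataclass
class DrawCommand:
    model: str                                    # mesh bound once per batch
    positions: list[tuple[float, float, float]]   # per-instance data

def build_instanced_draws(visible: list[Instance]) -> list[DrawCommand]:
    # Group the visible instances by model so each model costs one draw
    # command, however many times it appears in the frame.
    batches = defaultdict(list)
    for inst in visible:
        batches[inst.model].append(inst.position)
    # Dicts keep insertion order, so batches come out in first-seen order.
    return [DrawCommand(model, positions) for model, positions in batches.items()]

rocks = [Instance("Rock04", (float(i), 0.0, -5.0)) for i in range(5)]
draws = build_instanced_draws(rocks + [Instance("Door", (0.0, 0.0, -2.0))])
for cmd in draws:
    print(cmd.model, len(cmd.positions))
# Rock04 5
# Door 1
```

Six visible instances turn into two draw commands, which is where the savings in per-draw overhead come from.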
Depth Sorting & Pre-pass
In order to reduce overdraw (shading the same pixel more than once), I sort the models that remain after frustum culling from front to back. When a model appears multiple times in the frame, I use its closest instance as the reference for sorting. Rendering the models from front to back helps ensure that the most expensive part of rendering – the pixel shader – isn't run for pixels that are later covered up by other models.
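Combined with instancing, the sort works per batch rather than per object: each batch is keyed by its nearest instance. Here is a small sketch of that rule. Measuring the distance from the camera to each instance's position (rather than to its bounding box) is my simplifying assumption, and `sort_front_to_back` is a hypothetical name.

```python
import math

Vec3 = tuple[float, float, float]

def sort_front_to_back(batches: dict[str, list[Vec3]], camera: Vec3) -> list[str]:
    # Each batch holds every visible instance of one model. The batch is keyed
    # by its closest instance, so a model that appears both near and far away
    # is still drawn early.
    def nearest(positions: list[Vec3]) -> float:
        return min(math.dist(camera, p) for p in positions)
    return sorted(batches, key=lambda model: nearest(batches[model]))

batches = {
    "Wall": [(0.0, 0.0, -10.0)],
    "Rock04": [(0.0, 0.0, -20.0), (1.0, 0.0, -3.0)],
    "Door": [(0.0, 0.0, -5.0)],
}
order = sort_front_to_back(batches, (0.0, 0.0, 0.0))
print(order)  # ['Rock04', 'Door', 'Wall']
```

Since only the ordering matters, comparing squared distances would give the same result and skip the square roots.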
Furthermore, I use a depth-only pre-pass to eliminate overdraw entirely when the final frame is rendered. After the pre-pass, the depth buffer holds the nearest depth at every pixel, so when I render the final shaded image, my super expensive, all-singing, all-dancing, fancy-pants pixel shader runs exactly once for every pixel on screen. Taken together, the sorting and the pre-pass shave a good 20 percent off the rendering time.
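Why is the pre-pass still needed after sorting? Sorting happens per model, so a model whose nearest point is close can still reach behind a model drawn after it. The toy simulation below counts pixel shader invocations on a two-pixel "screen" with and without a depth pre-pass. It is a CPU model of the idea, not GPU code: a "draw" is simply a hypothetical list of `(pixel, depth)` fragments, and the second pass uses an EQUAL depth test. It ignores exact depth ties, and on real hardware the technique relies on both passes producing identical depths for the same geometry.

```python
import math

# A "draw" is the list of (pixel_index, depth) fragments one model produces.
Draw = list[tuple[int, float]]

def shaded_without_prepass(draws: list[Draw], num_pixels: int) -> int:
    # One pass with a LESS depth test: a fragment is shaded whenever it is
    # nearer than what's already in the depth buffer, even if something
    # nearer is drawn later and covers it up (overdraw).
    depth = [math.inf] * num_pixels
    shaded = 0
    for draw in draws:
        for pixel, z in draw:
            if z < depth[pixel]:
                depth[pixel] = z
                shaded += 1
    return shaded

def shaded_with_prepass(draws: list[Draw], num_pixels: int) -> int:
    # Pass 1, depth only: no pixel shader, just record the nearest depth.
    depth = [math.inf] * num_pixels
    for draw in draws:
        for pixel, z in draw:
            depth[pixel] = min(depth[pixel], z)
    # Pass 2, depth test EQUAL: only the fragment that won pass 1 is shaded.
    shaded = 0
    for draw in draws:
        for pixel, z in draw:
            if z == depth[pixel]:
                shaded += 1
    return shaded

# Two models already sorted by their nearest point. The first is close at
# pixel 0 but reaches behind the second model at pixel 1.
draws = [[(0, 1.0), (1, 5.0)], [(1, 2.0)]]
print(shaded_without_prepass(draws, 2))  # 3: pixel 1 is shaded twice
print(shaded_with_prepass(draws, 2))     # 2: once per covered pixel
```

The price is drawing the geometry twice, but the first pass uses no pixel shader, which is why it pays off when the real shader is expensive.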
The final piece of the puzzle is a process called occlusion culling. While the previous steps combined have greatly reduced the workload of the graphics card, we’re still doing far too much work. To illustrate what I mean, consider the following image where the player is standing in the hallway and looking into Daryl’s room.
Now, let’s close the door:
Because the camera frustum hasn't changed, the workload sent to the graphics card is exactly the same for both of these images, apart from the position and rotation of the door model. Everything you see in the image with the door open is also sent to the graphics card when the door is closed. Thanks to my front-to-back sorting, the door is rendered first and everything behind it is then rejected by the depth buffer… but the graphics card still has to process all those invisible models. Worse still, I'm rendering a shadow map that you'll never even see as long as the door is closed. Such a waste!
There are numerous ways to deal with occlusion, which in this context means one model in view wholly or partially obscuring another. For Privacy, I've chosen to implement a variation of a technique called (hold your breath) Hierarchical Z-Buffer Occlusion Culling, or HiZ.
In my case, a software rasterizer renders a lower-resolution, depth-only view of the world from the current camera position. Once again, I render the objects nearest to the camera first. However, my software rasterizer has a second function: it can test a model's bounding box against the (in-progress) HiZ buffer to determine whether that model would be visible. Only the models that pass this test are sent to the graphics card.
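Here is a deliberately tiny model of that idea, a sketch rather than Privacy's rasterizer. Several simplifications are assumptions on my part. Solid screen-space rectangles stand in for rasterized triangles. Each model carries a conservative "occluder" rectangle it is known to cover fully. The hierarchy has a single coarse level, holding the farthest depth per `TILE` x `TILE` block, instead of a full mip chain. Names like `OcclusionBuffer`, `TILE` and `occlusion_cull` are hypothetical. Models are processed nearest first. Each one is tested against the buffer as it stands at that moment, and only models that pass add their occluders.

```python
import math

TILE = 4  # hypothetical size of one coarse tile, in occlusion-buffer pixels

class OcclusionBuffer:
    """Low-resolution software depth buffer plus one coarse level holding
    the farthest depth in each TILE x TILE block (a two-level HiZ)."""

    def __init__(self, width: int, height: int):
        self.w, self.h = width, height
        self.depth = [[math.inf] * width for _ in range(height)]
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        self.tile_max = [[math.inf] * tiles_x for _ in range(tiles_y)]

    def _tiles(self, x0, y0, x1, y1):
        # Coarse tiles overlapped by the clamped rectangle [x0, x1) x [y0, y1).
        for ty in range(max(y0, 0) // TILE, (min(y1, self.h) - 1) // TILE + 1):
            for tx in range(max(x0, 0) // TILE, (min(x1, self.w) - 1) // TILE + 1):
                yield tx, ty

    def rasterize_rect(self, x0, y0, x1, y1, z):
        # Stand-in for triangle rasterization: write a solid screen-space
        # rectangle at depth z, keeping whichever depth is nearer.
        for y in range(max(y0, 0), min(y1, self.h)):
            for x in range(max(x0, 0), min(x1, self.w)):
                self.depth[y][x] = min(self.depth[y][x], z)
        # Keep the coarse level in sync so later tests see this occluder.
        for tx, ty in self._tiles(x0, y0, x1, y1):
            self.tile_max[ty][tx] = max(
                self.depth[y][x]
                for y in range(ty * TILE, min((ty + 1) * TILE, self.h))
                for x in range(tx * TILE, min((tx + 1) * TILE, self.w)))

    def is_visible(self, x0, y0, x1, y1, zmin):
        # A box whose nearest depth is zmin is hidden only if every pixel it
        # covers already holds something at least as near.
        for tx, ty in self._tiles(x0, y0, x1, y1):
            if self.tile_max[ty][tx] <= zmin:
                continue  # one lookup proves this whole tile is occluded
            for y in range(max(y0, ty * TILE), min(y1, (ty + 1) * TILE, self.h)):
                for x in range(max(x0, tx * TILE), min(x1, (tx + 1) * TILE, self.w)):
                    if self.depth[y][x] > zmin:
                        return True
        return False

def occlusion_cull(objects, width=16, height=16):
    # objects: (name, bounds, zmin, occluder). bounds is the model's screen-space
    # bounding rectangle, zmin the depth of its nearest point, and occluder an
    # optional (x0, y0, x1, y1, z) rectangle the model is known to cover fully.
    buf = OcclusionBuffer(width, height)
    visible = []
    for name, bounds, zmin, occluder in sorted(objects, key=lambda o: o[2]):
        if buf.is_visible(*bounds, zmin):
            visible.append(name)
            if occluder is not None:
                buf.rasterize_rect(*occluder)
    return visible

def scene(door_closed: bool):
    door = (4, 0, 12, 16, 2.0) if door_closed else None
    return [
        ("Bed", (5, 8, 11, 14), 6.0, None),
        ("Door", (4, 0, 12, 16), 2.0, door),
        ("Hallway wall", (0, 0, 4, 16), 1.0, (0, 0, 4, 16, 1.0)),
        ("Plant", (10, 4, 14, 8), 3.0, None),
    ]

print(occlusion_cull(scene(door_closed=True)))   # ['Hallway wall', 'Door', 'Plant']
print(occlusion_cull(scene(door_closed=False)))  # ['Hallway wall', 'Door', 'Plant', 'Bed']
```

With the door closed, the bed is rejected after four coarse lookups, without touching a single fine pixel. That shortcut is what the hierarchy buys.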
Mixing it up
To be most effective, the techniques outlined above aren't implemented as separate steps; instead, they interlock in several ways. The world of Privacy is divided into smaller sections. After the section nearest to the player has been passed through the occlusion rasterizer, the engine can test the bounding boxes of the other visible sections to determine whether it needs to perform fine-grained frustum culling on all the objects in a particular section. Similarly, the occlusion rasterizer may reject the obscured instances of a model, while the remaining instances can still be batched up and sent in one go.
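One common way to get this kind of coarse-then-fine behaviour is to classify each section's bounding box as fully outside, fully inside or straddling the view. Only straddling sections need per-object tests. The sketch below assumes that three-way split. It uses an axis-aligned box as a stand-in for the real frustum and occlusion tests to keep it short. `Cull`, `Section` and `cull_sections` are hypothetical names, and this is not a description of Privacy's actual section logic.

```python
from dataclasses import dataclass, field
from enum import Enum

class Cull(Enum):
    OUTSIDE = 0     # reject everything in the box
    INTERSECTS = 1  # partly visible: test the contents one by one
    INSIDE = 2      # fully visible: accept the contents without further tests

@dataclass
class Box:
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]

def classify(box: Box, view: Box) -> Cull:
    # Stand-in for the real frustum and occlusion tests: the view volume is
    # just an axis-aligned box here, which keeps the example short.
    if any(box.hi[i] < view.lo[i] or box.lo[i] > view.hi[i] for i in range(3)):
        return Cull.OUTSIDE
    if all(view.lo[i] <= box.lo[i] and box.hi[i] <= view.hi[i] for i in range(3)):
        return Cull.INSIDE
    return Cull.INTERSECTS

@dataclass
class Section:
    bounds: Box
    objects: list[tuple[str, Box]] = field(default_factory=list)

def cull_sections(sections: list[Section], view: Box) -> tuple[list[str], int]:
    visible: list[str] = []
    fine_tests = 0  # how many per-object tests we actually had to run
    for section in sections:
        result = classify(section.bounds, view)
        if result is Cull.OUTSIDE:
            continue  # one test rejects the whole section
        if result is Cull.INSIDE:
            visible.extend(name for name, _ in section.objects)
            continue
        for name, box in section.objects:  # section straddles the boundary
            fine_tests += 1
            if classify(box, view) is not Cull.OUTSIDE:
                visible.append(name)
    return visible, fine_tests

view = Box((0, 0, 0), (10, 10, 10))
sections = [
    Section(Box((1, 1, 1), (4, 4, 4)),
            [("Rug", Box((1, 1, 1), (2, 2, 2))), ("Lamp", Box((3, 3, 3), (4, 4, 4)))]),
    Section(Box((8, 0, 0), (14, 6, 6)),
            [("Bed", Box((8, 0, 0), (9, 2, 2))), ("Desk", Box((12, 0, 0), (13, 2, 2)))]),
    Section(Box((0, 20, 0), (10, 30, 10)),
            [("Trunk", Box((1, 21, 1), (2, 22, 2)))]),
]
print(cull_sections(sections, view))  # (['Rug', 'Lamp', 'Bed'], 2)
```

Five objects are resolved with only two per-object tests: the fully visible and fully hidden sections are each settled by a single test on their bounding box.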