Journal of Computer Graphics Techniques 2020
When developing a new renderer, we usually want a way to check whether it is implemented correctly. Conventionally, this is done by comparing its output to that of a reference implementation. However, such tests require a large number of samples to be reliable, and they are sometimes unable to reveal very subtle differences that are caused by bias but overshadowed by random noise. We propose using a statistical test, Welch’s t-test, which reliably finds small bias even at low sample counts. Welch’s t-test is an established statistical method for determining, from sample statistics, whether two sample sets have the same underlying mean. We adapt it to test whether two renderers converge to the same image, i.e., the same mean per pixel or pixel region. We also present two strategies for visualizing and analyzing the test’s results, which help localize particularly problematic image regions and detect biased implementations with high confidence at low sample counts for both the reference and the tested implementation.
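As an illustration of the per-pixel test described above, the following is a minimal sketch of Welch's t statistic and the Welch–Satterthwaite degrees of freedom, computed from sample statistics only. The function names `welch_t` and `pixel_stats`, the simulated "renderers", and the handling of zero variance are assumptions made for this example and are not taken from the paper. Turning the statistic into a p-value additionally requires the Student-t distribution (for example from a statistics library), which is omitted here.

```python
import math
import random
from statistics import fmean, variance


def pixel_stats(samples):
    """Return (mean, unbiased sample variance, count) for one pixel's samples."""
    return fmean(samples), variance(samples), len(samples)


def welch_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Welch's t statistic and degrees of freedom from sample statistics.

    Each sample set needs at least two samples so that its variance
    and the degrees of freedom are defined.
    """
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample set needs at least two samples")
    # Squared standard errors of the two sample means.
    se2_a = var_a / n_a
    se2_b = var_b / n_b
    total = se2_a + se2_b
    if total == 0.0:
        # Assumption for this sketch: with zero variance in both sets the
        # statistic is 0 for equal means and infinite otherwise.
        diff = mean_a - mean_b
        t = 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
        return t, float(n_a + n_b - 2)
    t = (mean_a - mean_b) / math.sqrt(total)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = total ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, dof


# Hypothetical stand-ins for two renderers estimating one pixel:
# the "tested" one has a small bias in its mean.
rng = random.Random(1)
reference = [rng.gauss(0.50, 0.1) for _ in range(64)]
tested = [rng.gauss(0.52, 0.1) for _ in range(64)]

t_value, dof = welch_t(*pixel_stats(reference), *pixel_stats(tested))
print(f"t = {t_value:.3f}, degrees of freedom = {dof:.1f}")
```

In an image-space setting, the same computation would be repeated for every pixel or pixel region, with the resulting statistics forming a map that can be inspected for problematic regions.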