Sampling Techniques and ReSTIR Rendering Algorithm

2024-09-16

This article collects some prerequisite knowledge and paper-reading notes on the ReSTIR DI algorithm.

About Sampling Techniques

Monte Carlo Integration and Importance Sampling

Monte Carlo integration is a method of estimating integrals through sampling:

$$ \int f(x) dx \approx \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{p(x_i)}, \quad x_i \sim p $$

One useful property: when the sample distribution p is close in shape to f, the variance of the estimate is small; when p is exactly proportional to f (i.e., $p = f / \int f$), the variance is zero, meaning a single sample already yields the true value.

For example, with $f(x) = x, x \in [1,3]$, if we choose $p(x) = x/4$, then for any sample $X_i$, $f(X_i)/p(X_i) = X_i / (X_i/4) = 4$, which is exactly the true value of the integral of f(x) over [1,3].
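
As a sanity check, here is a minimal Python sketch (my own toy code, not from any paper) comparing a uniform candidate distribution against $p(x) = x/4$ for this integrand; the latter is a zero-variance estimator:

import random

def estimate(N, sample, pdf, f=lambda x: x):
    # Monte Carlo estimator: (1/N) * sum of f(x_i) / p(x_i)
    return sum(f(x) / pdf(x) for x in (sample() for _ in range(N))) / N

# Uniform candidates on [1, 3]: pdf is the constant 1/2
uniform = estimate(1000, lambda: random.uniform(1.0, 3.0), lambda x: 0.5)

# Importance sampling with p(x) = x/4, drawn by inverting its CDF:
# CDF(x) = (x^2 - 1) / 8, so x = sqrt(8u + 1) for u ~ U[0, 1]
importance = estimate(1, lambda: (8.0 * random.random() + 1.0) ** 0.5,
                      lambda x: x / 4.0)

print(uniform)     # close to 4, with visible noise
print(importance)  # exactly 4.0 from a single sample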

Importance Sampling is a sampling method that strives to make p close to f to reduce the variance of the estimate.

Multiple Importance Sampling (MIS)

Multiple Importance Sampling (MIS) involves using samples from several different distributions p1(x), ..., pn(x). Note that mathematically, it is not equivalent to sampling multiple times from a mixed distribution of p1(x), ..., pn(x) (see Appendix 1).

Generally, assuming each distribution samples ni times, the estimate is

$$ F = \sum_{i=1}^n \frac{1}{n_i} \sum_{j=1}^{n_i} w_i(X_{ij}) \frac{f(X_{ij})}{p_i(X_{ij})} $$

When $\sum_i w_i(X) = 1$ for every $X$, this estimate is unbiased. Proof (assuming $n_i = 1$ to simplify the notation):

$$ \begin{align*} E[F] &= \sum_{i=1}^n \int_X w_i(X) \frac{f(X)}{p_i(X)} p_i(X) dX \\ &= \sum_{i=1}^n \int_X w_i(X) f(X) dX \\ &= \int_X \left( \sum_{i=1}^n w_i(X) \right) f(X) dX \\ &= \int_X f(X) dX \end{align*} $$

A common choice for $w_i$ is the Balance Heuristic:

$$ w_i(X) = \frac{p_i(X)}{\sum_k p_k(X)} $$
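
A minimal sketch of the $n_i = 1$ MIS estimator with the balance heuristic (the integrand and the two techniques below are made up for illustration):

import random

def f(x):  return x * x            # integrand on [0, 1]; true integral is 1/3
def p1(x): return 1.0              # technique 1: uniform on [0, 1]
def p2(x): return 2.0 * x          # technique 2: pdf 2x, sampled as sqrt(u)

def mis_sample():
    # one sample per technique, each weighted by the balance heuristic
    total = 0.0
    for x, p in [(random.random(), p1), (random.random() ** 0.5, p2)]:
        w = p(x) / (p1(x) + p2(x))  # balance heuristic weight
        total += w * f(x) / p(x)
    return total

est = sum(mis_sample() for _ in range(100000)) / 100000
print(est)  # converges to 1/3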

Resampled Importance Sampling (RIS)

Resampled Importance Sampling (RIS, sometimes just called importance resampling) lets you draw samples approximately from a target distribution $\hat{p}(x)$ that need not be normalized. The method is:

  1. Start with a distribution p(x) that is easy to sample from, and obtain M samples x1, x2,..., xM

  2. Assign each sample a weight $w(x_i) = \hat{p} (x_i) / p(x_i)$

  3. Randomly select one of the candidates with probability proportional to the normalized weights; denote it y. When M is large enough, the distribution of y approaches the normalized $\hat{p}(x)$ (for this, p(x) needs to meet certain conditions, see Appendix 2).

In the above steps, p(x) is called the candidate distribution, and $\hat{p} (x)$ is called the target distribution.

Furthermore, it can be used to estimate integrals: Let the estimator F be

$$ F = \frac{f(y)}{\hat{p}(y)} \cdot \frac{\sum_i w(x_i)}{M} $$

The expectation of F is

$$ \begin{align*} E[F] &= E_{x_1,...,x_M} E_y[F \mid x_1,...,x_M] \\ &= E_{x_1,...,x_M} \sum_{i=1}^M P(y = x_i) \cdot \frac{f(x_i)}{\hat{p}(x_i)} \cdot \frac{\sum_j w(x_j)}{M} \\ &= E_{x_1,...,x_M} \sum_{i=1}^M \frac{w(x_i)}{\sum_j w(x_j)} \cdot \frac{f(x_i)}{\hat{p}(x_i)} \cdot \frac{\sum_j w(x_j)}{M} \\ &= E_{x_1,...,x_M} \sum_{i=1}^M \frac{w(x_i) f(x_i)}{M \hat{p}(x_i)} \\ &= E_{x_1,...,x_M} \sum_{i=1}^M \frac{f(x_i)}{M p(x_i)} \\ &= \frac{1}{M} \sum_{i=1}^M E_{x_i} \frac{f(x_i)}{p(x_i)} \\ &= \int f(x) dx \end{align*} $$
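
The whole procedure fits in a few lines of Python. Below is a toy sketch (my own f, p̂, and p, chosen so the true integral is 8/3) showing that F indeed estimates $\int f$:

import random

def f(x):     return x * x        # integrand on [0, 2]; true integral is 8/3
def p_hat(x): return x * x        # unnormalized target, here proportional to f
def p(x):     return 0.5          # candidate pdf: uniform on [0, 2]

def ris_estimate(M=32):
    xs = [random.uniform(0.0, 2.0) for _ in range(M)]
    ws = [p_hat(x) / p(x) for x in xs]
    y = random.choices(xs, weights=ws)[0]   # resample in proportion to weights
    return f(y) / p_hat(y) * (sum(ws) / M)

est = sum(ris_estimate() for _ in range(10000)) / 10000
print(est)  # close to 8/3; since p_hat is proportional to f, the variance is low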

Weighted Reservoir Sampling (WRS)

Is it possible to draw N weighted samples from x_1, ..., x_M (with a weight function w(x_i)) using only O(N) additional storage, i.e., with x_1, ..., x_M fed to the algorithm as a stream?

For N = 1 this can indeed be done in streaming fashion: keep a pool of a single sample and maintain the running sum of the weights processed so far. When element x_i arrives, replace the sample currently in the pool with x_i with probability

$$ P_i = \frac{w(x_i)}{\sum_{j=1}^i w(x_j)} $$

After the whole stream has been processed, the sample in the pool is distributed exactly as a weighted sample of the stream. This algorithm is called Weighted Reservoir Sampling (WRS).
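
A short sketch (toy weights) verifying that the one-slot reservoir picks each element with probability proportional to its weight:

import random
from collections import Counter

def reservoir_sample(stream, weight):
    # weighted reservoir sampling with a pool of one element
    y, w_sum = None, 0.0
    for x in stream:
        w = weight(x)
        w_sum += w
        if random.random() < w / w_sum:   # replace with probability w / w_sum
            y = x
    return y

counts = Counter(reservoir_sample("abc", {"a": 1, "b": 2, "c": 3}.get)
                 for _ in range(60000))
print(counts)  # roughly 10000 : 20000 : 30000 for a : b : c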

ReSTIR Rendering Algorithm

References: [1], [2], [3]

Paper:
"Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting"

Inspired by the RIS and WRS algorithms, the paper proposes the streaming algorithm ReSTIR (Reservoir-based Spatio-Temporal Importance Resampling), which addresses the problem of efficiently computing direct lighting in scenes with millions of light sources.

Solving the Rendering Equation for Direct Lighting

The rendering equation that ReSTIR aims to solve is as follows:

$$ L(y, \vec{\omega}) = \int_{x \in A} \rho(y, \vec{yx} \leftrightarrow \vec{\omega}) \, L_e(x \rightarrow y) \, G(x \leftrightarrow y) \, V(x \leftrightarrow y) \, dA_x $$

The integral ranges over the area A of all emissive (not merely reflective) surfaces, with x a point on an emitter. We consider the light path from the emitter point x to the shaded point y and on to the camera, and compute the direct lighting at y (the radiance reflected toward $\vec{\omega}$ due to direct illumination). Here $\rho$ is the BSDF at y, $L_e$ is the emitted radiance, $G$ is the geometry term (the product of the two cosines divided by the squared distance), and $V$ is the mutual visibility between x and y.

The paper drives RIS with the integrand $f(x) = \rho(x) L_e(x) G(x) V(x)$, reduced to a scalar by a weighted sum of the RGB channels, i.e., the luminance. However, evaluating the visibility V requires tracing a ray, which is costly, so the target distribution actually used is $\hat{p}(x) = \rho(x) L_e(x) G(x)$, with V(x) left out.

At this point, consider two questions:
Q: Why don't the calculations of $L_e(x \rightarrow y)$ and $G(x \leftrightarrow y)$ require tracing a ray?
A: First, x is sampled on the emissive surfaces directly, rather than by picking a direction on the hemisphere at y and tracing a ray to find an intersection (i.e., the opposite of the usual path-tracing flow). Then, $L_e$ and G depend only on the segment xy (its direction and length) and are independent of what lies along the path, so no ray needs to be traced.
Q: In this RIS sampling, what is the candidate distribution? What is the target distribution?
A: The candidate distribution is any distribution that is easy to sample and whose support covers the target's, for example uniform over the total area of the emissive surfaces (the author's suggestion, corrections welcome); the target distribution is $\hat{p}(x)$, which may also be called the "light source distribution".
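
For concreteness, here is a sketch of the geometry term (my own helper, not the paper's code): it is a pure function of the two points and their normals, which is exactly why no ray needs to be traced to evaluate it.

import math

def geometry_term(x, n_x, y, n_y):
    # G(x <-> y) = |cos(theta_x)| * |cos(theta_y)| / ||x - y||^2
    d = tuple(b - a for a, b in zip(x, y))       # unnormalized direction x -> y
    dist2 = sum(c * c for c in d)
    w = tuple(c / math.sqrt(dist2) for c in d)   # unit direction x -> y
    cos_x = abs(sum(a * b for a, b in zip(n_x, w)))
    cos_y = abs(sum(a * b for a, b in zip(n_y, w)))
    return cos_x * cos_y / dist2

# two patches facing each other, 2 units apart
print(geometry_term((0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, -1)))  # 0.25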

Streaming RIS with WRS

Using RIS, the information from many candidate samples can be "condensed" into a single sample point plus its pool's weight sum. Thanks to WRS, this can be done in a streaming manner (code adapted from the paper's Algorithm 3, with modifications):

import random   # f, p_hat, p, sample_p, image, M are provided by the renderer

class Reservoir:
    def __init__(self):
        self.y = None        # the surviving sample
        self.w_sum = 0.0     # running sum of RIS weights
        self.M = 0           # number of candidates seen so far

    def update(self, x_i, w_i):
        self.w_sum += w_i
        self.M += 1
        if random.random() < w_i / self.w_sum:   # WRS replacement test
            self.y = x_i

def RIS(q):
    r = Reservoir()
    for _ in range(M):
        x_i = sample_p(q)    # draw a candidate x_i ~ p
        r.update(x_i, p_hat(q, x_i) / p(q, x_i))
    return r

def shadePixel(r, q):
    # the single shadow ray per pixel is traced here, inside f,
    # since f(r.y) / p_hat(r.y) reduces to V(r.y)
    return f(q, r.y) / p_hat(q, r.y) * (r.w_sum / r.M)

def render():
    for q in image:
        image[q] = shadePixel(RIS(q), q)

Note: the above method is also discussed in a chapter of Ray Tracing Gems.

Merging RIS Pools for Spatiotemporal Reuse

Notice that the $\hat{p}$ functions of neighboring pixels, or of the same pixel across frames, are likely similar, so we can merge those pixels' pools into the current pixel's pool (add up the weight sums and re-draw a representative sample according to them).

When the pool of another pixel q' is reused at pixel q, its weight sum must additionally be rescaled by the ratio $\hat{p}_q(r.y) / \hat{p}_{q'}(r.y)$. To avoid ever evaluating $\hat{p}_{q'}$ again, each pool stores one extra variable W, defined as r.W = r.w_sum / (p_hat(r.y) * r.M). The rescaled weight sum passed as the second argument of update can then be written in a form that no longer references q':

$$ \begin{align*} r.w_{sum} \cdot \frac{\hat{p}_q(r.y)}{\hat{p}_{q'}(r.y)} &= \left( r.W \cdot \hat{p}_{q'}(r.y) \cdot r.M \right) \cdot \frac{\hat{p}_q(r.y)}{\hat{p}_{q'}(r.y)} \\ &= r.W \cdot \hat{p}_q(r.y) \cdot r.M \end{align*} $$

The reorganized algorithm is Algorithm 4 in the paper:

def combineReservoirs(q, reservoirs):
    s = Reservoir()
    for r in reservoirs:
        # re-weight each pool's representative sample for pixel q
        s.update(r.y, p_hat(q, r.y) * r.W * r.M)
    s.M = sum(r.M for r in reservoirs)
    s.W = s.w_sum / (p_hat(q, s.y) * s.M)
    return s

In each iteration, every pixel's pool is merged with the pools of k nearby pixels. After n iterations the computational cost is only O(nk + M), yet in the best case each pixel has absorbed information from $O(k^n M)$ candidates; its reservoir sample is then approximately distributed according to $\hat{p}$, and the stored W remains a correct substitute for the reciprocal pdf. A sketch of such a reuse pass follows below.
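
One round of spatial reuse on top of combineReservoirs might look like this (a hedged sketch: the reservoirs dict, pick_neighbors helper, and the n and k parameters are hypothetical scaffolding, not the paper's exact loop):

def spatialReusePass(reservoirs, k):
    # reservoirs: dict mapping pixel q -> Reservoir
    # pick_neighbors(q, k): hypothetical helper returning k nearby pixels
    merged = {}
    for q, r in reservoirs.items():
        neighbors = [reservoirs[q2] for q2 in pick_neighbors(q, k)]
        merged[q] = combineReservoirs(q, [r] + neighbors)
    return merged

# n passes let each pixel absorb information from O(k^n * M) candidates
for _ in range(n):
    reservoirs = spatialReusePass(reservoirs, k)

Writing into a fresh merged map (double buffering) keeps each pass's reads independent of its writes, matching the per-iteration analysis above.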

Temporal reuse works the same way: merge the current pixel's pool with its corresponding pool from the previous frame (located via motion vectors, presumably).

Limitations and Extensions of the ReSTIR Algorithm

According to colleagues who have implemented ReSTIR, a common issue in practice is correlation between pixels: a markedly brighter light-source sample can be adopted by the pools of many adjacent pixels at once, producing bright and dark banding artifacts.

A possible remedy is to run a few Metropolis-Hastings mutation steps on each pixel's pool sample after it has been selected, decorrelating neighboring pixels' samples (how the shadePixel function should be rewritten in that case is left as a question).

The original paper's ReSTIR algorithm for computing direct illumination is called ReSTIR DI. It was later extended to support global illumination, called ReSTIR GI, where more variables are stored in the pool. Subsequently, the ReSTIR concept was borrowed for a path reuse algorithm, namely ReSTIR PT (Path-Tracing). You can refer to this blog post.

Appendix

Appendix 1: Combining Samples from Multiple Distributions is Not Equivalent to Sampling Multiple Times from a Mixture Distribution

For example, p1(x) = U[0, 1], p2(x) = U[2, 3]. The following two methods:

  1. Sample x1 from p1(x), sample x2 from p2(x), and shuffle the order.

  2. Sample x1, x2 from ( p1(x)+p2(x) ) / 2.

The resulting joint distributions of x1, x2 differ: the former always yields exactly one sample in [0, 1] and one in [2, 3], while the latter puts both samples in [0, 1] with probability 1/4.
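
A quick numeric illustration (toy code):

import random

def scheme1():
    # one sample from each distribution, order shuffled
    pair = [random.uniform(0, 1), random.uniform(2, 3)]
    random.shuffle(pair)
    return pair

def scheme2():
    # two i.i.d. samples from the mixture (p1 + p2) / 2
    return [random.uniform(0, 1) if random.random() < 0.5 else random.uniform(2, 3)
            for _ in range(2)]

trials = 100000
both_low = sum(all(x <= 1 for x in scheme2()) for _ in range(trials)) / trials
print(both_low)  # about 0.25; under scheme1 this event has probability exactly 0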

Appendix 2: Conditions the Candidate Distribution Should Meet

The support of the candidate distribution needs to cover the support of the target distribution. Support can be understood as the "non-zero region of the function" or "the region where samples may appear."