Paper Reading Notes (1): 3D Gaussian Splatting for Real-Time Radiance Field Rendering

Link to paper

Main Achievements

  • Real-time (≥ 30 fps) novel-view synthesis at 1080p resolution

Key Elements

  • 3D Gaussians for scene representation
  • Anisotropic covariance optimization for accurate scene representation
  • Fast, visibility-aware splatting algorithm built on a tile-based renderer with sorting

Core Algorithm

Differentiable 3D Gaussian Splatting

3D Gaussians are defined as ellipsoids at $ \mathbf x $ with full covariance matrix $ \Sigma $ (in the formula below, $ \mathbf x $ is measured relative to the Gaussian's center):

$$ G(\mathbf x) = e^{-\frac{1}{2}\mathbf x^T \Sigma^{-1} \mathbf x} $$

The covariance can alternatively be described like an ellipsoid, with a scaling matrix $ \mathbf S $ and a rotation matrix $ \mathbf R $:

$$ \Sigma = \mathbf R \mathbf S \mathbf S^T \mathbf R^T $$

so it can be stored compactly as a 7-element vector (3 for scaling and 4 for the rotation quaternion). Each 3D Gaussian also uses spherical harmonics (SH) to represent its color component, which can be written as the SH coefficient vector $ \mathbf c $. To summarize, each 3D Gaussian can be represented with the tuple $ (\mathbf x, \mathbf s, \mathbf q, \mathbf c, \alpha) $ where $ \mathbf x $ is the center position, $ \mathbf s $ is the scaling vector, $ \mathbf q $ is the rotation quaternion, and $ \alpha $ is the opacity.
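
To make the factorization concrete, here is a minimal pure-Python sketch that builds $ \Sigma $ from a scaling vector and a quaternion and evaluates the unnormalized Gaussian. The function names and the (w, x, y, z) quaternion ordering are my own assumptions for illustration, not something taken from the paper's code.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.
    The (w, x, y, z) ordering is an assumption of this sketch."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize so R is a rotation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(s, q):
    """Sigma = R S S^T R^T with S = diag(s)."""
    R = quat_to_rotmat(q)
    # M = R S, i.e. column k of R scaled by s[k]; then Sigma = M M^T.
    M = [[R[i][k] * s[k] for k in range(3)] for i in range(3)]
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def gaussian_value(offset, s, q):
    """Evaluate G(x) = exp(-1/2 x^T Sigma^{-1} x) for an offset x from the center."""
    R = quat_to_rotmat(q)
    # Sigma^{-1} = R S^{-2} R^T, so the quadratic form is sum_k ((R^T x)_k / s_k)^2.
    local = [sum(R[i][k] * offset[i] for i in range(3)) for k in range(3)]
    return math.exp(-0.5 * sum((local[k] / s[k]) ** 2 for k in range(3)))
```

Because $ \Sigma $ is built from a rotation and a scaling, it stays symmetric positive semi-definite no matter how the 7 parameters are updated, which is what makes this parameterization convenient for gradient-based optimization.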

Adaptive Density Control

  • Initial 3D Gaussians are formed with SfM points;
  • Adaptive control focuses on "under-reconstruction" regions (not enough 3D Gaussians to cover the detailed geometric features) and "over-reconstruction" regions (individual Gaussians cover too large an area);
  • Densify Gaussians with large view-space positional gradients: clone small Gaussians in under-reconstruction regions and split large Gaussians in over-reconstruction regions.
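
The clone/split decision above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: the `Gaussian` class, the `densify` function, and both thresholds are hypothetical, rotation is ignored when sampling new centers, and the default split factor of 1.6 is the value I recall the paper reporting.

```python
import random
from dataclasses import dataclass

@dataclass
class Gaussian:
    position: tuple  # center of the Gaussian
    scale: tuple     # per-axis scaling s
    grad: float      # average view-space positional gradient magnitude

def densify(gaussians, grad_threshold, size_threshold, split_factor=1.6, rng=None):
    """Return a new list of Gaussians after one clone/split pass (hypothetical helper)."""
    rng = rng or random.Random(0)
    result = []
    for g in gaussians:
        if g.grad <= grad_threshold:
            # Small gradient: the region is reconstructed well enough, keep as-is.
            result.append(g)
        elif max(g.scale) <= size_threshold:
            # Under-reconstruction: a small Gaussian, so clone it.
            # (The paper also moves the copy along the gradient; omitted here.)
            result.append(g)
            result.append(Gaussian(g.position, g.scale, 0.0))
        else:
            # Over-reconstruction: a large Gaussian, replaced by two smaller ones.
            small = tuple(s / split_factor for s in g.scale)
            for _ in range(2):
                # Sample new centers from the original Gaussian
                # (axis-aligned sampling: a simplification of this sketch).
                pos = tuple(rng.gauss(p, s) for p, s in zip(g.position, g.scale))
                result.append(Gaussian(pos, small, 0.0))
    return result
```

Cloning increases both the number of Gaussians and the total covered volume, while splitting keeps the volume roughly constant and only increases the count.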

Fast Differentiable Rasterization

The rasterizer is tile-based and pre-sorts all primitives once for the entire image:

  • Split the screen into 16x16 tiles
  • Cull 3D Gaussians against view frustum and tiles
  • Instantiate each Gaussian once per tile it overlaps and assign every instance a key that combines view-space depth and tile ID
  • Sort the instances by key using a fast GPU radix sort
  • For each tile, identify its list of Gaussians in the sorted result and rasterize each tile independently
  • Warm up the optimization at low resolution, then upsample twice
  • Gradual optimization for SH bands: start from the zeroth order and then add more bands
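
The key construction and sorting steps above can be illustrated on the CPU with a small sketch. Everything here is an assumption for illustration: the function names, the 64-bit key layout (tile ID in the high 32 bits, float depth bits in the low 32 bits), and the use of a square bounding region per splat. Python's built-in sort stands in for the GPU radix sort.

```python
import struct

TILE = 16  # tile size in pixels

def depth_bits(depth):
    """Reinterpret a positive float32 depth as a uint32.
    For positive floats the integer order matches the float order."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]

def build_sorted_instances(splats, width, height):
    """splats: list of (center_x, center_y, radius_px, depth) in screen space.
    Returns (instances, ranges): instances is a list of (key, splat_index)
    sorted by key; ranges maps tile_id -> (start, end) into that list."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    instances = []
    for idx, (cx, cy, r, depth) in enumerate(splats):
        if depth <= 0:
            continue  # behind the camera: culled
        # Tiles overlapped by the splat's bounding square, clamped to the screen.
        x0 = max(0, int((cx - r) // TILE))
        x1 = min(tiles_x - 1, int((cx + r) // TILE))
        y0 = max(0, int((cy - r) // TILE))
        y1 = min(tiles_y - 1, int((cy + r) // TILE))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_id = ty * tiles_x + tx
                # Tile ID in the high bits, depth in the low bits: one sort
                # groups instances by tile and orders them front to back.
                instances.append(((tile_id << 32) | depth_bits(depth), idx))
    instances.sort()  # stand-in for the GPU radix sort
    ranges = {}
    for i, (key, _) in enumerate(instances):
        tile_id = key >> 32
        start, _ = ranges.get(tile_id, (i, i))
        ranges[tile_id] = (start, i + 1)
    return instances, ranges
```

Putting the tile ID in the high bits means a single global sort is enough: each tile's Gaussians end up contiguous and already depth-ordered, so no per-pixel sorting is needed during rasterization.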

Unsorted Thoughts

  • The very efficient scene representation is key to enabling real-time rendering at 1080p; at the same time, it may not be the most suitable representation for digital humans.
  • How do we enable it for motion? One recent attempt is this work.
  • The Gaussian placement/generation algorithm is still quite empirical. Could we derive it from image features automatically? Maybe even make it trainable?

References