Paper Reading Notes (1): 3D Gaussian Splatting for Real-Time Radiance Field Rendering
Main Achievements
- Real-time (≥ 30 fps) novel view synthesis at 1080p resolution
Key Elements
- 3D Gaussians for scene representation
- Anisotropic covariance optimization for accurate scene representation
- Fast, visibility-aware splatting algorithm, using a tile-based and sorting renderer
Core Algorithm
Differentiable 3D Gaussian Splatting
3D Gaussians are defined as ellipsoids centered at $ \mathbf x $ with full 3D covariance matrix $ \Sigma $:
$$ G(\mathbf x) = e^{-\frac{1}{2}\mathbf x^T \Sigma^{-1} \mathbf x} $$ To keep $ \Sigma $ valid (positive semi-definite) during optimization, it is factored as an ellipsoid with scaling and rotation: $$ \Sigma = \mathbf R \mathbf S \mathbf S^T \mathbf R^T $$ so it can be compactly represented with a 7-element vector (3 for scaling and 4 for a rotation quaternion). Each 3D Gaussian also uses spherical harmonics (SH) to represent its view-dependent color, written as the SH coefficient vector $ \mathbf c $. To summarize, each 3D Gaussian is represented by the tuple $ (\mathbf x, \mathbf s, \mathbf q, \mathbf c, \alpha) $, where $ \mathbf s $ is the scaling vector, $ \mathbf q $ is the rotation quaternion, and $ \alpha $ is the opacity.
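The factorization $ \Sigma = \mathbf R \mathbf S \mathbf S^T \mathbf R^T $ can be sketched in NumPy (function names are mine; the actual paper optimizes these parameters with automatic differentiation in CUDA):

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_from_params(s, q):
    """Sigma = R S S^T R^T: symmetric positive semi-definite by construction."""
    R = quat_to_rotmat(q)
    M = R @ np.diag(s)
    return M @ M.T

def gaussian_density(x, mean, s, q):
    """Unnormalized density G(x) = exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    Sigma = covariance_from_params(s, q)
    d = x - mean
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)
```

Building $ \Sigma $ from $ (\mathbf s, \mathbf q) $ rather than optimizing its 6 free entries directly guarantees a valid covariance at every gradient step.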
Adaptive Density Control
- Initial 3D Gaussians are formed with SfM points;
- Adaptive control targets "under-reconstruction" regions (too few 3D Gaussians to capture detailed geometric features) and "over-reconstruction" regions (a single Gaussian covers too large an area);
- Densify Gaussians based on view-space positional gradients: clone small Gaussians in under-reconstructed regions and split large ones in over-reconstructed regions;
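The clone/split logic above can be sketched as follows. This is a simplified NumPy version with scalar per-Gaussian scales; the thresholds and the scale divisor are assumptions loosely modeled on the paper's settings (a view-space gradient threshold around 2e-4 and a split factor of 1.6), not its exact CUDA implementation:

```python
import numpy as np

def densify(means, scales, view_grads, grad_thresh=2e-4, scale_thresh=0.01):
    """Sketch of adaptive density control.

    means:      (N, 3) Gaussian centers
    scales:     (N,)   per-Gaussian extent (largest axis, world space)
    view_grads: (N,)   accumulated view-space positional gradient magnitudes
    Returns new (means, scales) after cloning and splitting.
    """
    hot = view_grads > grad_thresh
    clone = hot & (scales <= scale_thresh)  # under-reconstruction: small Gaussians
    split = hot & (scales > scale_thresh)   # over-reconstruction: large Gaussians

    out_means = [means[~split]]             # keep all except the split originals
    out_scales = [scales[~split]]
    # Clone: duplicate small high-gradient Gaussians (the copy is then
    # moved along the positional gradient in later optimization steps).
    out_means.append(means[clone])
    out_scales.append(scales[clone])
    # Split: replace each large Gaussian with two smaller ones sampled
    # near the original, with scale reduced (assumed factor: 1.6).
    for m, s in zip(means[split], scales[split]):
        out_means.append(m + np.random.normal(scale=s, size=(2, 3)))
        out_scales.append(np.full(2, s / 1.6))
    return np.concatenate(out_means), np.concatenate(out_scales)
```

Note that cloning grows the total count by one per Gaussian, while splitting keeps the count growth at one as well (two new replace one old), so both operations add capacity gradually.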
Fast Differentiable Rasterization
Tile-based rendering and pre-sorting all primitives for the entire image:
- Split the screen into 16x16 tiles
- Cull 3D Gaussians against view frustum and tiles
- Instantiate Gaussians and assign them to tiles, keying each instance by tile ID and depth
- Sort all instances in one pass with a fast GPU radix sort
- For each tile, create a list of Gaussians and rasterize each tile independently
- Warm up optimization at lower resolution, then upsample twice during training
- Gradual optimization for SH bands: start from zero-th order and then add more bands
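The tile-assignment and sorting steps above hinge on packing (tile ID, depth) into a single sort key, so one global sort yields per-tile, front-to-back lists. A minimal NumPy sketch (names and the 64-bit key layout are my reconstruction; the paper's renderer does this with a CUDA radix sort):

```python
import numpy as np

TILE = 16  # the paper splits the screen into 16x16-pixel tiles

def make_sort_keys(px, py, depth, width):
    """Pack (tile ID, depth) into one 64-bit key per Gaussian instance.

    Sorting these keys orders instances first by tile, then front-to-back
    by depth within each tile -- one global sort replaces per-tile sorting.
    px, py: (N,) integer pixel coordinates of projected Gaussian centers
    depth:  (N,) positive view-space depths
    """
    tiles_x = (width + TILE - 1) // TILE
    tile_id = (py // TILE).astype(np.uint64) * tiles_x + (px // TILE).astype(np.uint64)
    # Reinterpret the float32 depth bits as an integer: for positive floats,
    # the bit pattern sorts in the same order as the values themselves.
    depth_bits = depth.astype(np.float32).view(np.uint32).astype(np.uint64)
    return (tile_id << np.uint64(32)) | depth_bits
```

After `order = np.argsort(make_sort_keys(...))`, each tile's Gaussian list is a contiguous run in `order`, which is what lets every tile rasterize independently.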
Unsorted Thoughts
- The highly efficient scene representation is key to enabling real-time rendering at 1080p; at the same time, the representation may not be the most suitable for digital humans.
- How do we enable it for motion? One recent attempt is this work.
- The Gaussian placement/densification algorithm is still quite heuristic. Could we derive it from image features automatically? Maybe even make it trainable?
References
- Differentiable Point-Based Radiance Fields for Efficient View Synthesis
- Point-NeRF: Point-based Neural Radiance Fields
- VoGE: A Differentiable Volume Renderer using Neural Gaussian Ellipsoids for Analysis-by-Synthesis
- EWA Volume Splatting
- Pulsar: Efficient Sphere-based Neural Rendering
- Point-Based Neural Rendering with Per-View Optimization
- NeRFs: The Search for the best 3D Representation
- Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis