Computing the Singular Value Decomposition of 3x3 matrices with minimal branching and elementary floating point operations
A. McAdams, A. Selle, R. Tamstorf, J. Teran and E. Sifakis
A numerical method for the computation of the Singular Value Decomposition of 3 × 3 matrices is presented. The proposed methodology robustly handles rank-deficient matrices and guarantees orthonormality of the computed rotational factors. The algorithm is tailored to the characteristics of SIMD or vector processors. In particular, it does not require any explicit branching beyond simple conditional assignments (as in the C++ ternary operator ?:, or the SSE4.1 instruction BLENDVPS), enabling trivial data-level parallelism for any number of operations. Furthermore, no trigonometric or other expensive operations are required; the only floating point operations utilized are addition, multiplication, and an inexact (yet fast) reciprocal square root, which is broadly available on current SIMD/vector architectures. The observed performance approaches the point at which the 3 × 3 SVD becomes a memory-bound (as opposed to CPU-bound) operation on current SMP platforms.
A. McAdams, A. Selle, R. Tamstorf, J. Teran and E. Sifakis, “Computing the Singular Value Decomposition of 3x3 matrices with minimal branching and elementary floating point operations”, University of Wisconsin - Madison technical report TR1690, May 2011 [PDF]
Version 1.0 (released 8 July 2011) [ZIP]
Version 1.1 (released 26 August 2013) [ZIP]
Version 1.2 (released 15 May 2018; includes AVX-512 support) [ZIP]