Any rotation matrix R splits exactly into a dense part (entries with magnitude below a threshold τ) and a sparse correction (entries with magnitude above τ). Routing the bf16-safe path through the dense part strictly reduces the associativity error budget when τ is strictly below the original rotation envelope ρ (τ < ρ).
This is the constructive optimization step that turns the Bounded Arithmetic monotonicity theorem into an engineering action: find τ, prove the split exact, prove the budget improves, hand the dense part to the bf16 kernel and the sparse part to a higher-precision fallback.
Lean anchors. densePart_add_sparsePart, densePart_entrywise_bound, densePart_bf16_budget, densePart_budget_strictly_improves.
Exact split.
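The statement anchored by densePart_add_sparsePart can be sketched in symbols as follows. The notation D_τ and S_τ, and the choice of ≤ rather than < at the threshold, are assumptions made for illustration and are not taken from the Lean source.

```latex
\[
D_\tau(R)_{ij} =
\begin{cases}
R_{ij} & \text{if } |R_{ij}| \le \tau \\
0      & \text{otherwise}
\end{cases}
\qquad
S_\tau(R)_{ij} = R_{ij} - D_\tau(R)_{ij}
\qquad
D_\tau(R) + S_\tau(R) = R
\]
```

Each entry of R lands in exactly one of the two parts, so the sum recovers R with no rounding.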
Strict budget improvement.
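The shape of the claim anchored by densePart_budget_strictly_improves is probably close to the implication below. The symbol \mathcal{B} is a placeholder for the associativity error budget defined in Bounded Arithmetic Semantics; its exact form, and any side conditions the Lean statement carries, are not reproduced here.

```latex
\[
\tau < \rho \;\Longrightarrow\; \mathcal{B}(\tau) < \mathcal{B}(\rho)
\]
```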
In English. Thresholding at τ splits R exactly (no approximation error in the split itself). The dense part is entrywise bounded by τ, so sending it through a bf16 kernel inherits the tighter budget. Strict improvement holds whenever τ is strictly below the original rotation envelope ρ.
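A minimal Python sketch of the split may make the exactness claim concrete. The function names and the ≤ convention at the threshold are hypothetical choices for illustration; the Lean definitions may differ in detail.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def split_at_threshold(R: Matrix, tau: float) -> Tuple[Matrix, Matrix]:
    """Return (dense, sparse): dense keeps entries with |x| <= tau,
    sparse keeps the remaining entries; all other positions are 0.0.
    The <= convention is an assumption of this sketch."""
    dense = [[x if abs(x) <= tau else 0.0 for x in row] for row in R]
    sparse = [[x if abs(x) > tau else 0.0 for x in row] for row in R]
    return dense, sparse


def add(A: Matrix, B: Matrix) -> Matrix:
    """Entrywise matrix sum."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# A 2x2 rotation by a small angle: large diagonal, small off-diagonal.
theta = 0.1
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
tau = 0.5

dense, sparse = split_at_threshold(R, tau)

# Exact reconstruction: every entry is x + 0.0 or 0.0 + x, which is x.
assert add(dense, sparse) == R
# Entrywise bound on the dense part.
assert all(abs(x) <= tau for row in dense for x in row)
```

The reconstruction check uses exact equality on purpose: adding 0.0 to a float returns the same float, so the split itself contributes no rounding error.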
Important caveat. The split introduces a sparse correction that must be handled separately (e.g. in fp32). The net win depends on the sparsity ratio at τ and the relative cost of the two paths. The Lean theorems certify the error-bound side only.
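To see how the sparsity ratio moves with τ, a small sketch can count the share of entries that would go to the fallback path. The helper name sparsity_ratio is hypothetical, and the sketch deliberately does not model the relative cost of the two paths, which depends on the hardware and kernels involved.

```python
import math
from typing import List


def sparsity_ratio(R: List[List[float]], tau: float) -> float:
    """Fraction of entries with |x| > tau, i.e. the share of R that
    would be sent to the higher-precision fallback path."""
    entries = [x for row in R for x in row]
    return sum(1 for x in entries if abs(x) > tau) / len(entries)


# A 3x3 rotation about the z-axis by a small angle.
c, s = math.cos(0.05), math.sin(0.05)
R = [[c, -s, 0.0],
     [s, c, 0.0],
     [0.0, 0.0, 1.0]]

# Sweep a few thresholds; a lower tau tightens the dense-part bound
# but pushes more entries into the fallback path.
for tau in (0.01, 0.1, 1.0):
    print(f"tau={tau}: fallback share = {sparsity_ratio(R, tau):.3f}")
```

A sweep like this only informs the cost side of the trade-off; the error-bound side is what the Lean theorems cover.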
The split is algebraically exact: no rounding in the decomposition itself. Error enters only when the dense part goes through bf16 arithmetic, and that error is then bounded by the tighter budget at τ.
Route the small-magnitude part through bf16 and handle the large entries in fp32, without losing the rotation.
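The routing can be sketched in plain Python by emulating bf16 rounding on the dense entries. Everything here is an assumption for illustration: to_bf16 and routed_matvec are hypothetical names, the emulation only models bf16 storage rounding of the dense part (not bf16 accumulation inside a real kernel), and Python floats stand in for the fp32 fallback.

```python
import math
import struct
from typing import List

Matrix = List[List[float]]


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even) by keeping
    the top 16 bits of its float32 encoding. Emulation only; assumes x
    is finite and within float32 range."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def routed_matvec(R: Matrix, x: List[float], tau: float) -> List[float]:
    """Apply R to x with entries |v| <= tau rounded to bf16 and the
    remaining entries kept at full precision, then sum both paths."""
    dense = [[to_bf16(v) if abs(v) <= tau else 0.0 for v in row] for row in R]
    sparse = [[v if abs(v) > tau else 0.0 for v in row] for row in R]
    return [d + s for d, s in zip(matvec(dense, x), matvec(sparse, x))]


theta = 0.1
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
x = [1.0, 2.0]

exact = matvec(R, x)
routed = routed_matvec(R, x, tau=0.5)
max_err = max(abs(a - b) for a, b in zip(exact, routed))
print("max abs error from bf16 rounding of the dense part:", max_err)
```

In this sketch the only error comes from rounding dense entries, each of which has magnitude at most τ, so a smaller τ shrinks the absolute rounding error per entry.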
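densePart_budget_strictly_improves is a Lean proof that the error budget strictly improves whenever τ < ρ. It certifies the error bound only, not the runtime cost of the two paths.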
The split has zero reconstruction error: no approximation occurs until the bf16 kernel fires.
Depends on Bounded Arithmetic Semantics for the budget definition and monotonicity lemma.