A symbolic surrogate for hardware associativity error, monotone in the dynamic range of the rotation matrix. The point is not to fully formalize bf16; it is to make the dependency on dynamic range explicit enough for downstream optimization theorems to consume.
Higham-style worst-case estimate, not a calibrated tensor-core variance model. If the target kernel is dominated by fp32 accumulation with variance-style cancellation, these bounds may be highly pessimistic. The value is in the order structure, not the constants.
Lean anchors.
associativityBudget_nonneg,
associativityBudget_mono_rotation,
bf16_non_associative_bound
Budget definition.
Monotonicity theorem.
In English. The bf16 associativity error is bounded by a quantity proportional to the squared dynamic range of the rotation matrix. Shrinking the rotation envelope strictly improves the error budget. This is the load-bearing algebraic fact for the Dense–Sparse refactor.
Axiomatic interface.
BF16Axiomatics is a typeclass that abstracts the exact
hardware semantics. The associator bound holds for any instance;
concrete bf16 formalization can refine the instance later without
changing theorems that depend only on monotonicity.
Thresholding the rotation to dynamic range τ < ρ moves you left on this curve. The quadratic shape means even modest reductions in dynamic range give large budget savings.
Bounds make explicit which part of the rotation drives bf16 error — the large-magnitude tail.
Typeclass interface lets downstream theorems be proved once; hardware instance can be refined later.
Monotonicity is the key lemma consumed by densePart_budget_strictly_improves.