Preprint · Lean 4 · Mathlib

Verified neural compilation of RoPE and gauge equivalences in transformers.

Different RoPE bands are different clocks. No exact rewrite can mix them — a Lean 4 theorem, not a metaphor.

Transformer weights admit non-trivial symmetries: rewrites under which the computed function is preserved exactly, and weaker loss-level degeneracies that training cannot distinguish. This survey catalogs the ones we have isolated so far — RoPE-commuting rotations, sign/phase gauges, parabolic stabilizers, observable quotients, and probe-induced fake equivalences — together with the newer boundary ledger: theorem-backed obstruction classes such as Class K bias/generation decoupling, and explicit bf16 precision budgets (bounded arithmetic semantics, dense–sparse refactor). Each is stated as a Lean 4 theorem; most are proven, the rest explicitly deferred (counts below). Gauge claims must survive probe refinement — the ones that do not are cataloged as fakes, not symmetries.

FIG 1 · Taxonomy of the 12 core symmetry classes. Edges denote subsumption or composition; the full catalog is below. Legend: gauge-free, training-invariant, probe-fake.

01 · OVERVIEW

What is verified neural compilation?

Weight rewrites as compiler passes, each one carrying a Lean proof that the network still computes the same function.

A trained transformer is a program, and its weights are the source code. Many different weight settings compute exactly the same function — the parameterization is redundant. Verified neural compilation treats that redundancy the way a compiler treats source rewrites: each weight transformation is a pass, and each pass ships with a machine-checked Lean 4 theorem that the rewritten network computes the same function as the original. The equalities are exact, in real arithmetic; this survey claims no approximate passes. Floating-point execution is outside the exact theorems' scope — where the survey touches it at all, it does so through an explicit axiomatic error budget (bounded arithmetic semantics, below), never through bit-level equality claims.

RoPE is what makes this tractable. Rotary position embedding turns the symmetry question into a commutant problem: which rewrites $R$ commute with the rotary operator $\Theta$? The flat-matrix classification answers this in full at the flat-head level, under nondegeneracy and pairwise-distinct-band hypotheses, and the commutant turns out to be rigid enough to support exact offline rewriting results. On the algebra side, distinct RoPE bands behave like different clocks, so any exact intertwiner between unequal bands vanishes. What survives on each band is exactly the two-parameter family $aI + bJ$ — complex multiplication on the corresponding 2D plane.
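As a small numerical illustration (not a substitute for the Lean theorems), the sketch below checks on one 2D plane that a matrix of the form $aI + bJ$ commutes with a rotation and acts on $(x, y)$ the way multiplication by $a + bi$ acts on $x + iy$. The helper names (`rot`, `a_i_plus_b_j`, `matmul`, `matvec`, `max_abs_diff`) and the sample values are assumptions made for this sketch, and $J$ is taken to be the quarter-turn matrix with rows $(0, -1)$ and $(1, 0)$.

```python
import math

def rot(theta):
    """2x2 rotation matrix for one RoPE plane (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def a_i_plus_b_j(a, b):
    """a*I + b*J with J = [[0, -1], [1, 0]] (assumed convention)."""
    return [[a, -b], [b, a]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    return [A[i][0] * v[0] + A[i][1] * v[1] for i in range(2)]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

theta, a, b = 0.37, 1.5, -0.8          # arbitrary sample values
R, C = rot(theta), a_i_plus_b_j(a, b)

# Commutation: C R and R C agree up to floating-point rounding.
commutator_gap = max_abs_diff(matmul(C, R), matmul(R, C))

# Complex multiplication: C acting on (x, y) matches (a + bi)(x + iy).
x, y = 0.6, -1.1
as_matrix = matvec(C, [x, y])
as_complex = complex(a, b) * complex(x, y)

# A matrix outside the family, here diag(1, 2), fails to commute with R.
D = [[1.0, 0.0], [0.0, 2.0]]
outside_gap = max_abs_diff(matmul(D, R), matmul(R, D))
```

The check runs in floating point, so it only shows agreement up to rounding; the exact statement lives in the Lean development.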

That exact classification has a direct engineering consequence: a rewrite that commutes with RoPE can be folded into the projection weights and absorbed back out by the output projection, with zero change in the exact attention block output and zero runtime cost. The rest of the survey maps out how far this picture extends — and, just as importantly, where it is an artifact of a too-coarse probe rather than a property of the network.

02 · SYMMETRY CLASSES

The three kinds of "same network" — and where they stop

RoPE commutants, gauge equivalences, and foldable reparameterizations — what each one preserves, and why each one matters. Then the boundary ledger: obstruction classes and precision budgets, where the exact picture ends.

RoPE commutants — what attention provably cannot see

At the local $2\times2$ level, the commutant of one nondegenerate rotation block is exactly the family $aI + bJ$. At the flat-head level, pairwise distinct cosine bands force every off-diagonal intertwiner block to vanish, so the full commutant becomes block-diagonal with one surviving complex scalar on each RoPE plane. Physically: different bands are different clocks, and a fixed exact rewrite cannot keep different clocks phase-locked.
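The vanishing of off-diagonal intertwiners can be illustrated numerically on a single pair of planes. The sketch below builds the linear map $X \mapsto R(\beta)X - XR(\alpha)$ on $2\times2$ matrices and computes its determinant: a nonzero determinant means $X = 0$ is the only solution of $R(\beta)X = XR(\alpha)$. The closed form $4(\cos\alpha - \cos\beta)^2$ used as a cross-check follows from a standard eigenvalue computation; it is included as an assumption of this sketch, not as a description of how the Lean proof proceeds. All function names are hypothetical.

```python
import math

def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(M):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def intertwiner_det(alpha, beta):
    """Determinant of the map X -> R(beta) X - X R(alpha) on 2x2 matrices."""
    rows = []
    for k in range(4):
        E = [[0.0, 0.0], [0.0, 0.0]]
        E[k // 2][k % 2] = 1.0                     # k-th basis matrix
        left, right = matmul(rot(beta), E), matmul(E, rot(alpha))
        rows.append([left[i][j] - right[i][j]
                     for i in range(2) for j in range(2)])
    # rows hold the images of the basis matrices; det is transpose-invariant
    return det(rows)

distinct = intertwiner_det(0.3, 1.1)               # different cosines
closed_form = 4 * (math.cos(0.3) - math.cos(1.1)) ** 2
same_band = intertwiner_det(0.7, 0.7)              # equal angles
mirrored = intertwiner_det(0.7, -0.7)              # equal cosines, opposite sign
```

With distinct cosines the determinant is bounded away from zero, so no nonzero intertwiner exists; with equal cosines it collapses to zero up to rounding, which is the case where mixing becomes possible.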

The repeated-frequency case is not a failure of rigidity but a refinement of it. Equal-angle planes may mix, but only inside one angle class, and still only through blocks of the form $aI + bJ$. A more recent theorem shows that these repeated-frequency sectors are closed under composition: one angle class behaves like a genuine fiberwise matrix algebra of complex-scalar blocks. The classification also holds for a geometric band law matching the inverse-frequency decay used by real RoPE model families, not just a toy schedule.
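The closure under composition can be checked by hand on a single plane. Assuming $J$ is the quarter-turn matrix with rows $(0, -1)$ and $(1, 0)$, so that $J^2 = -I$ (the symbols $c$ and $d$ are introduced only for this derivation):

$$
(aI + bJ)(cI + dJ) = ac\,I + (ad + bc)\,J + bd\,J^2 = (ac - bd)\,I + (ad + bc)\,J .
$$

This is the multiplication rule for $(a + bi)(c + di)$, so products stay in the family. Within one angle class, a block matrix whose $2\times2$ blocks all have this form multiplies blockwise, and each block of the product is a sum of such products, hence again of the form $aI + bJ$.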

Why it matters. This is the exact inventory of rewrites that are invisible to attention: anything inside the commutant is a free move; anything outside it changes the attention function on some input. The scope is exact, input-independent rewrites — approximate or data-dependent rotations, the kind compression and merging schemes often use, are outside what these theorems license and would need separate error analysis.

Gauge equivalences — what the measurement interface cannot see

The commutant question asked what commutes with the dynamics. Observable quotients ask a different question: what changes are invisible to the chosen observable family $\mathcal{O}$? Two states are identified, $x \sim_{\mathcal{O}} x'$, when every observable agrees on them. This single idea organizes a family of gauges: the sign/phase gauge (a feature is a ray, not a vector), the depth gauge (absorb part of one layer into its neighbor without changing the boundary story), and parabolic stabilizers, where sectors are derived as the stabilizer subgroup of the observable family rather than hand-labeled. At the objective level, sector exchange and sector attractors capture degeneracies the loss cannot distinguish — training-invariant rather than exact.

The guardrail for all of this is fake symmetry: a rewrite that looks invisible to a coarse probe but is exposed by a finer one. The symmetry is not in the system — it is in the instrument. Every gauge claim in this survey is therefore required to survive probe refinement, and the ones that do not are cataloged as fakes.
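A toy version of the observable quotient and the probe-refinement test can be sketched as follows. Everything here is hypothetical: the observables (`norm_sq`, `first_sq`), the rewrites, and the three labels are chosen for illustration, and the check runs over a finite sample, so "survives refinement" only means "relative to the probes and samples declared in this sketch".

```python
def equivalent(x, y, observables, tol=1e-12):
    """x ~_O y: every observable in the family agrees on x and y."""
    return all(abs(obs(x) - obs(y)) <= tol for obs in observables)

def classify(rewrite, samples, coarse, fine):
    """Label a rewrite relative to a coarse probe family and a refinement."""
    refined = coarse + fine
    if all(equivalent(x, rewrite(x), refined) for x in samples):
        return "survives refinement"
    if all(equivalent(x, rewrite(x), coarse) for x in samples):
        return "fake"          # invisible only to the coarse probe
    return "visible"

# Probes: a coarse one (squared norm) and a finer one (first coordinate squared).
norm_sq = lambda x: x[0] ** 2 + x[1] ** 2
first_sq = lambda x: x[0] ** 2

# Candidate rewrites of a 2D state.
sign_flip = lambda x: (-x[0], -x[1])      # sign gauge: a feature is a ray
swap = lambda x: (x[1], x[0])             # preserves the norm only
double = lambda x: (2 * x[0], 2 * x[1])   # changes even the coarse probe

samples = [(1.0, 2.0), (-0.5, 3.0)]
labels = {
    name: classify(f, samples, [norm_sq], [first_sq])
    for name, f in [("sign_flip", sign_flip), ("swap", swap), ("double", double)]
}
```

The coordinate swap is the fake in this toy: the squared norm cannot see it, but adding the finer probe exposes it, so the apparent symmetry belonged to the instrument.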

Why it matters. Equivalence claims are only meaningful relative to a declared observable family. This class supplies the discipline: it says which mechanistic stories are gauge-dependent bookkeeping, and the writable region packages the invisible freedom that remains. Whether SGD exploits that freedom as low-loss transport directions is conjectured, not shown — it sits on the open-problems list.

Foldable reparameterizations — when a symmetry becomes a free engineering move

If an orthogonal rewrite $R$ commutes with the RoPE operator, it can be folded into $W_q$, $W_k$, and $W_v$ and absorbed back out by the output projection $W_o R^\top$, with zero change in the exact attention block output. The current Lean development carries that equality through the wrapped multi-head picture: headwise commuting rewrites survive concatenation, one shared output projection, and the residual add. Lifting this wrapper theorem to a more faithful Transformer shell, with richer masking and projection plumbing, is open; the corresponding statements are tracked as deferred in the status table below.
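The fold can be illustrated on a toy single-head causal attention with one RoPE plane (head dimension 2). This is a minimal sketch under assumed conventions, not the Lean statement: vectors are columns, the fold is taken to be $W_q \mapsto R W_q$, $W_k \mapsto R W_k$, $W_v \mapsto R W_v$, $W_o \mapsto W_o R^\top$, and all weights, inputs, the frequency `freq`, and the function names are made up for the example.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def attention(xs, Wq, Wk, Wv, Wo, freq=0.5):
    """Single-head causal attention; RoPE rotates q and k by position * freq."""
    qs = [matvec(rot(m * freq), matvec(Wq, x)) for m, x in enumerate(xs)]
    ks = [matvec(rot(n * freq), matvec(Wk, x)) for n, x in enumerate(xs)]
    vs = [matvec(Wv, x) for x in xs]
    outs = []
    for m, q in enumerate(qs):
        scores = [sum(a * b for a, b in zip(q, ks[n])) for n in range(m + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [sum(w * vs[n][i] for n, w in enumerate(weights)) / total
                 for i in range(2)]
        outs.append(matvec(Wo, mixed))
    return outs

def fold(R, Wq, Wk, Wv, Wo):
    """Offline rewrite of the weights (assumed column-vector convention)."""
    return matmul(R, Wq), matmul(R, Wk), matmul(R, Wv), matmul(Wo, transpose(R))

def gap(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

Wq = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.4]]
Wk = [[-0.3, 0.8, 0.5], [0.6, 0.1, -0.2]]
Wv = [[0.9, -0.1, 0.4], [0.2, 0.5, -0.7]]
Wo = [[0.3, -0.6], [0.8, 0.2], [-0.5, 0.4]]
xs = [[1.0, 0.5, -0.2], [-0.3, 0.9, 0.4], [0.6, -0.7, 1.1]]

R_commuting = rot(0.9)                       # a rotation commutes with RoPE
R_reflection = [[1.0, 0.0], [0.0, -1.0]]     # orthogonal, but does not commute

base = attention(xs, Wq, Wk, Wv, Wo)
folded_gap = gap(base, attention(xs, *fold(R_commuting, Wq, Wk, Wv, Wo)))
broken_gap = gap(base, attention(xs, *fold(R_reflection, Wq, Wk, Wv, Wo)))
```

In this toy the commuting rotation leaves the output unchanged up to rounding, while the reflection, which is orthogonal but outside the commutant, changes the attention weights and therefore the output. The fold itself adds no runtime work, since it only replaces the stored weights.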

Why it matters. Foldability turns an exact symmetry into a zero-runtime-cost compiler pass: an offline rewrite to checkpoint weights, with a machine-checked guarantee that the attention function is unchanged in real arithmetic. Floating-point execution can still differ at rounding level — the theorem governs the function, not the bits; bounded arithmetic semantics tracks that gap as an explicit budget. This is the "compilation" in verified neural compilation.

Obstructions and precision budgets — the boundary ledger

Three newer classes mark where the exact picture stops. Class K (bias/generation decoupling) is the sharpest theorem-backed obstruction: the same intervention moves a bias-path readout while the generation-path readout stays flat. In the survey's language it is a concrete fake-symmetry pattern: the coarse bias probe declares a direction writable, and refining the observable to the generation path collapses that writable region. The guardrail theorem shows the nontrivial case is genuinely two-readout, not a disguised single-readout null.

On the hardware side, bounded arithmetic semantics makes bf16 associativity error explicit as a symbolic budget, monotone in the rotation's dynamic range. It is an axiomatic interface — a Higham-style worst-case surrogate, not a calibrated hardware model — so its value is the order structure, not the constants. The dense–sparse refactor turns that monotonicity into an engineering action: split a rotation exactly into a small-magnitude dense part and a sparse correction, and the dense part inherits a strictly smaller budget. The Lean theorems certify the error-bound side only; the net win depends on the sparsity ratio and the cost of the higher-precision fallback path.
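To make the associativity gap concrete, the sketch below emulates bf16 rounding in software, shows two summation orders disagreeing, and compares the error with a Higham-style worst-case bound $\gamma_{n-1}\sum_i |x_i|$, where $\gamma_k = ku/(1 - ku)$ and $u = 2^{-8}$. Several things are assumptions of this sketch rather than the survey's definitions: the emulation goes through float32 and ignores NaN and infinity, the budget is expressed through the magnitude sum rather than the rotation's dynamic range, and the `split` helper is only one simple way to realize an exact dense-plus-sparse decomposition.

```python
import struct

U_BF16 = 2.0 ** -8   # unit roundoff: 8 bits of precision (7 stored + 1 implicit)

def to_bf16(x):
    """Round a finite float to the nearest bfloat16 value, ties to even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)      # round to nearest, ties to even
    bits &= 0xFFFF0000                       # keep only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_add(x, y):
    return to_bf16(to_bf16(x) + to_bf16(y))

def budget(xs):
    """Worst-case surrogate for summing xs with len(xs) - 1 rounded additions."""
    k = len(xs) - 1
    gamma = k * U_BF16 / (1.0 - k * U_BF16)
    return gamma * sum(abs(x) for x in xs)

def split(xs, threshold):
    """Exact split xs = dense + sparse; sparse keeps entries above threshold."""
    dense = [x if abs(x) <= threshold else 0.0 for x in xs]
    sparse = [x - d for x, d in zip(xs, dense)]
    return dense, sparse

xs = [256.0, 1.0, 1.0]
left = bf16_add(bf16_add(xs[0], xs[1]), xs[2])    # (256 + 1) + 1
right = bf16_add(xs[0], bf16_add(xs[1], xs[2]))   # 256 + (1 + 1)
dense, sparse = split(xs, threshold=10.0)
```

Here `left` evaluates to 256.0 and `right` to 258.0: near 256 the bf16 spacing is 2, so adding 1 twice is lost while adding 2 once is exact. The error of 2 sits inside the surrogate bound, and the dense part produced by `split` gets a smaller bound than the full vector, which mirrors the monotonicity argument in the text without saying anything about real hardware constants.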

Why it matters. These classes keep the survey honest at both ends. Class K records where an apparent symmetry dies under probe refinement — a real obstruction, not a missing trick. The precision pair records what happens to "exact" when it meets finite arithmetic: not a bit-level equality claim, but a proved budget that downstream compilation passes can consume.

How to read the survey. This page is step 1 of 7 on the main path — exact RoPE. The path continues through foldability, observable quotients, sign/phase, sector exchange, and fake symmetry, ending at the writable region. Six deepenings — angle classes, sector attractors, transport geometry, parabolic stabilizers, depth gauge, and toy synthesis — sharpen individual steps. Three extension pages sit past the end of the path: Class K (bias/generation decoupling) records the sharpest probe-refinement obstruction, and bounded arithmetic semantics with the dense–sparse refactor carry the exact rewrites into bf16 precision budgets.

03 · CATALOG

Symmetry catalog

One card per symmetry class in the survey. Each links to its detail page with the full invariant, derivation, and Lean theorem list.

04 · WIDGETS

Interactive invariants

Three demos, plain JS. Each illustrates one idea from the survey — they are illustrations, not proofs; the guarantees live in the Lean theorems. Move a control and the invariant updates in real time.

05 · LEAN STATUS

Proof status

Per-symmetry counts, auto-populated from _data/lean_status.json.

Symmetry	Proven	Deferred	Axiomatic	Stub	Total	Distribution

06 · ARTIFACTS

Paper & code

Manuscript, source, and formal proofs. Cite as below.

Manuscript PDF: build pending (latexmk not yet run)

arXiv preprint: submission pending (awaiting PDF)

Repository: github.com/d3banjan/symmetry-survey-paper · Lean 4 · Mathlib · MIT

Citation: BibTeX entry below

@article{basu2026symmetry,
  title   = {Symmetry Survey for Verified Neural Compilation},
  author  = {Basu, Debanjan},
  year    = {2026},
  note    = {Lean 4 companion at github.com/d3banjan/symmetry-survey-paper}
}