Appendix — how the `LoopEqn` Jacobian translator works¶

This page is for readers who want to extend the Jacobian translator or diagnose why a particular body dropped to the slow path. If you’re only using LoopEqn, the main guide at Indexed equations with LoopEqn is enough — come back here once you hit a term shape that the printer can’t classify.

Glossary¶

Term	Meaning
Phase J1	Constant Jacobian blocks: `∂F/∂x` has no free Var symbols, so the block value is baked into `_data_` at module-build time and the runtime `J` just slices it.
Phase J2	Mutable Jacobian blocks classified into `Diag(var)` / `Mat_Mul(Diag(var), Matrix)` / `Mat_Mul(Matrix, Diag(var))` / `Diag(u) @ Matrix @ Diag(v)` (biscale) terms. Each contributes via a compiled scatter-add loop over the matrix’s nnz — no `scipy.sparse` construction at call time.
Phase J3	The fallback: a per-entry Python-loop `LoopEqnDiff` kernel that evaluates the canonical derivative expression at each nnz position. Correct for any shape; slower than J2 because the loop body isn’t vectorised.
KD	Shorthand for `sympy.KroneckerDelta`. `KD(i, j)` is 1 when `i == j`, zero otherwise — the canonicalised form of the `∂x[i]/∂x[k]` derivative sympy produces.
`_LoopJacEye(n)`	Solverz primitive for the `n × n` identity matrix. Emitted when the canonical form reduces to `KD(outer, diff)`.
`_LoopJacSelectMat(col_map, n_outer, n_diff)`	Solverz primitive for a sparse selection matrix whose row `i` has a single 1 at column `col_map[i]`. Emitted for `KD(diff, map[outer])`.
CSR walker	The per-row sparse iteration pattern that the F-side printer uses for `Sum(M[i, j] * x[j], j)` when `M` is declared `sparse=True`.
Biscale	Phase J2 term shape `Diag(u) @ Matrix @ Diag(v)` where `Matrix` is a constant sparse matrix. Each nnz `(r, c, d)` scatters `u[r] * v[c] * d` to output position `(r, c)`.

Supported body shapes (F side)¶

What the F-side printer (_translate_loop_body_njit) will accept inside a LoopEqn.body without raising. If the body stays within this grammar, module_printer(jit=True).render() succeeds.

Body element	Syntax	Notes
Direct Var / Param reference	`m.x[i]`, `m.G[i, j]`	`i` / `j` are bounded `Idx`
Indirect gather	`m.x[m.map[i]]`, `m.G[m.ns[i], j]`	`map` / `ns` are 1-D int Params; equivalent to `m.x[i]` when `i` came from a non-identity `Set.idx(...)`
Bare scalar Param	`m.Cp`, `m.Cp * something`	scalar Param used as coefficient
Nested `Sum` over `Idx`	`Sum(body, j)` with `j = Idx('j', n)`	printer emits `for j in range(n)`
Sparse 2-D Param walker	`Sum(m.G[i, j] * m.x[j], j)` with `G` declared `sparse=True`	CSR row walk emitted automatically
Arithmetic	`+`, `-`, ``, `/`, `*`	element-wise in the loop body
Unary functions	`sin`, `cos`, `exp`, `log`, `sqrt`, `Abs`, `Sign`, `heaviside`, `atan2`	dispatched to `np.` / `SolCF.`

Phase J2 translator coverage¶

The canonical derivative of a LoopEqn.body is computed via sympy.diff, run through canonicalize_kronecker (which pulls KD factors out of Sums where safe and collapses Sum(f(j) · KD(j, k), j) → f(k) collapses), and then classified term-by-term. Terms that match one of the shapes below land on the compiled fast path; the rest fall to Phase J3.

Shape (after canonicalisation)	Emission	Where (file)
`KD(outer, diff)`	`_LoopJacEye(n)` — constant identity	`loop_jac.py::_translate_kronecker_delta`
`KD(diff, map[outer])`	`_LoopJacSelectMat(map)` — constant selection	same
`Indexed(Param, outer, diff)`	`Para(name, dim=2)` — constant 2-D block	`loop_jac.py::_translate_indexed_param`
`Indexed(Param, outer, diff) * Indexed(Var, outer)`	`Mat_Mul(Diag(Var), Para)` — row-scale	`_classify_and_translate_term`
`Indexed(Param, outer, diff) * Indexed(Var, diff)`	`Mat_Mul(Para, Diag(Var))` — col-scale	same
`KD(outer, diff) * Sum(Param[outer, j] * Var[j], j)`	`Diag(Mat_Mul(Para, iVar))` — DiagTerm	`_classify_and_translate_term`
`KD(diff, map[outer]) * Sum(...)` with identity `map`	same as the direct-KD DiagTerm above	`_classify_and_translate_term` + `_indirect_kd_is_identity`
`KD(diff, map[outer]) * Π scalar_factors(outer)`	`Diag(scalar_vec) @ SelectMat(map)` (direct: just `Diag(scalar_vec)`)	`_try_kd_outer_scalars`
`Π outer_scalars(outer) * Sum(...)` with a Sum-KD inside	`Diag(outer_vec) @ m_inner`	`_try_outer_scalar_times_sum_kd`
`Sum(f(outer, dummy) * KD(diff, map[dummy]), dummy)` identity `map`	bare `Para` / `Mat_Mul(SelectMat_row, Para)` / `Mat_Mul(matrix_expr, Diag(scalar_vec))`	`_try_sum_kd_collapse`
same, non-identity injective `map`, single matrix factor	`Mat_Mul(Para, SelectMat_col)`	same

Mutable-matrix analyser decompositions¶

Once a term is classified as Phase J2, it becomes one entry in a MutableMatBlockMapping structure. The analyser recognises four term kinds:

diag_terms — Diag(inner), scatters inner[i] to output position (i, i).
row_scale_terms — Mat_Mul(Diag(u), Matrix), scatters u[r] · M.data[k] to (r, c) at each matrix nnz.
col_scale_terms — Mat_Mul(Matrix, Diag(v)), symmetric.
biscale_terms — Mat_Mul(Diag(u), Matrix, Diag(v)) in any of the N-arg flat, 2-arg nested, or wrapped--1 forms, scatters u[r] · v[c] · M.data[k].

Matrix in any of the above may be a Para, a _LoopJacSelectMat, or a Mat_Mul chain of either — _sparse_matrix_nnz evaluates the chain to a single COO sparse matrix at analyser time. _extract_sign_and_core strips any numeric coefficient into a stored sign field (allowing -2 * Diag(x) to be handled without falling back).

Not supported (Phase J3 fallback)¶

The following shapes still hit the Phase J3 LoopEqnDiff kernel (correct but ~3–7× slower than J2 for small nnz). Rewrite the body to avoid them when the inner loop is on the hot path:

Multi-arg sympy Function without simplification — e.g. a user-defined sp.Function('f')(x[i], y[i]) whose sp.diff keeps the multi-arg form. Prefer elementary operations (the built-in atan2 is fine because sp.diff(atan2(x, y), x) simplifies to y / (x² + y²)).
Bilinear outer–dummy products inside Sum — e.g. Sum(Vm[i] * Vm[j] * cos(Va[i] - Va[j]), j) (polar power flow in the full form). None of the J2 patterns express two free axes under the same Sum in their Diag / Mat_Mul grammar. The diagonal derivative part simplifies cleanly, but the bilinear off-diagonal falls to J3.
Non-identity-map Sum-KD with Var-dependent scalar factors — e.g. Sum(Sign(ms[orig[p]]) * KD(k, map[p]) * V[i, p], p) with map a non-identity injection. The identity-map variant is supported (Pattern 4 mutable); the non-identity case needs an inverse-map gather the current scatter-add primitives don’t express.
Piecewise / conditional bodies — canonicalize_kronecker unwraps the first non-zero branch heuristically, which produces the wrong result for non-trivial conditions. Avoid sp.Piecewise in LoopEqn bodies.
Stencil PDE bodies (SolPde-style multi-arg callables) — inherently J3. The sparsity analyser still reports the per-cell contribution pattern correctly, so the J3 kernel iterates only O(nnz) entries per Newton step.

Body-level limitations¶

Per-pipe 1-D Vars with different lengths (e.g. pre-refactor Tsp_0, Tsp_1, …) cannot be indexed by a LoopEqn dummy: sympy.IndexedBase can select array elements but not symbolic names. The SolMuseum solution is the “flatten” refactor that fuses per-pipe states into a single flat Tsp_all Var with explicit (pipe_offset, segment) addressing.
Nested Sum with reset is not supported. A body like Sum(Sum(body, (inner, …)), (outer, …)) where the inner sum resets each outer iteration cannot be expressed as a flat linear-prelude. Split into two bodies or flatten the sum algebra.

Debugging a fall-to-J3¶

Set SZ_DEBUG_PATTERN4MUT=1 or SZ_DEBUG_ANALYZER=1 in the environment when rendering to print every term the analyser / Pattern 4 translator sees and accepts or rejects. The output points at the exact canonical expression that wasn’t classifiable, so you can decide whether to (a) rewrite the body to avoid the shape, (b) extend the translator with a new pattern, or (c) accept the Phase J3 fallback.

Appendix — how the LoopEqn Jacobian translator works¶