Appendix — how the LoopEqn Jacobian translator works

This page is for readers who want to extend the Jacobian translator or diagnose why a particular body dropped to the slow path. If you’re only using LoopEqn, the main guide at Indexed equations with LoopEqn is enough — come back here once you hit a term shape that the printer can’t classify.

Glossary

Term

Meaning

Phase J1

Constant Jacobian blocks: ∂F/∂x has no free Var symbols, so the block value is baked into _data_ at module-build time and the runtime J just slices it.

Phase J2

Mutable Jacobian blocks classified into Diag(var) / Mat_Mul(Diag(var), Matrix) / Mat_Mul(Matrix, Diag(var)) / Diag(u) @ Matrix @ Diag(v) (biscale) terms. Each contributes via a compiled scatter-add loop over the matrix’s nnz — no scipy.sparse construction at call time.

Phase J3

The fallback: a per-entry Python-loop LoopEqnDiff kernel that evaluates the canonical derivative expression at each nnz position. Correct for any shape; slower than J2 because the loop body isn’t vectorised.

KD

Shorthand for sympy.KroneckerDelta. KD(i, j) is 1 when i == j, zero otherwise — the canonicalised form of the ∂x[i]/∂x[k] derivative sympy produces.

_LoopJacEye(n)

Solverz primitive for the n × n identity matrix. Emitted when the canonical form reduces to KD(outer, diff).

_LoopJacSelectMat(col_map, n_outer, n_diff)

Solverz primitive for a sparse selection matrix whose row i has a single 1 at column col_map[i]. Emitted for KD(diff, map[outer]).

CSR walker

The per-row sparse iteration pattern that the F-side printer uses for Sum(M[i, j] * x[j], j) when M is declared sparse=True.

Biscale

Phase J2 term shape Diag(u) @ Matrix @ Diag(v) where Matrix is a constant sparse matrix. Each nnz (r, c, d) scatters u[r] * v[c] * d to output position (r, c).

Supported body shapes (F side)

What the F-side printer (_translate_loop_body_njit) will accept inside a LoopEqn.body without raising. If the body stays within this grammar, module_printer(jit=True).render() succeeds.

Body element

Syntax

Notes

Direct Var / Param reference

m.x[i], m.G[i, j]

i / j are bounded Idx

Indirect gather

m.x[m.map[i]], m.G[m.ns[i], j]

map / ns are 1-D int Params; equivalent to m.x[i] when i came from a non-identity Set.idx(...)

Bare scalar Param

m.Cp, m.Cp * something

scalar Param used as coefficient

Nested Sum over Idx

Sum(body, j) with j = Idx('j', n)

printer emits for j in range(n)

Sparse 2-D Param walker

Sum(m.G[i, j] * m.x[j], j) with G declared sparse=True

CSR row walk emitted automatically

Arithmetic

+, -, *, /, **

element-wise in the loop body

Unary functions

sin, cos, exp, log, sqrt, Abs, Sign, heaviside, atan2

dispatched to np.* / SolCF.*

Phase J2 translator coverage

The canonical derivative of a LoopEqn.body is computed via sympy.diff, run through canonicalize_kronecker (which pulls KD factors out of Sums where safe and collapses Sum(f(j) · KD(j, k), j) f(k) collapses), and then classified term-by-term. Terms that match one of the shapes below land on the compiled fast path; the rest fall to Phase J3.

Shape (after canonicalisation)

Emission

Where (file)

KD(outer, diff)

_LoopJacEye(n) — constant identity

loop_jac.py::_translate_kronecker_delta

KD(diff, map[outer])

_LoopJacSelectMat(map) — constant selection

same

Indexed(Param, outer, diff)

Para(name, dim=2) — constant 2-D block

loop_jac.py::_translate_indexed_param

Indexed(Param, outer, diff) * Indexed(Var, outer)

Mat_Mul(Diag(Var), Para) — row-scale

_classify_and_translate_term

Indexed(Param, outer, diff) * Indexed(Var, diff)

Mat_Mul(Para, Diag(Var)) — col-scale

same

KD(outer, diff) * Sum(Param[outer, j] * Var[j], j)

Diag(Mat_Mul(Para, iVar)) — DiagTerm

_classify_and_translate_term

KD(diff, map[outer]) * Sum(...) with identity map

same as the direct-KD DiagTerm above

_classify_and_translate_term + _indirect_kd_is_identity

KD(diff, map[outer]) * Π scalar_factors(outer)

Diag(scalar_vec) @ SelectMat(map) (direct: just Diag(scalar_vec))

_try_kd_outer_scalars

Π outer_scalars(outer) * Sum(...) with a Sum-KD inside

Diag(outer_vec) @ m_inner

_try_outer_scalar_times_sum_kd

Sum(f(outer, dummy) * KD(diff, map[dummy]), dummy) identity map

bare Para / Mat_Mul(SelectMat_row, Para) / Mat_Mul(matrix_expr, Diag(scalar_vec))

_try_sum_kd_collapse

same, non-identity injective map, single matrix factor

Mat_Mul(Para, SelectMat_col)

same

Mutable-matrix analyser decompositions

Once a term is classified as Phase J2, it becomes one entry in a MutableMatBlockMapping structure. The analyser recognises four term kinds:

  • diag_termsDiag(inner), scatters inner[i] to output position (i, i).

  • row_scale_termsMat_Mul(Diag(u), Matrix), scatters u[r] · M.data[k] to (r, c) at each matrix nnz.

  • col_scale_termsMat_Mul(Matrix, Diag(v)), symmetric.

  • biscale_termsMat_Mul(Diag(u), Matrix, Diag(v)) in any of the N-arg flat, 2-arg nested, or wrapped--1 forms, scatters u[r] · v[c] · M.data[k].

Matrix in any of the above may be a Para, a _LoopJacSelectMat, or a Mat_Mul chain of either — _sparse_matrix_nnz evaluates the chain to a single COO sparse matrix at analyser time. _extract_sign_and_core strips any numeric coefficient into a stored sign field (allowing -2 * Diag(x) to be handled without falling back).

Not supported (Phase J3 fallback)

The following shapes still hit the Phase J3 LoopEqnDiff kernel (correct but ~3–7× slower than J2 for small nnz). Rewrite the body to avoid them when the inner loop is on the hot path:

  • Multi-arg sympy Function without simplification — e.g. a user-defined sp.Function('f')(x[i], y[i]) whose sp.diff keeps the multi-arg form. Prefer elementary operations (the built-in atan2 is fine because sp.diff(atan2(x, y), x) simplifies to y / (x² + y²)).

  • Bilinear outer–dummy products inside Sum — e.g. Sum(Vm[i] * Vm[j] * cos(Va[i] - Va[j]), j) (polar power flow in the full form). None of the J2 patterns express two free axes under the same Sum in their Diag / Mat_Mul grammar. The diagonal derivative part simplifies cleanly, but the bilinear off-diagonal falls to J3.

  • Non-identity-map Sum-KD with Var-dependent scalar factors — e.g. Sum(Sign(ms[orig[p]]) * KD(k, map[p]) * V[i, p], p) with map a non-identity injection. The identity-map variant is supported (Pattern 4 mutable); the non-identity case needs an inverse-map gather the current scatter-add primitives don’t express.

  • Piecewise / conditional bodiescanonicalize_kronecker unwraps the first non-zero branch heuristically, which produces the wrong result for non-trivial conditions. Avoid sp.Piecewise in LoopEqn bodies.

  • Stencil PDE bodies (SolPde-style multi-arg callables) — inherently J3. The sparsity analyser still reports the per-cell contribution pattern correctly, so the J3 kernel iterates only O(nnz) entries per Newton step.

Body-level limitations

  • Per-pipe 1-D Vars with different lengths (e.g. pre-refactor Tsp_0, Tsp_1, …) cannot be indexed by a LoopEqn dummy: sympy.IndexedBase can select array elements but not symbolic names. The SolMuseum solution is the “flatten” refactor that fuses per-pipe states into a single flat Tsp_all Var with explicit (pipe_offset, segment) addressing.

  • Nested Sum with reset is not supported. A body like Sum(Sum(body, (inner, …)), (outer, …)) where the inner sum resets each outer iteration cannot be expressed as a flat linear-prelude. Split into two bodies or flatten the sum algebra.

Debugging a fall-to-J3

Set SZ_DEBUG_PATTERN4MUT=1 or SZ_DEBUG_ANALYZER=1 in the environment when rendering to print every term the analyser / Pattern 4 translator sees and accepts or rejects. The output points at the exact canonical expression that wasn’t classifiable, so you can decide whether to (a) rewrite the body to avoid the shape, (b) extend the translator with a new pattern, or (c) accept the Phase J3 fallback.

Further reading

  • Source of the translator: Solverz/equation/loop_jac.py.

  • Source of the mutable-matrix analyser: Solverz/code_printer/python/module/mutable_mat_analyzer.py.

  • Tracking issue for J2 extensions: #133.