Appendix — how the LoopEqn Jacobian translator works¶
This page is for readers who want to extend the Jacobian translator
or diagnose why a particular body dropped to the slow path. If
you’re only using LoopEqn, the main guide at Indexed equations with LoopEqn is
enough — come back here once you hit a term shape that the printer
can’t classify.
Glossary¶
Term |
Meaning |
|---|---|
Phase J1 |
Constant Jacobian blocks: |
Phase J2 |
Mutable Jacobian blocks classified into |
Phase J3 |
The fallback: a per-entry Python-loop |
KD |
Shorthand for |
|
Solverz primitive for the |
|
Solverz primitive for a sparse selection matrix whose row |
CSR walker |
The per-row sparse iteration pattern that the F-side printer uses for |
Biscale |
Phase J2 term shape |
Supported body shapes (F side)¶
What the F-side printer (_translate_loop_body_njit) will accept
inside a LoopEqn.body without raising. If the body stays within
this grammar, module_printer(jit=True).render() succeeds.
Body element |
Syntax |
Notes |
|---|---|---|
Direct Var / Param reference |
|
|
Indirect gather |
|
|
Bare scalar Param |
|
scalar Param used as coefficient |
Nested |
|
printer emits |
Sparse 2-D Param walker |
|
CSR row walk emitted automatically |
Arithmetic |
|
element-wise in the loop body |
Unary functions |
|
dispatched to |
Phase J2 translator coverage¶
The canonical derivative of a LoopEqn.body is computed via
sympy.diff, run through canonicalize_kronecker (which pulls KD
factors out of Sums where safe and collapses Sum(f(j) · KD(j, k), j) → f(k) collapses), and then classified term-by-term. Terms that
match one of the shapes below land on the compiled fast path; the
rest fall to Phase J3.
Shape (after canonicalisation) |
Emission |
Where (file) |
|---|---|---|
|
|
|
|
|
same |
|
|
|
|
|
|
|
|
same |
|
|
|
|
same as the direct-KD DiagTerm above |
|
|
|
|
|
|
|
|
bare |
|
same, non-identity injective |
|
same |
Mutable-matrix analyser decompositions¶
Once a term is classified as Phase J2, it becomes one entry in a
MutableMatBlockMapping
structure. The analyser recognises four term kinds:
diag_terms—Diag(inner), scattersinner[i]to output position(i, i).row_scale_terms—Mat_Mul(Diag(u), Matrix), scattersu[r] · M.data[k]to(r, c)at each matrix nnz.col_scale_terms—Mat_Mul(Matrix, Diag(v)), symmetric.biscale_terms—Mat_Mul(Diag(u), Matrix, Diag(v))in any of the N-arg flat, 2-arg nested, or wrapped--1forms, scattersu[r] · v[c] · M.data[k].
Matrix in any of the above may be a Para, a _LoopJacSelectMat,
or a Mat_Mul chain of either — _sparse_matrix_nnz evaluates the
chain to a single COO sparse matrix at analyser time. _extract_sign_and_core
strips any numeric coefficient into a stored sign field (allowing
-2 * Diag(x) to be handled without falling back).
Not supported (Phase J3 fallback)¶
The following shapes still hit the Phase J3 LoopEqnDiff kernel
(correct but ~3–7× slower than J2 for small nnz). Rewrite the
body to avoid them when the inner loop is on the hot path:
Multi-arg sympy
Functionwithout simplification — e.g. a user-definedsp.Function('f')(x[i], y[i])whosesp.diffkeeps the multi-arg form. Prefer elementary operations (the built-inatan2is fine becausesp.diff(atan2(x, y), x)simplifies toy / (x² + y²)).Bilinear outer–dummy products inside
Sum— e.g.Sum(Vm[i] * Vm[j] * cos(Va[i] - Va[j]), j)(polar power flow in the full form). None of the J2 patterns express two free axes under the sameSumin theirDiag/Mat_Mulgrammar. The diagonal derivative part simplifies cleanly, but the bilinear off-diagonal falls to J3.Non-identity-map Sum-KD with Var-dependent scalar factors — e.g.
Sum(Sign(ms[orig[p]]) * KD(k, map[p]) * V[i, p], p)withmapa non-identity injection. The identity-map variant is supported (Pattern 4 mutable); the non-identity case needs an inverse-map gather the current scatter-add primitives don’t express.Piecewise / conditional bodies —
canonicalize_kroneckerunwraps the first non-zero branch heuristically, which produces the wrong result for non-trivial conditions. Avoidsp.Piecewisein LoopEqn bodies.Stencil PDE bodies (
SolPde-style multi-arg callables) — inherently J3. The sparsity analyser still reports the per-cell contribution pattern correctly, so the J3 kernel iterates onlyO(nnz)entries per Newton step.
Body-level limitations¶
Per-pipe 1-D Vars with different lengths (e.g. pre-refactor
Tsp_0,Tsp_1, …) cannot be indexed by aLoopEqndummy:sympy.IndexedBasecan select array elements but not symbolic names. The SolMuseum solution is the “flatten” refactor that fuses per-pipe states into a single flatTsp_allVar with explicit(pipe_offset, segment)addressing.Nested
Sumwith reset is not supported. A body likeSum(Sum(body, (inner, …)), (outer, …))where the inner sum resets each outer iteration cannot be expressed as a flat linear-prelude. Split into two bodies or flatten the sum algebra.
Debugging a fall-to-J3¶
Set SZ_DEBUG_PATTERN4MUT=1 or SZ_DEBUG_ANALYZER=1 in the
environment when rendering to print every term the analyser / Pattern
4 translator sees and accepts or rejects. The output points at the
exact canonical expression that wasn’t classifiable, so you can
decide whether to (a) rewrite the body to avoid the shape,
(b) extend the translator with a new pattern, or (c) accept the
Phase J3 fallback.
Further reading¶
Source of the translator:
Solverz/equation/loop_jac.py.Source of the mutable-matrix analyser:
Solverz/code_printer/python/module/mutable_mat_analyzer.py.Tracking issue for J2 extensions: #133.