(loopeqn_translator)=

# Appendix — how the `LoopEqn` Jacobian translator works

This page is for readers who want to extend the Jacobian translator
or diagnose why a particular body dropped to the slow path. If
you're only *using* `LoopEqn`, the main guide at {ref}`loopeqn` is
enough — come back here once you hit a term shape that the printer
can't classify.

## Glossary

| Term | Meaning |
|---|---|
| **Phase J1** | Constant Jacobian blocks: `∂F/∂x` has no free Var symbols, so the block value is baked into `_data_` at module-build time and the runtime `J` just slices it. |
| **Phase J2** | Mutable Jacobian blocks classified into `Diag(var)` / `Mat_Mul(Diag(var), Matrix)` / `Mat_Mul(Matrix, Diag(var))` / `Diag(u) @ Matrix @ Diag(v)` (biscale) terms. Each contributes via a compiled scatter-add loop over the matrix's nnz — no `scipy.sparse` construction at call time. |
| **Phase J3** | The fallback: a per-entry Python-loop `LoopEqnDiff` kernel that evaluates the canonical derivative expression at each nnz position. Correct for any shape; slower than J2 because the loop body isn't vectorised. |
| **KD** | Shorthand for `sympy.KroneckerDelta`. `KD(i, j)` is 1 when `i == j`, zero otherwise — the canonicalised form of the `∂x[i]/∂x[k]` derivative sympy produces. |
| **`_LoopJacEye(n)`** | Solverz primitive for the `n × n` identity matrix. Emitted when the canonical form reduces to `KD(outer, diff)`. |
| **`_LoopJacSelectMat(col_map, n_outer, n_diff)`** | Solverz primitive for a sparse selection matrix whose row `i` has a single 1 at column `col_map[i]`. Emitted for `KD(diff, map[outer])`. |
| **CSR walker** | The per-row sparse iteration pattern that the F-side printer uses for `Sum(M[i, j] * x[j], j)` when `M` is declared `sparse=True`. |
| **Biscale** | Phase J2 term shape `Diag(u) @ Matrix @ Diag(v)` where `Matrix` is a constant sparse matrix. Each nnz `(r, c, d)` scatters `u[r] * v[c] * d` to output position `(r, c)`. |

## Supported body shapes (F side)

What the F-side printer (`_translate_loop_body_njit`) will accept
inside a `LoopEqn.body` without raising. If the body stays within
this grammar, `module_printer(jit=True).render()` succeeds.

| Body element | Syntax | Notes |
|---|---|---|
| Direct Var / Param reference | `m.x[i]`, `m.G[i, j]` | `i` / `j` are bounded `Idx` |
| Indirect gather | `m.x[m.map[i]]`, `m.G[m.ns[i], j]` | `map` / `ns` are 1-D int Params; equivalent to `m.x[i]` when `i` came from a non-identity `Set.idx(...)` |
| Bare scalar Param | `m.Cp`, `m.Cp * something` | scalar Param used as coefficient |
| Nested `Sum` over `Idx` | `Sum(body, j)` with `j = Idx('j', n)` | printer emits `for j in range(n)` |
| Sparse 2-D Param walker | `Sum(m.G[i, j] * m.x[j], j)` with `G` declared `sparse=True` | CSR row walk emitted automatically |
| Arithmetic | `+`, `-`, `*`, `/`, `**` | element-wise in the loop body |
| Unary functions | `sin`, `cos`, `exp`, `log`, `sqrt`, `Abs`, `Sign`, `heaviside`, `atan2` | dispatched to `np.*` / `SolCF.*` |

## Phase J2 translator coverage

The canonical derivative of a `LoopEqn.body` is computed via
`sympy.diff`, run through `canonicalize_kronecker` (which pulls `KD`
factors out of `Sum`s where safe and collapses `Sum(f(j) · KD(j, k), j)
→ f(k)` collapses), and then classified term-by-term. Terms that
match one of the shapes below land on the compiled fast path; the
rest fall to Phase J3.

| Shape (after canonicalisation) | Emission | Where (file) |
|---|---|---|
| `KD(outer, diff)` | `_LoopJacEye(n)` — constant identity | `loop_jac.py::_translate_kronecker_delta` |
| `KD(diff, map[outer])` | `_LoopJacSelectMat(map)` — constant selection | same |
| `Indexed(Param, outer, diff)` | `Para(name, dim=2)` — constant 2-D block | `loop_jac.py::_translate_indexed_param` |
| `Indexed(Param, outer, diff) * Indexed(Var, outer)` | `Mat_Mul(Diag(Var), Para)` — row-scale | `_classify_and_translate_term` |
| `Indexed(Param, outer, diff) * Indexed(Var, diff)` | `Mat_Mul(Para, Diag(Var))` — col-scale | same |
| `KD(outer, diff) * Sum(Param[outer, j] * Var[j], j)` | `Diag(Mat_Mul(Para, iVar))` — DiagTerm | `_classify_and_translate_term` |
| `KD(diff, map[outer]) * Sum(...)` with **identity** `map` | same as the direct-KD DiagTerm above | `_classify_and_translate_term` + `_indirect_kd_is_identity` |
| `KD(diff, map[outer]) * Π scalar_factors(outer)` | `Diag(scalar_vec) @ SelectMat(map)` (direct: just `Diag(scalar_vec)`) | `_try_kd_outer_scalars` |
| `Π outer_scalars(outer) * Sum(...)` with a Sum-KD inside | `Diag(outer_vec) @ m_inner` | `_try_outer_scalar_times_sum_kd` |
| `Sum(f(outer, dummy) * KD(diff, map[dummy]), dummy)` identity `map` | bare `Para` / `Mat_Mul(SelectMat_row, Para)` / `Mat_Mul(matrix_expr, Diag(scalar_vec))` | `_try_sum_kd_collapse` |
| same, non-identity injective `map`, single matrix factor | `Mat_Mul(Para, SelectMat_col)` | same |

## Mutable-matrix analyser decompositions

Once a term is classified as Phase J2, it becomes one entry in a
[`MutableMatBlockMapping`](https://github.com/smallbunnies/Solverz/blob/main/Solverz/code_printer/python/module/mutable_mat_analyzer.py)
structure. The analyser recognises four term kinds:

- `diag_terms` — `Diag(inner)`, scatters `inner[i]` to output
  position `(i, i)`.
- `row_scale_terms` — `Mat_Mul(Diag(u), Matrix)`, scatters
  `u[r] · M.data[k]` to `(r, c)` at each matrix nnz.
- `col_scale_terms` — `Mat_Mul(Matrix, Diag(v))`, symmetric.
- `biscale_terms` — `Mat_Mul(Diag(u), Matrix, Diag(v))` in any of
  the N-arg flat, 2-arg nested, or wrapped-`-1` forms, scatters
  `u[r] · v[c] · M.data[k]`.

`Matrix` in any of the above may be a `Para`, a `_LoopJacSelectMat`,
or a `Mat_Mul` chain of either — `_sparse_matrix_nnz` evaluates the
chain to a single COO sparse matrix at analyser time. `_extract_sign_and_core`
strips any numeric coefficient into a stored `sign` field (allowing
`-2 * Diag(x)` to be handled without falling back).

## Not supported (Phase J3 fallback)

The following shapes still hit the Phase J3 `LoopEqnDiff` kernel
(correct but ~3–7× slower than J2 for small `nnz`). Rewrite the
body to avoid them when the inner loop is on the hot path:

- **Multi-arg sympy `Function` without simplification** — e.g. a
  user-defined `sp.Function('f')(x[i], y[i])` whose `sp.diff` keeps
  the multi-arg form. Prefer elementary operations (the built-in
  `atan2` is fine because `sp.diff(atan2(x, y), x)` simplifies to
  `y / (x² + y²)`).
- **Bilinear outer–dummy products inside `Sum`** — e.g.
  `Sum(Vm[i] * Vm[j] * cos(Va[i] - Va[j]), j)` (polar power flow
  in the full form). None of the J2 patterns express two free
  axes under the same `Sum` in their `Diag` / `Mat_Mul` grammar.
  The diagonal derivative part simplifies cleanly, but the
  bilinear off-diagonal falls to J3.
- **Non-identity-map Sum-KD with Var-dependent scalar factors** —
  e.g. `Sum(Sign(ms[orig[p]]) * KD(k, map[p]) * V[i, p], p)` with
  `map` a non-identity injection. The identity-map variant is
  supported (Pattern 4 mutable); the non-identity case needs an
  inverse-map gather the current scatter-add primitives don't
  express.
- **Piecewise / conditional bodies** — `canonicalize_kronecker`
  unwraps the first non-zero branch heuristically, which produces
  the wrong result for non-trivial conditions. Avoid `sp.Piecewise`
  in LoopEqn bodies.
- **Stencil PDE bodies** (`SolPde`-style multi-arg callables) —
  inherently J3. The sparsity analyser still reports the per-cell
  contribution pattern correctly, so the J3 kernel iterates only
  `O(nnz)` entries per Newton step.

## Body-level limitations

- **Per-pipe 1-D Vars with different lengths** (e.g. pre-refactor
  `Tsp_0`, `Tsp_1`, …) cannot be indexed by a `LoopEqn` dummy:
  `sympy.IndexedBase` can select array *elements* but not symbolic
  *names*. The SolMuseum solution is the "flatten" refactor that
  fuses per-pipe states into a single flat `Tsp_all` Var with
  explicit `(pipe_offset, segment)` addressing.
- **Nested `Sum` with reset** is not supported. A body like
  `Sum(Sum(body, (inner, …)), (outer, …))` where the inner sum
  resets each outer iteration cannot be expressed as a flat
  linear-prelude. Split into two bodies or flatten the sum algebra.

## Debugging a fall-to-J3

Set `SZ_DEBUG_PATTERN4MUT=1` or `SZ_DEBUG_ANALYZER=1` in the
environment when rendering to print every term the analyser / Pattern
4 translator sees and accepts or rejects. The output points at the
exact canonical expression that wasn't classifiable, so you can
decide whether to (a) rewrite the body to avoid the shape,
(b) extend the translator with a new pattern, or (c) accept the
Phase J3 fallback.

## Further reading

- Source of the translator: `Solverz/equation/loop_jac.py`.
- Source of the mutable-matrix analyser: `Solverz/code_printer/python/module/mutable_mat_analyzer.py`.
- Tracking issue for J2 extensions: [#133](https://github.com/smallbunnies/Solverz/issues/133).