Sparse Solvers¶
After assembly and boundary-condition condensation you have a sparse system — linear or nonlinear — to solve. TensorMesh delegates this work to torch-sla, a standalone differentiable sparse-linear-algebra library. The same FEM code retargets between CPU and GPU, and between iterative and direct solvers, by changing one keyword argument; nonlinear systems are driven by the same backend through a Newton / Picard / Anderson interface.
Install torch-sla¶
Important
torch-sla is a hard, import-time dependency of
tensormesh.sparse. The module will not import without it.
It is also the library where all current and future solver work
lands — TensorMesh tracks torch-sla releases as the canonical
sparse-linear-algebra layer and we recommend keeping it up to date.
The base wheel ships the CPU stack (SciPy / native PyTorch). GPU backends are opt-in extras — you can install one or both, depending on which CUDA solver you want:
pip install torch-sla # CPU stack only
pip install "torch-sla[cupy]" # + CuPy backend (iterative + SuperLU)
pip install "torch-sla[cudss]" # + cuDSS backend (fastest GPU direct)
pip install "torch-sla[all]" # both GPU backends + dev tooling
TensorMesh also exposes the GPU extras at its own install layer
(pip install "tensormesh-fem[cupy]" etc.) — see
Installation.
Inspect available backends¶
After installing, torch_sla.show_backends() prints a status table
showing which backends are usable on the current machine and how to
install any that are missing:
>>> import torch_sla
>>> torch_sla.show_backends()
torch-sla backend status (CUDA: available)
scipy [CPU] available
eigen [CPU] not available — JIT-compiled C++ extension (requires a C++ compiler)
pytorch [CPU/CUDA] available
cupy [CUDA] not available — pip install torch-sla[cupy]
cudss [CUDA] not available — pip install torch-sla[cudss]
What torch-sla gives TensorMesh:
SparseTensor— the COO data type thatSparseMatrixextends. Element assembly hands you aSparseMatrixdirectly, so every operator method on the parent class (@,.solve,.nonlinear_solve,.is_symmetric, …) is available without any extra wrapping.A unified
solveop with a custom backward pass (an adjoint sparse solve), so gradients flow through every linear system.A pluggable backend layer covering CPU, GPU, iterative and direct solvers behind a single dispatch entry point.
Solving A x = b: the .solve() method¶
The canonical entry point is SparseMatrix.solve — inherited
unchanged from torch_sla.SparseTensor. Assembly already returns a
SparseMatrix, so this is the only call you normally need:
K = LaplaceElementAssembler.from_mesh(mesh)()
x = K.solve(b) # auto backend + method
x = K.solve(b, backend="cudss") # force NVIDIA GPU direct
x = K.solve(b, method="bicgstab") # force a specific iterative method
Symmetry / positive-definiteness is auto-detected. On every call,
torch-sla computes is_symmetric() and
is_positive_definite() on the matrix, then picks CG / Cholesky
for SPD systems and BiCGStab / LU otherwise. You do not pass an
is_spd hint; passing one would be ignored.
Key keyword arguments (full list in the torch-sla reference):
backend:"auto"(CPU → SciPy, CUDA → cuDSS when available, else CuPy / native PyTorch), or one of"scipy","eigen","pytorch","cupy","cudss".method:"auto"(chosen from matrix properties + backend), or an iterative method ("cg","bicgstab","minres","gmres","lgmres") or a direct factorization ("lu","umfpack","cholesky","ldlt"). Not every backend supports every method — see the table below.atol(default1e-10),tol(default1e-12),maxiter(default10000) for iterative convergence.verbose=Trueprints a one-line summary of the auto-selected backend / method and the detected matrix properties — handy when you want to confirm that"auto"picked the path you expected:>>> x = K.solve(b, verbose=True) [torch-sla] solve: n=1024, nnz=7168, dtype=float64, device=cpu, symmetric=True, spd=True, backend=scipy, method=lu
Supported backends¶
Backend |
String |
Device |
Direct / Iter |
Methods |
|---|---|---|---|---|
SciPy |
|
CPU |
both |
|
Eigen |
|
CPU |
iterative |
|
Native PyTorch |
|
CPU / GPU |
iterative |
|
CuPy |
|
GPU |
both |
|
cuDSS |
|
GPU |
direct |
|
Batched right-hand sides¶
When b has shape [n_dof, n_batch], torch-sla automatically
amortizes the factorization across the batch — one factor pass,
n_batch back-substitutions — instead of running an iterative
solver independently per column. This is the workhorse of the
tensormesh.dataset ML workflow.
B = torch.randn(K.shape[0], 64) # 64 right-hand sides
X = K.solve(B) # one factorization
See Batched Workflows for the broader story on batching.
Nonlinear systems: .nonlinear_solve()¶
For problems where the residual F(u, params) = 0 is nonlinear in
u — hyperelasticity, plasticity, phase-field — use
SparseMatrix.nonlinear_solve (inherited from
torch_sla.SparseTensor). The matrix is passed automatically into
the residual closure, the Jacobian is obtained through autograd, and
the backward pass uses the adjoint method.
from torch_sla import SparseTensor # SparseMatrix also works
# Nonlinear PDE: A @ u + u**2 = f
def residual(u, A, f):
return A @ u + u**2 - f
f = torch.randn(n, requires_grad=True)
u0 = torch.zeros(n)
u = A.nonlinear_solve(residual, u0, f, method='newton')
# Gradients flow via the implicit-function theorem
loss = u.sum()
loss.backward()
print(f.grad) # dL/df
Key keyword arguments:
method:"newton"(default, Newton-Raphson with Armijo line search),"picard"(fixed-point), or"anderson"(Anderson acceleration).line_search(defaultTrue) — toggle Armijo backtracking.tol/atol/max_iterfor the outer nonlinear loop.linear_solver/linear_methodto pick the backend used for each inner linear solve.
The forward pass runs the chosen nonlinear iteration; the backward
pass solves a single adjoint system, so gradients of u w.r.t. any
requires_grad parameter (including A.values) cost roughly one
extra linear solve regardless of how many nonlinear iterations the
forward took.
Lower-level entry points (legacy)¶
Deprecated since version 0.x: tensormesh.sparse.spsolve() and
tensormesh.sparse.nonlinear_solve() are TensorMesh-internal
fallbacks that pre-date the torch-sla integration. They are
scheduled for removal; the canonical solver path is the
SparseMatrix (i.e. torch_sla.SparseTensor) methods
above.
tensormesh.sparse.spsolve() is a free function that operates on
raw COO arrays (edata, row, col, shape, b) instead of a
SparseMatrix object. In practice this is rarely useful —
assembly produces a SparseMatrix, condensation produces a
SparseMatrix, and K.solve(b) covers every FEM workflow
in this guide. tensormesh.sparse.spsolve will be retired once the
remaining call sites have moved over to SparseMatrix.solve /
torch_sla.spsolve.
tensormesh.sparse.nonlinear_solve() is the in-tree Newton-Raphson
helper. It requires the user to supply an explicit Jacobian closure
and lacks line search, Picard / Anderson modes, and the autograd-based
Jacobian assembly that torch_sla.SparseTensor.nonlinear_solve
provides. Migrate to A.nonlinear_solve(residual, u0, *params).
What’s next¶
Time Integration — wrap a linear solve in a time-stepping loop for transient problems.
Batched Workflows — three different axes of batching, and when each one applies.
Differentiability — how the adjoint backward works and how to wire a learnable parameter through a sparse solve.