Sparse Solvers

After assembly and boundary-condition condensation you have a sparse system — linear or nonlinear — to solve. TensorMesh delegates this work to torch-sla, a standalone differentiable sparse-linear-algebra library. The same FEM code retargets between CPU and GPU, and between iterative and direct solvers, by changing one keyword argument; nonlinear systems are driven by the same backend through a Newton / Picard / Anderson interface.

Install torch-sla

Important

torch-sla is a hard, import-time dependency of tensormesh.sparse. The module will not import without it. It is also the library where all current and future solver work lands — TensorMesh tracks torch-sla releases as the canonical sparse-linear-algebra layer and we recommend keeping it up to date.

The base wheel ships the CPU stack (SciPy / native PyTorch). GPU backends are opt-in extras — you can install one or both, depending on which CUDA solver you want:

pip install torch-sla              # CPU stack only
pip install "torch-sla[cupy]"      # + CuPy backend (iterative + SuperLU)
pip install "torch-sla[cudss]"     # + cuDSS backend (fastest GPU direct)
pip install "torch-sla[all]"       # both GPU backends + dev tooling

TensorMesh also exposes the GPU extras at its own install layer (pip install "tensormesh-fem[cupy]" etc.) — see Installation.

Inspect available backends

After installing, torch_sla.show_backends() prints a status table showing which backends are usable on the current machine and how to install any that are missing:

>>> import torch_sla
>>> torch_sla.show_backends()
torch-sla backend status (CUDA: available)
  scipy    [CPU]      available
  eigen    [CPU]      not available — JIT-compiled C++ extension (requires a C++ compiler)
  pytorch  [CPU/CUDA] available
  cupy     [CUDA]     not available — pip install torch-sla[cupy]
  cudss    [CUDA]     not available — pip install torch-sla[cudss]

What torch-sla gives TensorMesh:

  • SparseTensor — the COO data type that SparseMatrix extends. Element assembly hands you a SparseMatrix directly, so every operator method on the parent class (@, .solve, .nonlinear_solve, .is_symmetric, …) is available without any extra wrapping.

  • A unified solve op with a custom backward pass (an adjoint sparse solve), so gradients flow through every linear system.

  • A pluggable backend layer covering CPU, GPU, iterative and direct solvers behind a single dispatch entry point.

Solving A x = b: the .solve() method

The canonical entry point is SparseMatrix.solve — inherited unchanged from torch_sla.SparseTensor. Assembly already returns a SparseMatrix, so this is the only call you normally need:

K = LaplaceElementAssembler.from_mesh(mesh)()
x = K.solve(b)                              # auto backend + method
x = K.solve(b, backend="cudss")             # force NVIDIA GPU direct
x = K.solve(b, method="bicgstab")           # force a specific iterative method

Symmetry / positive-definiteness is auto-detected. On every call, torch-sla computes is_symmetric() and is_positive_definite() on the matrix, then picks CG / Cholesky for SPD systems and BiCGStab / LU otherwise. You do not pass an is_spd hint; passing one would be ignored.

Key keyword arguments (full list in the torch-sla reference):

  • backend: "auto" (CPU → SciPy, CUDA → cuDSS when available, else CuPy / native PyTorch), or one of "scipy", "eigen", "pytorch", "cupy", "cudss".

  • method: "auto" (chosen from matrix properties + backend), or an iterative method ("cg", "bicgstab", "minres", "gmres", "lgmres") or a direct factorization ("lu", "umfpack", "cholesky", "ldlt"). Not every backend supports every method — see the table below.

  • atol (default 1e-10), tol (default 1e-12), maxiter (default 10000) for iterative convergence.

  • verbose=True prints a one-line summary of the auto-selected backend / method and the detected matrix properties — handy when you want to confirm that "auto" picked the path you expected:

    >>> x = K.solve(b, verbose=True)
    [torch-sla] solve: n=1024, nnz=7168, dtype=float64, device=cpu, symmetric=True, spd=True, backend=scipy, method=lu
    

Supported backends

Backend

String

Device

Direct / Iter

Methods

SciPy

"scipy"

CPU

both

lu, umfpack, cg, bicgstab, gmres, lgmres, minres, qmr. Default on CPU.

Eigen

"eigen"

CPU

iterative

cg, bicgstab. C++ Eigen via pybind11.

Native PyTorch

"pytorch"

CPU / GPU

iterative

cg, bicgstab with Jacobi preconditioning. Pure torch — fully autograd-traced.

CuPy

"cupy"

GPU

both

lu, cg, cgs, gmres, minres, lsqr, lsmr.

cuDSS

"cudss"

GPU

direct

lu, cholesky, ldlt. NVIDIA cuDSS — fastest GPU path; default on CUDA when memory allows.

Batched right-hand sides

When b has shape [n_dof, n_batch], torch-sla automatically amortizes the factorization across the batch — one factor pass, n_batch back-substitutions — instead of running an iterative solver independently per column. This is the workhorse of the tensormesh.dataset ML workflow.

B = torch.randn(K.shape[0], 64)             # 64 right-hand sides
X = K.solve(B)                              # one factorization

See Batched Workflows for the broader story on batching.

Nonlinear systems: .nonlinear_solve()

For problems where the residual F(u, params) = 0 is nonlinear in u — hyperelasticity, plasticity, phase-field — use SparseMatrix.nonlinear_solve (inherited from torch_sla.SparseTensor). The matrix is passed automatically into the residual closure, the Jacobian is obtained through autograd, and the backward pass uses the adjoint method.

from torch_sla import SparseTensor  # SparseMatrix also works

# Nonlinear PDE: A @ u + u**2 = f
def residual(u, A, f):
    return A @ u + u**2 - f

f  = torch.randn(n, requires_grad=True)
u0 = torch.zeros(n)

u = A.nonlinear_solve(residual, u0, f, method='newton')

# Gradients flow via the implicit-function theorem
loss = u.sum()
loss.backward()
print(f.grad)        # dL/df

Key keyword arguments:

  • method: "newton" (default, Newton-Raphson with Armijo line search), "picard" (fixed-point), or "anderson" (Anderson acceleration).

  • line_search (default True) — toggle Armijo backtracking.

  • tol / atol / max_iter for the outer nonlinear loop.

  • linear_solver / linear_method to pick the backend used for each inner linear solve.

The forward pass runs the chosen nonlinear iteration; the backward pass solves a single adjoint system, so gradients of u w.r.t. any requires_grad parameter (including A.values) cost roughly one extra linear solve regardless of how many nonlinear iterations the forward took.

Lower-level entry points (legacy)

Deprecated since version 0.x: tensormesh.sparse.spsolve() and tensormesh.sparse.nonlinear_solve() are TensorMesh-internal fallbacks that pre-date the torch-sla integration. They are scheduled for removal; the canonical solver path is the SparseMatrix (i.e. torch_sla.SparseTensor) methods above.

tensormesh.sparse.spsolve() is a free function that operates on raw COO arrays (edata, row, col, shape, b) instead of a SparseMatrix object. In practice this is rarely useful — assembly produces a SparseMatrix, condensation produces a SparseMatrix, and K.solve(b) covers every FEM workflow in this guide. tensormesh.sparse.spsolve will be retired once the remaining call sites have moved over to SparseMatrix.solve / torch_sla.spsolve.

tensormesh.sparse.nonlinear_solve() is the in-tree Newton-Raphson helper. It requires the user to supply an explicit Jacobian closure and lacks line search, Picard / Anderson modes, and the autograd-based Jacobian assembly that torch_sla.SparseTensor.nonlinear_solve provides. Migrate to A.nonlinear_solve(residual, u0, *params).

What’s next

  • Time Integration — wrap a linear solve in a time-stepping loop for transient problems.

  • Batched Workflows — three different axes of batching, and when each one applies.

  • Differentiability — how the adjoint backward works and how to wire a learnable parameter through a sparse solve.