Verify your installation

A self-contained smoke test that confirms TensorMesh, torch-sla, and your PyTorch build are wired together correctly. The script:

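prints the core versions (TensorMesh, PyTorch, torch-sla, CUDA),
solves a tiny Poisson problem on the CPU (always),
repeats the solve on the GPU if CUDA is available, and
reports which torch-sla sparse-solver backends are available on your machine.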

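The checker ships with the package, so after installing you only need to run python -m tensormesh.verify_install; there is no file to save. The whole run takes a few seconds. The full source is listed below for reference.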
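If you want to run the checker from a setup script or a CI job rather than by hand, a minimal sketch looks like the following. The helper names run_module and install_is_healthy are hypothetical and not part of TensorMesh. The sketch relies on one assumption: the packaged checker behaves like the listing on this page, which has no error handling, so a failed import or solver exception ends with a traceback and a nonzero exit status. An unexpectedly large L2 error does not change the exit status, so it still has to be read from the output.

```python
import subprocess
import sys


def run_module(module, timeout=120):
    """Run `python -m <module>` with the current interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-m", module],  # sys.executable keeps us in the active environment
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def install_is_healthy(module="tensormesh.verify_install"):
    """Return True when the checker process exits with status 0."""
    result = run_module(module)
    if result.returncode != 0:
        # Show whatever the checker printed before it failed.
        print(result.stdout)
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0
```

Using sys.executable instead of a bare python command avoids accidentally testing a different environment from the one the calling script runs in.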

Script

"""Verify a TensorMesh install: print the core versions, solve a tiny
Poisson problem on CPU (and on GPU if available), and report the
torch-sla sparse-solver backends available on this machine."""

import math
import time

import torch
import torch_sla
import tensormesh


def solve_poisson(device):
    from tensormesh import ElementAssembler, NodeAssembler, Mesh, Condenser

    mesh = Mesh.gen_rectangle(chara_length=0.1).to(device)

    class Laplace(ElementAssembler):
        def forward(self, gradu, gradv):
            return gradu @ gradv

    class Source(NodeAssembler):
        def forward(self, v, f):
            return f * v

    x, y = mesh.points[:, 0], mesh.points[:, 1]
    f_vals = 2 * math.pi**2 * torch.sin(math.pi * x) * torch.sin(math.pi * y)

    K = Laplace.from_mesh(mesh)()
    b = Source.from_mesh(mesh)(point_data={"f": f_vals})

    cond = Condenser(mesh.boundary_mask)
    K_, b_ = cond(K, b)
    u = cond.recover(K_.solve(b_))

    u_exact = torch.sin(math.pi * x) * torch.sin(math.pi * y)
    return float((u - u_exact).norm() / u_exact.norm())


def main():
    print("TensorMesh smoke test")
    print("=" * 40)
    print(f"tensormesh : {tensormesh.__version__}")
    print(f"torch      : {torch.__version__}")
    print(f"torch-sla  : {torch_sla.__version__}")
    print(f"cuda       : {torch.version.cuda or 'not available'}")
    print()

    t0 = time.perf_counter()
    err = solve_poisson("cpu")
    print(f"[CPU ] Poisson 2D ... OK   L2 error = {err:.3e}   {time.perf_counter() - t0:.2f} s")

    if torch.cuda.is_available():
        t0 = time.perf_counter()
        err = solve_poisson("cuda")
        print(f"[CUDA] Poisson 2D ... OK   L2 error = {err:.3e}   {time.perf_counter() - t0:.2f} s")
    else:
        print("[CUDA] not available, skipping GPU test")

    print()
    # Every sparse-solver backend is provided by torch-sla; let it report them.
    torch_sla.show_backends()

    print()
    print("All required checks passed.")


if __name__ == "__main__":
    main()

Expected output

On a CUDA machine (a Linux machine with an NVIDIA GPU and a CUDA-enabled PyTorch build), you should see output similar to the following:

TensorMesh smoke test
========================================
tensormesh : 0.1.0
torch      : 2.10.0+cu128
torch-sla  : 0.2.1
cuda       : 12.8

[CPU ] Poisson 2D ... OK   L2 error = 1.185e-02   0.05 s
[CUDA] Poisson 2D ... OK   L2 error = 1.185e-02   0.88 s

torch-sla backend status (CUDA: available)
  scipy    [CPU]       available
  eigen    [CPU]       not available — JIT-compiled C++ extension (requires a C++ compiler)
  pytorch  [CPU/CUDA]  available
  cupy     [CUDA]      available
  cudss    [CUDA]      available

All required checks passed.

On a CPU-only machine (for example a macOS or Linux laptop), the cuda line reads not available, the [CUDA] solve is skipped, and the backend table header reads (CUDA: not available) and lists cupy / cudss as optional extras you can install.

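The reported L2 error is a relative error: the norm of the difference between the computed and exact nodal values, divided by the norm of the exact values. Its exact value depends on the mesh, but it should be on the order of \(10^{-2}\); an error above a few percent points to a numerical problem.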
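To get a feel for why an error near \(10^{-2}\) is plausible at a mesh size of 0.1, the sketch below solves the same manufactured problem (\(-\Delta u = 2\pi^2 \sin(\pi x)\sin(\pi y)\) on the unit square with zero boundary values) using plain PyTorch. It is not TensorMesh's method: it swaps in a 5-point finite-difference stencil on a uniform grid, assumes the domain is the unit square, and uses a dense solve, so the numbers will differ from the finite-element result. Only the error metric is the same as in the script.

```python
import math

import torch


def relative_l2_error(u, u_exact):
    """Same metric as the smoke test: ||u - u_exact|| / ||u_exact||."""
    return float((u - u_exact).norm() / u_exact.norm())


def fd_poisson_error(n):
    """Solve -Laplace(u) = f on the unit square with n x n interior grid points."""
    h = 1.0 / (n + 1)  # grid spacing; n = 9 gives h = 0.1
    grid = torch.linspace(h, 1.0 - h, n, dtype=torch.float64)
    x, y = torch.meshgrid(grid, grid, indexing="ij")
    x, y = x.reshape(-1), y.reshape(-1)

    # 1D second-difference matrix tridiag(-1, 2, -1) / h^2.
    eye = torch.eye(n, dtype=torch.float64)
    off = torch.ones(n - 1, dtype=torch.float64)
    T = (2.0 * eye - torch.diag(off, 1) - torch.diag(off, -1)) / h**2

    # 2D Laplacian as a Kronecker sum (dense, which is fine at this size).
    K = torch.kron(T, eye) + torch.kron(eye, T)

    f = 2 * math.pi**2 * torch.sin(math.pi * x) * torch.sin(math.pi * y)
    u = torch.linalg.solve(K, f)

    u_exact = torch.sin(math.pi * x) * torch.sin(math.pi * y)
    return relative_l2_error(u, u_exact)


if __name__ == "__main__":
    for n in (9, 19, 39):
        print(f"h = {1.0 / (n + 1):.3f}   relative L2 error = {fd_poisson_error(n):.3e}")
```

Halving the grid spacing should shrink the error by roughly a factor of four, which is the second-order behaviour expected of this stencil. A result that does not shrink under refinement is the kind of symptom the threshold above is meant to catch.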

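Note

Why is the GPU run sometimes *slower* than the CPU run? The smoke-test mesh is deliberately tiny (roughly 100 degrees of freedom). At that size the CUDA solve time is dominated by one-time overheads (context creation, JIT kernel compilation, host↔device transfers, and cuSPARSE/cuSOLVER workspace allocation) rather than by the actual floating-point work. The first GPU call also pays for CUDA driver initialization. Run a realistically sized problem (\(10^4\)–\(10^5\) degrees of freedom or more) to see the GPU pull ahead; see Performance for benchmarks.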
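The one-time overhead can be separated from steady-state cost by timing the first call apart from later calls. The sketch below is a generic PyTorch illustration, not part of the checker: it uses a small dense torch.linalg.solve as a stand-in workload (an assumption; the smoke test solves a sparse system), and the helper names are hypothetical. On a CUDA device the first measurement includes context creation and the host-to-device copy, and torch.cuda.synchronize() makes sure the timer stops only after the kernels have finished.

```python
import time

import torch


def timed_solve(A_cpu, b_cpu, device):
    """Copy the system to `device`, solve it, and return the elapsed seconds."""
    t0 = time.perf_counter()
    A, b = A_cpu.to(device), b_cpu.to(device)
    x = torch.linalg.solve(A, b)
    if x.is_cuda:
        # CUDA kernels run asynchronously; wait for them before reading the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - t0


def first_vs_steady(device, n=200, repeats=3):
    """Return (first_call_seconds, best_later_call_seconds) on `device`."""
    gen = torch.Generator().manual_seed(0)
    # Adding n * I keeps the random matrix comfortably non-singular.
    A_cpu = torch.randn(n, n, generator=gen) + n * torch.eye(n)
    b_cpu = torch.randn(n, generator=gen)

    first = timed_solve(A_cpu, b_cpu, device)
    steady = min(timed_solve(A_cpu, b_cpu, device) for _ in range(repeats))
    return first, steady


if __name__ == "__main__":
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for dev in devices:
        first, steady = first_vs_steady(dev)
        print(f"{dev:4s}  first call = {first:.4f} s   later calls = {steady:.4f} s")
```

If the gap between the first and later calls is large on the GPU, the timing printed by the smoke test mostly reflects start-up cost and says little about solver throughput.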

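Note

Which solver does ``A.solve(b)`` actually use by default? SparseMatrix.solve is inherited from torch_sla.SparseTensor and is called with backend="auto" and method="auto". torch-sla's auto-selector then chooses based on the device and the problem size: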

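CPU → SciPy / SuperLU (backend="scipy", method="lu"): a direct factorization that is fast and accurate to machine precision at the sizes a CPU can handle.
CUDA, fewer than 2M degrees of freedom → cuDSS if it is available (method="cholesky" for symmetric positive definite systems, otherwise ldlt / lu); otherwise it falls back to CuPy, and finally to the PyTorch-native iterative CG.
CUDA, 2M degrees of freedom or more → the PyTorch-native iterative solver with Jacobi preconditioning (backend="pytorch", method="cg" for symmetric positive definite systems, otherwise bicgstab), to stay within GPU memory.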

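The smoke test above therefore runs SuperLU on the CPU and, when cuDSS is available, cuDSS Cholesky on the GPU (the Poisson problem is symmetric positive definite), not an iterative solver. To choose a different solver explicitly, pass backend=... / method=... to solve; see Sparse solvers.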
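The selection rules in this note can be restated as a small decision function. This is only a paraphrase of the rules as described here, not torch-sla's actual code: pick_solver and all of its parameters are hypothetical names, the split between ldlt (symmetric but not positive definite) and lu (general) is an assumption, and None marks a choice the description leaves unspecified.

```python
LARGE_DOF = 2_000_000  # the "2M" threshold quoted in the note


def pick_solver(device, ndof, spd=True, symmetric=True,
                have_cudss=False, have_cupy=False):
    """Return a (backend, method) pair following the rules described above."""
    if device == "cpu":
        return ("scipy", "lu")
    if device != "cuda":
        raise ValueError(f"unsupported device: {device!r}")

    if ndof >= LARGE_DOF:
        # Large GPU problems: iterative solver to stay within GPU memory.
        return ("pytorch", "cg" if spd else "bicgstab")

    if have_cudss:
        if spd:
            return ("cudss", "cholesky")
        # Assumption: ldlt for symmetric indefinite systems, lu otherwise.
        return ("cudss", "ldlt" if symmetric else "lu")
    if have_cupy:
        return ("cupy", None)  # the note does not name a method for CuPy
    # Last resort named in the note is CG; the non-SPD case is not specified.
    return ("pytorch", "cg" if spd else None)


if __name__ == "__main__":
    print(pick_solver("cpu", 121))                    # ('scipy', 'lu')
    print(pick_solver("cuda", 121, have_cudss=True))  # ('cudss', 'cholesky')
```

The returned pair corresponds to the backend=... and method=... arguments mentioned above, so the function doubles as a quick reference for what to pass when overriding the default.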

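Every sparse-solver backend is provided by torch-sla; TensorMesh adds none of its own. A backend reported as not available is not an error. These are optional torch-sla extras that you can install when you need them: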

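scipy [CPU]: SciPy / SuperLU direct and iterative solvers; the default CPU path, always available.
pytorch [CPU/CUDA]: torch-native iterative solvers (CG / BiCGSTAB), always available and fully autograd-compatible.
eigen [CPU]: a JIT-compiled C++ direct solver; requires a C++ compiler on the machine.
cupy [CUDA] / cudss [CUDA]: GPU sparse direct solvers (pip install torch-sla[cupy] / pip install torch-sla[cudss]).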

Run torch_sla.show_backends() at any time to re-check the status. For guidance on choosing among the backends, see Sparse solvers.

Troubleshooting

``ModuleNotFoundError: No module named 'tensormesh'``: the package is not installed in the currently active Python environment. Re-check Installation.

``ModuleNotFoundError: No module named 'torch_sla'``: install it with pip install "torch-sla>=0.2.1". torch-sla is a hard, import-time dependency: without it tensormesh.sparse refuses to import, and every sparse-solver backend is provided through it.

Hangs at ``[CPU ] Poisson 2D``: your PyTorch build may be downloading the gmsh cache file on first use; subsequent runs finish almost instantly.

``L2 error`` ≫ ``1e-2``: most likely a problem with the mesh or with your PyTorch build. Please open an issue at github.com/camlab-ethz/TensorMesh/issues and attach the full output of this script.
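For the two ModuleNotFoundError cases, it often helps to confirm which interpreter is actually running and whether it can see the packages at all. The sketch below uses only the standard library. The helper names are hypothetical, and the distribution name passed to installed_version is an assumption except for torch-sla, which is taken from the pip command above.

```python
import importlib.util
import sys
from importlib import metadata


def describe_environment(modules=("torch", "torch_sla", "tensormesh")):
    """Report the active interpreter and whether each module can be found."""
    report = {"python": sys.executable}
    for name in modules:
        # find_spec returns None for a missing top-level module, without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:12s}: {value}")
    print(f"torch-sla   : {installed_version('torch-sla')}")
```

If the python entry points at an interpreter other than the one you installed into, activating the right environment is usually the fix, rather than reinstalling.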