Benchmark ========= This page compares TensorMesh against several established FEM frameworks on two **forward** problems (3D Poisson and 3D linear elasticity) and one **inverse** problem (compliance-minimization topology optimization). All benchmark scripts and reference data live in a separate repository: `camlab-ethz/tensormesh-bench `_. Test environment ---------------- .. list-table:: :widths: 25 75 * - **CPU** - 8 cores from a dual-socket AMD EPYC 9005-series (Zen 5 "Turin") node * - **GPU** - 1× NVIDIA RTX PRO 6000 Blackwell Server Edition * - **Memory** - 64 GB DDR5 ECC (8 cores × 8 GB) * - **OS** - Ubuntu 22.04 LTS (x86_64) Compared frameworks ------------------- .. list-table:: :header-rows: 1 :widths: 24 16 12 12 36 * - Framework - Backend - CPU - CUDA - Notes * - `FEniCS (DOLFINx) `_ - C++/Python - ✅ - ❌ - 8-rank MPI * - `Firedrake `_ - Python/PETSc - ✅ - ❌ - 8-rank MPI * - `MFEM `_ - C++ - ✅ - ❌ - 8-rank MPI * - `scikit-fem `_ - Python - ✅ - ❌ - Lightweight, single process * - `JAX-FEM `_ - JAX - ✅ - ✅ - Differentiable * - `torch-fem `_ - PyTorch - ✅ - ✅ - Differentiable * - **TensorMesh** - PyTorch - ✅ - ✅ - Differentiable Solver configuration -------------------- To keep the comparison controlled, every framework uses the same iterative linear solver and stopping criteria: .. list-table:: :header-rows: 1 :widths: 35 65 * - Parameter - Value * - Iterative method - BiCGSTAB * - Preconditioner - Jacobi (diagonal scaling) * - Relative tolerance - :math:`10^{-10}` * - Absolute tolerance - :math:`10^{-10}` * - Maximum iterations - 10,000 .. note:: The only exception is the CUDA path of **torch-fem**, which uses conjugate gradient (CG) instead of BiCGSTAB — its CUDA backend does not ship a BiCGSTAB implementation. Forward problems ---------------- 3D Poisson equation ~~~~~~~~~~~~~~~~~~~ On the unit cube :math:`\Omega = [0,1]^3` we solve .. math:: \begin{cases} -\Delta u(\mathbf{x}) = f(\mathbf{x}) & \text{in } \Omega, \\ u(\mathbf{x}) = 0 & \text{on } \partial\Omega, \end{cases} with constant source :math:`f = 1` and homogeneous Dirichlet boundary conditions on every face. **Solve time and residual convergence.** Wall-clock time is plotted against problem size; residual histories show that every framework reaches the same target tolerance. .. list-table:: :widths: 50 50 * - **Solve time** - **Residual convergence** * - .. image:: ../_static/benchmark/all_frameworks_time_combined_poisson_3d.png :width: 100% :alt: 3D Poisson solve time across frameworks - .. image:: ../_static/benchmark/all_frameworks_residual_combined_poisson_3d.png :width: 100% :alt: 3D Poisson residual convergence across frameworks **Solution visualizations.** Spot-check that each framework produces the same field — useful as a sanity check before reading speed numbers. .. list-table:: :widths: 25 25 25 25 * - **FEniCS (DOLFINx)** - **JAX-FEM** - **scikit-fem** - **TensorMesh** * - .. image:: ../_static/benchmark/vis/fenics_poisson_3d_solution.png :width: 100% :alt: FEniCS Poisson 3D solution - .. image:: ../_static/benchmark/vis/jaxfem_poisson_3d_solution.png :width: 100% :alt: JAX-FEM Poisson 3D solution - .. image:: ../_static/benchmark/vis/skfem_poisson_3d_solution.png :width: 100% :alt: scikit-fem Poisson 3D solution - .. image:: ../_static/benchmark/vis/tensormesh_poisson_3d_solution.png :width: 100% :alt: TensorMesh Poisson 3D solution 3D linear elasticity ~~~~~~~~~~~~~~~~~~~~ We solve the static equilibrium of an isotropic linear elastic body: .. math:: \begin{cases} -\nabla \cdot \boldsymbol{\sigma}(\mathbf{x}) = \mathbf{f}(\mathbf{x}) & \text{in } \Omega, \\ \mathbf{u}(\mathbf{x}) = \mathbf{0} & \text{on } \partial\Omega, \end{cases} where :math:`\mathbf{u} : \Omega \to \mathbb{R}^3` is the displacement and the Cauchy stress satisfies Hooke's law .. math:: \boldsymbol{\sigma} = \lambda \, \mathrm{tr}(\boldsymbol{\varepsilon}) \, \mathbf{I} + 2\mu \, \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} = \tfrac{1}{2}\bigl(\nabla \mathbf{u} + (\nabla \mathbf{u})^\top\bigr). We set Young's modulus :math:`E = 1` and Poisson's ratio :math:`\nu = 0.3`, giving .. math:: \lambda = \frac{E\nu}{(1+\nu)(1-2\nu)}, \qquad \mu = \frac{E}{2(1+\nu)}. A constant body force :math:`\mathbf{f} = (1,1,1)^\top` is applied. To introduce geometric complexity we use a hollow cube domain .. math:: \Omega = [0,1]^3 \setminus (0.25,\, 0.75)^3. **Solve time and residual convergence.** .. list-table:: :widths: 50 50 * - **Solve time** - **Residual convergence** * - .. image:: ../_static/benchmark/all_frameworks_time_combined_elasticity_3d.png :width: 100% :alt: 3D elasticity solve time across frameworks - .. image:: ../_static/benchmark/all_frameworks_residual_combined_elasticity_3d.png :width: 100% :alt: 3D elasticity residual convergence across frameworks **Solution visualizations.** .. list-table:: :widths: 25 25 25 25 * - **FEniCS (DOLFINx)** - **JAX-FEM** - **scikit-fem** - **TensorMesh** * - .. image:: ../_static/benchmark/vis/fenics_elasticity_3d_solution.png :width: 100% :alt: FEniCS elasticity 3D solution - .. image:: ../_static/benchmark/vis/jaxfem_elasticity_3d_solution.png :width: 100% :alt: JAX-FEM elasticity 3D solution - .. image:: ../_static/benchmark/vis/skfem_elasticity_3d_solution.png :width: 100% :alt: scikit-fem elasticity 3D solution - .. image:: ../_static/benchmark/vis/tensormesh_elasticity_3d_solution.png :width: 100% :alt: TensorMesh elasticity 3D solution Inverse problem: topology optimization -------------------------------------- The inverse-problem benchmark exercises the **differentiability** of TensorMesh on a classical topology-optimization task: compliance minimization of a 2D cantilever beam using the Solid Isotropic Material with Penalization (SIMP) method. We compare against JAX-FEM, the only other framework in the list above that supports end-to-end automatic differentiation through the FEM pipeline. Problem setup ~~~~~~~~~~~~~ The computational domain is a rectangle :math:`\Omega = [0, L_x] \times [0, L_y]` with :math:`L_x = 60` and :math:`L_y = 30` (dimensionless), discretized into :math:`60 \times 30` bilinear quadrilateral (QUAD4) elements — 1,891 nodes and 1,800 elements. Homogeneous Dirichlet conditions are imposed on the left edge, .. math:: \mathbf{u} = \mathbf{0} \quad \text{on } \Gamma_D = \{(x,y) : x = 0\}, and a distributed traction is applied to the lower portion of the right boundary, .. math:: \mathbf{t} = (0,\, -100)^\top \;\text{N/m} \quad \text{on } \Gamma_N = \{(x,y) : x = L_x,\; 0 \leq y \leq 0.1\, L_y\}. .. image:: ../_static/inverse/boundary_conditions.png :width: 60% :align: center :alt: Cantilever boundary conditions The element stiffness is parameterized by the SIMP interpolation .. math:: E(\rho) = E_{\min} + \rho^p \,(E_{\max} - E_{\min}), where :math:`\rho \in [\rho_{\min}, 1]` is the element-wise density (the design variable), :math:`p = 3` penalizes intermediate densities, :math:`E_{\max} = 70{,}000` MPa is the solid stiffness, and :math:`E_{\min} = 70` MPa keeps the stiffness matrix non-singular. The Poisson's ratio is :math:`\nu = 0.3`. The optimization minimizes structural compliance subject to a volume constraint: .. math:: \begin{aligned} \min_{\boldsymbol{\rho}} \quad & C(\boldsymbol{\rho}) = \mathbf{u}^\top \mathbf{K}(\boldsymbol{\rho}) \mathbf{u} = \mathbf{F}^\top \mathbf{u} \\ \text{s.t.} \quad & \frac{1}{|\Omega|} \int_{\Omega} \rho \, d\Omega \leq \bar{v} \\ & \mathbf{K}(\boldsymbol{\rho}) \mathbf{u} = \mathbf{F} \\ & \rho_{\min} \leq \rho_e \leq 1, \quad \forall e \end{aligned} with target volume fraction :math:`\bar{v} = 0.5` and :math:`\rho_{\min} = 10^{-3}`. We use the Method of Moving Asymptotes (MMA) with move limit :math:`\Delta\rho_{\max} = 0.1` and a sensitivity filter of radius :math:`r_{\min} = 1.5\,h` (where :math:`h` is the element size) to suppress checkerboard patterns. The optimization runs for 51 iterations. Sensitivity via automatic differentiation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The classical adjoint-method sensitivity for SIMP has the closed form .. math:: \frac{\partial C}{\partial \rho_e} = -p\, \rho_e^{p-1} (E_{\max} - E_{\min})\, \mathbf{u}_e^\top \mathbf{K}_0^e \mathbf{u}_e, where :math:`\mathbf{K}_0^e` is the element stiffness at unit Young's modulus. In TensorMesh this gradient is **not implemented manually** — it is obtained by backpropagating through the differentiable assembly and sparse solve via PyTorch's reverse-mode autograd. The closed-form expression above is reproduced only for reference and as a consistency check against the autograd output. Final design comparison ~~~~~~~~~~~~~~~~~~~~~~~ Both frameworks converge to the same canonical truss-like topology, and TensorMesh runs noticeably faster end-to-end thanks to GPU-resident assembly and solve. .. image:: ../_static/inverse/final_comparison.png :width: 100% :align: center :alt: Final compliance and topology comparison Optimization animations ~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 50 50 * - **JAX-FEM** - **TensorMesh** * - .. image:: ../_static/inverse/jaxfem_optimization.gif :width: 100% :alt: JAX-FEM topology-optimization animation - .. image:: ../_static/inverse/tensormesh_optimization.gif :width: 100% :alt: TensorMesh topology-optimization animation