tensormesh.distributed

Mesh partitioning and parallel assembly across multiple devices, with integration into torch-sla’s distributed sparse solver. See Distributed FEM for a worked walkthrough.

DistributedMesh

class DistributedMesh(mesh: Mesh, num_partitions: int | None = None, method: str = 'coordinate', devices: List[device] | None = None)

Bases: object

Partitioned mesh for multi-GPU parallel assembly.

Wraps partition_mesh to split a global mesh into submeshes, each assigned to a separate device. Each submesh stores an orig_nid mapping (local node index → global node index) in point_data.

Parameters:

mesh (Mesh) – Global mesh (typically on CPU).
num_partitions (int, optional) – Number of partitions. Defaults to torch.cuda.device_count(), or 2 if no CUDA devices are available.
method (str, optional) – Partitioning method: 'coordinate' (default, fast RCB), 'spectral', or 'metis'.
devices (list of device, optional) – Devices to assign partitions to. Defaults to cuda:0, cuda:1, ... or cpu if CUDA is unavailable.
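The 'coordinate' method is described above as a fast RCB (recursive coordinate bisection). As a rough illustration of that idea only, and not tensormesh's actual implementation, the sketch below bisects a point set along its longest axis until the requested number of parts is reached. The function name rcb_partition and the plain-tuple point format are hypothetical.

```python
def rcb_partition(points, num_parts):
    """Assign each point a partition id by recursive coordinate bisection.

    points: list of coordinate tuples (all the same dimension).
    Returns a list of partition ids, one per point.
    """
    labels = [0] * len(points)

    def split(indices, first_part, parts):
        if not indices:
            return
        if parts == 1:
            for i in indices:
                labels[i] = first_part
            return
        # Bisect along the axis with the largest extent.
        dim = len(points[0])
        extents = [
            max(points[i][d] for i in indices) - min(points[i][d] for i in indices)
            for d in range(dim)
        ]
        axis = extents.index(max(extents))
        ordered = sorted(indices, key=lambda i: points[i][axis])
        # Split the point count in proportion to the parts on each side,
        # so partition counts that are not powers of two stay balanced.
        left_parts = parts // 2
        cut = len(ordered) * left_parts // parts
        split(ordered[:cut], first_part, left_parts)
        split(ordered[cut:], first_part + left_parts, parts - left_parts)

    split(list(range(len(points))), 0, num_parts)
    return labels


# A 4 x 4 grid of points split into 4 parts.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
labels = rcb_partition(grid, 4)
```

In a mesh setting the points would typically be element centroids or node coordinates; which of the two tensormesh uses is not stated here.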

Examples

>>> import tensormesh as tm
>>> from tensormesh.distributed import DistributedMesh
>>> mesh = tm.Mesh.gen_rectangle(chara_length=0.05)
>>> dmesh = DistributedMesh(mesh, num_partitions=4)
>>> print(dmesh.num_partitions, dmesh.n_global_points)
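The orig_nid mapping stored on each submesh (local node index → global node index) can be pictured with a small pure-Python sketch. It does not use tensormesh: the index lists, the assumption that neighbouring partitions share interface nodes, and the helper names gather_local and scatter_global are all hypothetical.

```python
# Hypothetical illustration of a local -> global node index map.
# Plain lists stand in for the tensors a real submesh would hold.

# A nodal field on a global mesh with 6 nodes.
global_values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

# Assumed orig_nid arrays for two partitions: entry i is the global
# index of local node i. Nodes 2 and 3 appear in both partitions
# (an assumed shared interface).
orig_nid_parts = [
    [0, 1, 2, 3],
    [2, 3, 4, 5],
]


def gather_local(values, orig_nid):
    """Pull one partition's nodal values out of the global field."""
    return [values[g] for g in orig_nid]


def scatter_global(local_parts, orig_nid_parts, n_global):
    """Write local values back to their global positions."""
    out = [None] * n_global
    for local, orig_nid in zip(local_parts, orig_nid_parts):
        for value, g in zip(local, orig_nid):
            out[g] = value
    return out


local_parts = [gather_local(global_values, ids) for ids in orig_nid_parts]
restored = scatter_global(local_parts, orig_nid_parts, len(global_values))
```

Gathering with orig_nid and scattering back reproduces the global field, which is the property that lets results computed per partition be mapped back onto the global mesh.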

__init__(mesh: Mesh, num_partitions: int | None = None, method: str = 'coordinate', devices: List[device] | None = None)

Distributed assembly

distributed_element_assemble(assembler_cls: Type[ElementAssembler], dmesh: DistributedMesh, quadrature_order: int = 2, project: str = 'reduce', call_kwargs: dict | None = None, **assembler_kwargs) → DSparseTensor

Assemble element matrix in parallel across multiple devices.

Assemblers are created sequentially (for CUDA thread-safety), then assembly computation runs in parallel threads on separate GPUs.
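The two-phase pattern described above (sequential construction, then parallel execution in threads) can be sketched with the standard library alone. ToyAssembler and partition_sizes are hypothetical stand-ins; a real assembler would hold tensors on its own device rather than summing integers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyAssembler:
    """Hypothetical stand-in for a per-partition assembler."""

    def __init__(self, part_id, n_elements):
        self.part_id = part_id
        self.n_elements = n_elements

    def __call__(self):
        # Placeholder for the per-partition assembly work.
        total = sum(range(self.n_elements))
        return self.part_id, total, threading.current_thread().name


partition_sizes = [100, 200, 300, 400]

# Phase 1: build the assemblers one at a time on the calling thread.
assemblers = [ToyAssembler(i, n) for i, n in enumerate(partition_sizes)]

# Phase 2: run the assembly calls concurrently, one worker per partition.
with ThreadPoolExecutor(max_workers=len(assemblers)) as pool:
    futures = [pool.submit(a) for a in assemblers]
    # Collecting in submission order keeps results aligned with partitions.
    results = [f.result() for f in futures]
```

Keeping construction on one thread avoids concurrent setup work, while the calls themselves are independent per partition and can safely overlap.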

Parameters:

assembler_cls (Type[ElementAssembler]) – The assembler class (e.g., LaplaceElementAssembler).
dmesh (DistributedMesh) – Partitioned mesh with device assignments.
quadrature_order (int, optional) – Quadrature order for integration. Default: 2.
project (str, optional) – Projection method: 'reduce' or 'sparse'. Default: 'reduce'.
call_kwargs (dict, optional) – Extra keyword arguments passed to assembler.__call__() (e.g., point_data, scalar_data).
**assembler_kwargs – Extra keyword arguments passed to assembler_cls.from_mesh().

Returns:

Distributed sparse matrix ready for distributed solve.

Return type:

DSparseTensor

distributed_element_assemble_to_sparse(assembler_cls: Type[ElementAssembler], dmesh: DistributedMesh, quadrature_order: int = 2, project: str = 'reduce', call_kwargs: dict | None = None, **assembler_kwargs) → SparseMatrix

Assemble element matrix in parallel, returning a global SparseMatrix.

Same as distributed_element_assemble() but returns a standard SparseMatrix instead of torch-sla’s DSparseTensor.
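One plausible way to picture how per-partition matrices combine into a single global matrix is sketched below. It rests on two assumptions that this page does not confirm: each partition produces COO triplets in local node indices, and entries that map to the same global (row, col) pair are summed. The function merge_local_coo is hypothetical and is not part of tensormesh.

```python
from collections import defaultdict


def merge_local_coo(local_mats, orig_nid_parts):
    """Sum per-partition COO triplets into one global {(row, col): value} dict.

    local_mats: one list of (local_row, local_col, value) per partition.
    orig_nid_parts: one local -> global node index list per partition.
    """
    merged = defaultdict(float)
    for triplets, orig_nid in zip(local_mats, orig_nid_parts):
        for r, c, v in triplets:
            merged[(orig_nid[r], orig_nid[c])] += v
    return dict(merged)


# 1D Laplace-like example: 3 nodes, 2 elements, one element per partition.
# Each element contributes the 2x2 block [[1, -1], [-1, 1]].
element_block = [(0, 0, 1.0), (0, 1, -1.0), (1, 0, -1.0), (1, 1, 1.0)]
local_mats = [element_block, element_block]
orig_nid_parts = [[0, 1], [1, 2]]  # global node 1 is shared

global_coo = merge_local_coo(local_mats, orig_nid_parts)
```

The shared node accumulates contributions from both partitions, giving the familiar tridiagonal pattern with 2.0 on the interior diagonal entry.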

distributed_node_assemble(assembler_cls: Type[NodeAssembler], dmesh: DistributedMesh, quadrature_order: int = 2, project: str = 'reduce', point_data: Dict[str, Tensor] | None = None, call_kwargs: dict | None = None, **assembler_kwargs) → Tensor

Assemble node vector (RHS) in parallel across multiple devices.