From curry-train
Centralized state tracking the multi-dimensional parallelism topology — DP rank, TP rank, PP rank, EP rank, CP rank — and the communication groups for each. Activate when the user asks "parallel state", "process group", "rank topology", "world setup", or wires up multi-dim parallelism.
`npx claudepluginhub curryfromuestc/curry-train --plugin curry-train`

This skill uses the workspace's default tool permissions.
A small singleton-like object that knows, for the current process: its DP/TP/PP/EP/CP rank, world size on each axis, and the corresponding `torch.distributed.ProcessGroup` for collectives. Every other parallelism primitive depends on this.
It exposes the per-rank coordinates as `(dp_rank, tp_rank, pp_rank, ep_rank, cp_rank)` and the per-axis communication groups via `get_tp_group()`, `get_pp_group()`, etc.; other code goes through this object rather than calling `torch.distributed.init_process_group` directly.

```python
from curry_train.primitives import ParallelState

# At training start, after torchrun has set RANK / LOCAL_RANK / WORLD_SIZE:
ps = ParallelState.init(
    dp_size=2,
    tp_size=2,
    pp_size=2,
    ep_size=1,
    cp_size=1,
)  # 2 * 2 * 2 = 8 GPUs total
ps.tp_rank              # 0 .. tp_size - 1
ps.dp_rank              # 0 .. dp_size - 1
ps.pp_rank              # 0 .. pp_size - 1

ps.get_tp_group()       # process group for collectives within this TP slice
ps.get_dp_group()       # process group for collectives within this DP slice
ps.get_pp_group()       # process group for collectives within this PP slice

ps.is_first_pp_stage()  # True on the first pipeline stage
ps.is_last_pp_stage()   # True on the last pipeline stage
```
For a world of size W, the parallelism dims partition it: W = dp × tp × pp × ep × cp. The order matters: typical placement is (dp, ep, pp, tp, cp) from outermost to innermost. ParallelState computes the per-rank coordinates from RANK accordingly.
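As an illustration of that placement, here is a minimal sketch (a hypothetical helper, not curry-train's actual code) of recovering the per-axis coordinates from a global rank, peeling off the innermost axis first:

```python
# Hypothetical helper: decompose a global rank into per-axis coordinates,
# assuming the (dp, ep, pp, tp, cp) outermost-to-innermost placement above.
def rank_to_coords(rank, dp, ep, pp, tp, cp):
    sizes = [dp, ep, pp, tp, cp]   # outermost .. innermost
    coords = []
    for size in reversed(sizes):   # innermost axis varies fastest
        coords.append(rank % size)
        rank //= size
    cp_rank, tp_rank, pp_rank, ep_rank, dp_rank = coords
    return dp_rank, ep_rank, pp_rank, tp_rank, cp_rank

# Example: W = 8 with dp=2, tp=2, pp=2 (ep=cp=1).
# Global rank 5 -> dp_rank=1, pp_rank=0, tp_rank=1.
print(rank_to_coords(5, dp=2, ep=1, pp=2, tp=2, cp=1))  # (1, 0, 0, 1, 0)
```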
- torchrun initializes the world: launching processes and setting `RANK` / `WORLD_SIZE` is a torchrun / cluster-side concern, not this primitive's.
- Downstream primitives (primitive-tp-linear, primitive-pipeline-schedule, primitive-experts, primitive-context-parallel) consume groups from this primitive for their collectives (see the sketch below).
- V1: stub at `template/curry_train/primitives/parallel_state.py`.
- See skills/stage4-parallel-primitive-intro for when to actually wire each axis.
- Reference: Megatron-LM's `megatron/core/parallel_state.py` is the canonical implementation; HuggingFace Accelerate's `state.py` is a lighter alternative.
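For a sense of how a downstream primitive consumes these groups, here is a hedged sketch (not the actual primitive-tp-linear implementation) of a row-parallel matmul that sums its partial outputs across the TP group taken from ParallelState:

```python
import torch
import torch.distributed as dist

def row_parallel_matmul(x_shard, weight_shard, ps):
    # Each TP rank multiplies its local shards, producing a partial result ...
    partial = x_shard @ weight_shard
    # ... which is summed across the TP group obtained from ParallelState.
    dist.all_reduce(partial, op=dist.ReduceOp.SUM, group=ps.get_tp_group())
    return partial
```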