npx claudepluginhub plurigrid/asi --plugin asi

This skill uses the workspace's default tool permissions.
**Trit**: -1 (MINUS - analysis/verification)
**Color**: #DBA51D (Golden Yellow)
**URI**: skill://evla-vla#DBA51D
EdgeVLA is an open-source edge vision-language-action model for robotics. It standardizes diverse robotics datasets from the Open-X Embodiment (OXE) collection for consistent training and deployment.
┌────────────────────────────────────────────────────────────────┐
│                      EdgeVLA ARCHITECTURE                      │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                Open-X Embodiment Datasets                │  │
│  │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐          │  │
│  │ │  DROID  │ │ Bridge  │ │ LIBERO  │ │  RT-X   │ + 60...  │  │
│  │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘          │  │
│  └──────┼───────────┼───────────┼───────────┼───────────────┘  │
│         │           │           │           │                  │
│         ▼           ▼           ▼           ▼                  │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │           OXE_DATASET_CONFIGS Standardization            │  │
│  │  • image_obs_keys: primary, secondary, wrist cameras     │  │
│  │  • state_encoding: POS_EULER, POS_QUAT, JOINT            │  │
│  │  • action_encoding: EEF_POS, JOINT_POS                   │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               │                                │
│                               ▼                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   Unified Data Format                     │  │
│  │  ┌────────────────────────────────────────────────────┐  │  │
│  │  │ Images: resized, normalized, multi-view            │  │  │
│  │  │ States: 8-dim standardized proprioception          │  │  │
│  │  │ Actions: 7-dim EEF or joint actions                │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               │                                │
│                               ▼                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                        VLA Model                         │  │
│  │     Vision Encoder → Language Model → Action Decoder     │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
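The unified format fixes the tensor shapes that every dataset is mapped into. A minimal sketch of one standardized sample (plain NumPy, not the EdgeVLA API; the exact 7-dim action layout of position delta, rotation delta, and gripper is an assumption based on common OXE conventions):

import numpy as np

# One standardized sample, matching the shapes in the diagram above.
image = np.zeros((224, 224, 3), dtype=np.uint8)   # one resized camera view (multi-view stacks several)
state = np.zeros(8, dtype=np.float32)             # 8-dim standardized proprioception
action = np.array(
    [0.01, 0.00, -0.02,   # assumed: end-effector position delta (x, y, z)
     0.00, 0.00, 0.10,    # assumed: end-effector rotation delta
     1.00],               # assumed: gripper command
    dtype=np.float32,
)                                                 # 7-dim EEF action
assert state.shape == (8,) and action.shape == (7,)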
from evla.config import OXE_DATASET_CONFIGS, StateEncoding, ActionEncoding

# DROID dataset configuration
droid_config = OXE_DATASET_CONFIGS["droid"]
# {
#     "image_obs_keys": {
#         "primary": "exterior_image_1_left",
#         "secondary": "exterior_image_2_left",
#         "wrist": "wrist_image_left",
#     },
#     "state_encoding": StateEncoding.POS_QUAT,
#     "action_encoding": ActionEncoding.EEF_POS,
# }

# Bridge dataset configuration
bridge_config = OXE_DATASET_CONFIGS["bridge"]
# {
#     "image_obs_keys": {
#         "primary": "image_0",
#         "wrist": "image_1",
#     },
#     "state_encoding": StateEncoding.POS_EULER,
#     "action_encoding": ActionEncoding.EEF_POS,
# }
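A new dataset can be described with the same schema. A minimal sketch, assuming OXE_DATASET_CONFIGS is a plain dict that accepts extra entries; the dataset name and raw observation keys below are hypothetical:

from evla.config import OXE_DATASET_CONFIGS, StateEncoding, ActionEncoding

# Hypothetical dataset entry, shown only to illustrate the config schema.
OXE_DATASET_CONFIGS["my_lab_arm"] = {
    "image_obs_keys": {
        "primary": "front_camera",   # hypothetical raw keys in the source dataset
        "wrist": "wrist_camera",
    },
    "state_encoding": StateEncoding.POS_EULER,
    "action_encoding": ActionEncoding.EEF_POS,
}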
from evla.config import OXE_NAMED_MIXTURES

# Comprehensive multi-dataset training
oxe_magic_soup = OXE_NAMED_MIXTURES["oxe_magic_soup"]

# RT-X reproduction
rtx_mixture = OXE_NAMED_MIXTURES["rtx"]

# Custom mixture with weights
custom_mixture = {
    "droid": 1.0,
    "bridge": 0.5,
    "libero": 0.3,
}
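The weights in a custom mixture are relative rather than absolute. A minimal sketch (plain Python) of how such weights would translate into per-dataset sampling probabilities, assuming the loader samples datasets in proportion to their weights:

custom_mixture = {"droid": 1.0, "bridge": 0.5, "libero": 0.3}

# Normalize relative weights into sampling probabilities.
total = sum(custom_mixture.values())
sampling_probs = {name: weight / total for name, weight in custom_mixture.items()}
# {'droid': 0.556, 'bridge': 0.278, 'libero': 0.167} (rounded)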
import torch

from evla import EdgeVLA, DataLoader

# Load model
model = EdgeVLA.from_pretrained("kscale/evla-base")

# Create dataloader with mixture
loader = DataLoader(
    mixture="oxe_magic_soup",
    batch_size=32,
    image_size=(224, 224),
)

# Training loop
for batch in loader:
    images = batch["images"]    # (B, V, H, W, C)
    states = batch["states"]    # (B, 8)
    actions = batch["actions"]  # (B, 7)
    loss = model.train_step(images, states, actions)

# Inference (camera and robot are placeholder hardware interfaces)
with torch.no_grad():
    image = camera.capture()
    state = robot.get_state()
    action = model.predict(image, state, "pick up the red block")
    robot.execute(action)
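For deployment, inference typically runs in a closed loop at a fixed control rate. A minimal sketch building on the snippet above (camera, robot, and the 10 Hz rate are placeholders; model.predict is used with the same signature as above):

import time

import torch

CONTROL_HZ = 10  # assumed control frequency; tune for the target robot

while True:
    tick = time.monotonic()
    image = camera.capture()     # placeholder camera interface
    state = robot.get_state()    # placeholder robot interface
    with torch.no_grad():
        action = model.predict(image, state, "pick up the red block")
    robot.execute(action)
    # Sleep out the remainder of the control period, if any.
    time.sleep(max(0.0, 1.0 / CONTROL_HZ - (time.monotonic() - tick)))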
This skill participates in balanced triads:
evla-vla (-1) ⊗ kos-firmware (+1) ⊗ mujoco-scenes (0) = 0 ✓
ksim-rl (-1) ⊗ topos-generate (+1) ⊗ evla-vla (-1) = needs balancing
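A triad balances when its trits sum to 0 in GF(3). A minimal sketch of that check in plain Python (skill names and trit values taken from the lines above):

def triad_sum(*trits):
    """Sum trits modulo 3, mapped back to {-1, 0, +1}; 0 means the triad is balanced."""
    s = sum(trits) % 3
    return s - 3 if s == 2 else s

triad_sum(-1, +1, 0)    # evla-vla, kos-firmware, mujoco-scenes -> 0 (balanced)
triad_sum(-1, +1, -1)   # ksim-rl, topos-generate, evla-vla -> -1 (needs balancing)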
Related skills:

- kos-firmware (+1): Robot firmware for deployment
- ksim-rl (-1): RL training for locomotion
- kbot-humanoid (-1): K-Bot configuration
- mujoco-scenes (0): Scene composition

@misc{evla2024,
title={EdgeVLA: Open-Source Edge Vision-Language-Action Model},
author={K-Scale Labs},
year={2024},
url={https://github.com/kscalelabs/evla}
}
@article{openvla2024,
title={OpenVLA: An Open-Source Vision-Language-Action Model},
author={Kim, Moo Jin and others},
journal={arXiv:2406.09246},
year={2024}
}