Generate synthetic datasets for LLM training using sdg_hub's composable blocks and YAML-defined flows, with support for 100+ LLM providers and custom scripts.
npx claudepluginhub red-hat-ai-innovation-team/sdg_hub --plugin sdg-hub
Use when the user wants to run synthetic data generation via scripts: detect the environment, execute a flow, and present results. For detailed guidance on approaches, blocks, flow authoring, and troubleshooting, consult the synthetic-data-generation skill.
Use when the user wants to list, search, or inspect available SDG flows and data generation pipelines. Applies to browsing flow catalogs, finding flows by use case, or understanding what a specific flow does.
Use when the user wants to set up synthetic data generation for the first time, or when sdg_hub is not yet installed/configured in the current environment.
Generate synthetic data using sdg_hub with composable blocks and YAML flows. Use when the user wants to create training datasets, generate QA pairs, run data generation pipelines, build custom flows, produce synthetic data from documents, use agent frameworks for data generation, or distill MCP tool-use traces. Supports pre-built flows, custom Python scripts, and YAML flow authoring with 20+ blocks, agent connectors (Langflow, LangGraph), MCP tool-use, and 100+ LLM providers via LiteLLM.
Composable blocks and flows for synthetic data generation
SDG Hub is a Python framework for building synthetic data generation pipelines. Chain LLM, parsing, transform, filtering, and agent blocks into YAML-defined flows -- then generate training data at scale.
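The idea of chaining blocks into a flow can be illustrated without the library. The sketch below is a conceptual illustration only and does not use the sdg_hub API: `add_question`, `drop_short`, and `run_flow` are hypothetical names, and the plain Python functions stand in for LLM and filtering blocks.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Block = Callable[[List[Row]], List[Row]]


def drop_short(rows: List[Row]) -> List[Row]:
    # Stand-in for a filtering block: keep rows whose document has 10+ characters.
    return [r for r in rows if len(r["document"]) >= 10]


def add_question(rows: List[Row]) -> List[Row]:
    # Stand-in for an LLM block: derive a new column from an existing one.
    return [{**r, "question": f"What does this say: {r['document']}?"} for r in rows]


def run_flow(blocks: List[Block], rows: List[Row]) -> List[Row]:
    # Apply each block in order, feeding its output to the next block.
    for block in blocks:
        rows = block(rows)
    return rows


rows = [{"document": "short"}, {"document": "A longer source passage."}]
result = run_flow([drop_short, add_question], rows)
```

In sdg_hub itself the block order is declared in a YAML flow rather than in a Python list; the sketch only shows why composable steps with a shared row format are easy to reorder and reuse.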
pip install sdg-hub
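Before running a flow, it can help to confirm that the package is importable in the current environment. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, not part of sdg_hub.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if not is_installed("sdg_hub"):
    print("sdg_hub not found; run: pip install sdg-hub")
```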
from sdg_hub import FlowRegistry, Flow
# Discover and load a built-in flow
FlowRegistry.discover_flows()
flow = Flow.from_yaml(FlowRegistry.get_flow_path("MCP Server Distillation"))
# Configure and run
flow.set_model_config(model="openai/gpt-4o")
# `dataset` must be defined beforehand; it holds the input rows for the flow
result = flow.generate(dataset)
See the Quick Start for a full walkthrough, or browse all built-in flows.
Full documentation at ai-innovation.team/sdg_hub
SDG Hub is available as a plugin for two coding agents, Claude Code and Codex, bringing synthetic data generation directly into your coding workflow.
Via org marketplace (recommended — includes all Red Hat AI plugins):
/plugin marketplace add Red-Hat-AI-Innovation-Team/plugins
/plugin install sdg-hub@Red-Hat-AI-Innovation-Team/plugins
Via this repo directly:
/plugin marketplace add Red-Hat-AI-Innovation-Team/sdg_hub
/plugin install sdg-hub@Red-Hat-AI-Innovation-Team/sdg_hub
From a local clone:
git clone https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub.git
/plugin marketplace add /path/to/sdg_hub
codex plugin marketplace add Red-Hat-AI-Innovation-Team/plugins
Then install the plugin from the marketplace. See .codex-plugin/INSTALL.md for manual installation.
Invoke the setup-guide skill to configure your LLM provider and model.
| Skill | Description |
|---|---|
| setup-guide | Guided first-time configuration |
| data-generation | Run synthetic data generation using a flow |
| flow-browser | Browse and inspect available flows |
Apache License 2.0 -- see LICENSE.
Built by the Red Hat AI Innovation Team