Help us improve
Share bugs, ideas, or general feedback.
Share bugs, ideas, or general feedback.
Share bugs, ideas, or general feedback.
By jmagly
Corpus-to-dataset pipeline for AI training data curation. Ingests sources, synthesizes examples, generates preference pairs, applies decontamination, and exports to Alpaca/ShareGPT/ChatML/JSONL/Parquet with provenance and reproducibility. Grounded in 485 research REFs covering DPO/KTO/ORPO/SimPO, Self-Instruct/Evol/Orca/Phi/PersonaHub/STaR/ReST, Model Collapse guard, Datasheets/Model Cards/Data Statements, HF Datasets/Arrow+Parquet.
npx claudepluginhub jmagly/aiwg-trainingUses power tools
Uses Bash, Write, or Edit tools
Share bugs, ideas, or general feedback.
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge.
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge.
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
Computes dataset-level metrics (diversity, difficulty, domain balance, quality grade distribution) and prepares the matric-eval handoff package for model evaluation.
Coordinates dataset versioning, datasheet/model card generation, integrity manifests, and the publication gate including override escalation paths.
Runs exact, fuzzy, and semantic contamination checks against eval-set targets and feeds the publication gate.
Generates SFT training examples from admitted sources using self-instruct, evol-instruct, squad, and STaR patterns with per-example provenance.
Runs mechanical format adapters (alpaca, sharegpt, chatml, jsonl, parquet) with round-trip validation and sidecar metadata.
Acquire a training data source with license validation and delegate ingest to the semantic memory kernel
Generate Datasheet, Model Card, and Data Statement from a dataset manifest
Deterministically rebuild a dataset from its manifest and verify fixity equivalence
Create a versioned training dataset with manifest, fixity, provenance, and archive snapshot
Detect training-eval overlap against benchmark sets before dataset publication
LLM post-training — unified interface for SFT, OSFT, LoRA fine-tuning, and GRPO reinforcement learning
Synthetic data generation — composable blocks and YAML-defined flows for building LLM training datasets
ML engineering plugin: Give your AI coding agent ML engineering superpowers.
Style transfer pipeline for training LLMs to write in specific author styles using SFT with LoRA
Voice profile system for consistent, authentic writing. Apply, create, blend, and analyze voices. Includes 4 built-in profiles: technical-authority, friendly-explainer, executive-brief, and casual-conversational.
Core AIWG utilities for context regeneration, workspace management, development kit, and @-mention traceability. Essential foundation for other AIWG plugins.
Marketing automation framework with 37 specialized agents for campaign management, content strategy, brand compliance, and analytics. Full campaign lifecycle from strategy to measurement.
Writing quality validation and AI pattern detection. Identify AI-generated patterns, enhance authenticity, and enforce writing standards. Includes writing-validator agent and ai-pattern-detection skill.
Complete SDLC framework with 58 specialized agents for software development lifecycle management. Phase-based workflows (Inception→Elaboration→Construction→Transition), security reviews, testing orchestration, and deployment automation.
Train and optimize machine learning models with automated workflows
Design experiments, profile datasets, build models, and audit them for bias before shipping