From training
Acquire a training data source with license validation and delegate ingest to the semantic memory kernel
npx claudepluginhub jmagly/aiwg-training
How this skill is triggered (by the user, by Claude, or both)
Slash command
/training:acquire-training-source
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Acquire a training data source — filesystem directory, URL, git repo, or existing AIWG research REF — with license validation and format detection. Delegates ingest mechanics to the semantic memory kernel.
<source> (required)
Source location. Supported forms:
- file:/path/to/dir - local filesystem directory
- file:/path/to/file.{md,txt,json,jsonl,pdf} - single file
- https://... - URL (document, tarball, API endpoint)
- git:<repo-url> - git repository (shallow clone)
- ref:REF-XXX - existing AIWG research REF (reuse corpus)

--license <spdx> (optional)
SPDX license identifier for the source. Required unless --allow-unlicensed is set. See https://spdx.org/licenses/ for valid identifiers.

--allow-unlicensed (optional)
Override the license-required gate. Emits a warning and tags the source license: unknown in metadata. Examples derived from unlicensed sources inherit unknown and will be blocked by license-check lint at publication time.

--format <type> (optional)
Hint the expected format: code, docs, papers, dialogues, mixed. Used by downstream synthesis skills. Auto-detected if omitted.
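The source forms and the license gate described above can be sketched in a few lines of Python. This is only an illustration, not the skill's implementation: the function names, the `LicenseDecision` type, and the precedence between an inherited and a declared license are all assumptions; only the prefixes, the `unknown` tag, and the declared/inherited distinction come from the parameter descriptions.

```python
from dataclasses import dataclass
from typing import Optional

# Prefixes come from the "Supported forms" list; the kind labels are illustrative.
SOURCE_PREFIXES = {
    "file:": "file",
    "https://": "url",
    "git:": "git",
    "ref:": "ref",
}


@dataclass
class LicenseDecision:
    """Hypothetical result type for the license gate."""
    license: str
    license_source: Optional[str]  # "declared", "inherited", or None (assumption)
    warning: Optional[str] = None


def classify_source(source: str) -> str:
    """Map a <source> argument to one of the supported forms."""
    for prefix, kind in SOURCE_PREFIXES.items():
        if source.startswith(prefix):
            return kind
    raise ValueError(f"unsupported source form: {source!r}")


def resolve_license(
    source_kind: str,
    declared: Optional[str] = None,
    allow_unlicensed: bool = False,
    ref_license: Optional[str] = None,
) -> LicenseDecision:
    """Apply the license-required gate.

    Assumed precedence: a ref: source inherits its REF's license first,
    then an explicit --license value, then the --allow-unlicensed override.
    """
    if source_kind == "ref" and ref_license:
        return LicenseDecision(ref_license, "inherited")
    if declared:
        return LicenseDecision(declared, "declared")
    if allow_unlicensed:
        return LicenseDecision(
            "unknown",
            None,
            warning="source acquired without a license; tagged license: unknown",
        )
    raise ValueError("license required: pass --license <spdx> or --allow-unlicensed")
```

For example, `resolve_license(classify_source("file:/home/user/datasets/code-review"), declared="MIT")` yields a declared MIT license, while calling it with neither a license nor the override raises an error, mirroring the gate.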
1. Resolve the source. For ref:REF-XXX, look up the REF in .aiwg/research/ and reuse its files as the source.
2. Validate the license. Require that --license is provided or --allow-unlicensed is set. For ref: sources, inherit the REF's declared license.
3. Detect the format. If --format is not given, scan file extensions and sampled content to classify (code / docs / papers / dialogues / mixed).
4. Stage the source under .aiwg/training/raw/<source-id>/.
5. Invoke memory-ingest --consumer training-complete --source .aiwg/training/raw/<source-id>/. Kernel handles:
   - derivedPages.rawExamples
6. Write source.yaml in the raw dir capturing:
   - source_id, source_type, acquired_at, acquired_by
   - license (SPDX), license_source (declared vs inherited)
   - format_detected
   - file_count, total_bytes, sha256_manifest_ref
7. Record provenance with the provenance-create skill.
8. Append to the log via memory-log-append with op ingest (inherited from kernel) plus source-level summary.

The kernel handles ingest mechanics. This skill retains training-specific layers on top:
- ref:REF-XXX sources to pull from existing research corpus
- License-required gate: --license or --allow-unlicensed

# Acquire a GitHub repo as training source
acquire-training-source git:https://github.com/rust-lang/rust --license "Apache-2.0 OR MIT" --format code
# Reuse a research REF as a training source
acquire-training-source ref:REF-375 --format papers
# Acquire a local directory
acquire-training-source file:/home/user/datasets/code-review --license MIT
# Acquire unlicensed source (emits warning)
acquire-training-source https://example.com/dataset.tar.gz --allow-unlicensed
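The format-detection and source.yaml steps described earlier can be sketched as follows. This is a rough illustration under stated assumptions: the extension table, the 0.8 majority threshold, the manifest file name, and both function names are hypothetical, and the sketch classifies by file extension only, whereas the skill is described as also sampling content. The field names in the record are the ones the skill lists for source.yaml.

```python
import hashlib
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical extension-to-format table; the skill's real rules are not documented here.
EXTENSION_FORMATS = {
    ".py": "code", ".rs": "code", ".ts": "code",
    ".md": "docs", ".txt": "docs",
    ".pdf": "papers",
    ".jsonl": "dialogues",
}


def detect_format(paths, threshold=0.8):
    """Return the dominant format, or "mixed" when no format reaches the threshold."""
    counts = Counter(EXTENSION_FORMATS.get(Path(p).suffix.lower()) for p in paths)
    counts.pop(None, None)  # drop files with unrecognized extensions
    if not counts:
        return "mixed"
    fmt, n = counts.most_common(1)[0]
    return fmt if n / sum(counts.values()) >= threshold else "mixed"


def build_source_record(raw_dir, source_id, source_type, license, license_source,
                        acquired_by, acquired_at=None):
    """Collect the fields listed for source.yaml, plus a per-file SHA-256 manifest."""
    raw_dir = Path(raw_dir)
    files = sorted(p for p in raw_dir.rglob("*") if p.is_file())
    manifest = {
        p.relative_to(raw_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in files
    }
    record = {
        "source_id": source_id,
        "source_type": source_type,
        "acquired_at": acquired_at or datetime.now(timezone.utc).isoformat(),
        "acquired_by": acquired_by,
        "license": license,
        "license_source": license_source,
        "format_detected": detect_format(files),
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        # Hypothetical file name; the skill only says a manifest reference is stored.
        "sha256_manifest_ref": "sha256-manifest.txt",
    }
    return record, manifest
```

The record is returned as a plain dictionary so that it can be serialized with whatever YAML writer the surrounding tooling uses; hashing each staged file is one way to make later integrity checks possible.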
@agentic/code/frameworks/training-complete/schemas/example-record.yaml - target example format
@agentic/code/frameworks/sdlc-complete/schemas/research/license-metadata.yaml - SPDX tracking
@agentic/code/addons/semantic-memory/skills/memory-ingest/SKILL.md
@agentic/code/frameworks/research-complete/skills/research-acquire/SKILL.md
@agentic/code/frameworks/sdlc-complete/skills/provenance-create/SKILL.md
@agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md