From lllllllama-ai-paper-reproduction-skill
Orchestrates end-to-end, candidate-only AI research exploration on top of `current_research`, with auditable repo understanding, idea gating, bounded code adaptation, and governed experiments whose outputs land in `explore_outputs/`.
`npx claudepluginhub lllllllama/ai-research-workflow-skills`

This skill uses the workspace's default tool permissions.
- Accept either a legacy `variant_spec` flow or a higher-level `research_campaign` flow.
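As a rough illustration of the higher-level flow, a `research_campaign` input might look like the Python sketch below. The field names follow the campaign fields listed later on this page; every value is a hypothetical placeholder, not a recommended setting.

```python
# Hypothetical research_campaign input; field names follow
# references/research-campaign-spec.md, all values are placeholders.
research_campaign = {
    "current_research": {"kind": "git_branch", "ref": "exp/baseline"},
    "task_family": "image_classification",
    "dataset": "cifar10",
    "benchmark": "top1_accuracy",
    "evaluation_source": "repo_eval_harness",
    "sota_reference": {"metric": 0.965, "note": "frozen campaign input"},
    "candidate_ideas": ["label_smoothing", "cosine_lr_schedule"],
    "compute_budget": {"max_variants": 4, "max_short_cycle_runs": 8},
    "variant_spec": {
        "variant_axes": {"lr": [0.05, 0.1], "label_smoothing": [0.0, 0.1]},
        "subset_sizes": [0.05, 0.2],
        "short_run_steps": 200,
        "selection_weights": {"cost": 0.2, "success_rate": 0.4, "expected_gain": 0.4},
        "primary_metric": "val_accuracy",
        "metric_goal": "maximize",
    },
}
```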
Bundled files: agents/openai.yaml, references/ai-research-explore-policy.md, references/idea-evaluation-framework.md, references/research-campaign-spec.md, references/smoke-validation-policy.md, references/source-mapping-policy.md, references/sources-naming-policy.md, scripts/lookup/__init__.py, scripts/lookup/cache_store.py, scripts/lookup/inventory_writer.py, scripts/lookup/normalizers.py, scripts/lookup/providers/__init__.py, scripts/lookup/providers/arxiv_provider.py, scripts/lookup/providers/base.py, scripts/lookup/providers/doi_provider.py, scripts/lookup/providers/github_provider.py, scripts/lookup/providers/optional_provider.py, scripts/lookup/providers/url_provider.py, scripts/lookup/record_schema.py, scripts/lookup/repo_extractors.py.

Curates autoresearch patterns for autonomous loops: LLMs propose code/ML changes, measure metrics, keep improvements or revert. Includes Python code and Claude skill setup.
Orchestrates autonomous experiments that optimize measurable metrics such as build time, latency, or accuracy, or tune configs, via git branches and `.lab/` logging.
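A minimal sketch, under stated assumptions, of the propose/measure/keep-or-revert loop on git branches with `.lab/` logging described above. `measure_metric()` and the applied change are placeholders for the repository's own evaluation command and the LLM-proposed edit; the branch naming and log format are illustrative, not the skill's actual implementation.

```python
import json
import subprocess
import time
from pathlib import Path
from typing import Callable

LAB_DIR = Path(".lab")          # run-log directory used by the loop
LAB_DIR.mkdir(exist_ok=True)

def git(*args: str) -> str:
    """Run a git command in the current repo and return its stdout."""
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

def measure_metric() -> float:
    """Placeholder: run the project's own eval command and parse one number.

    Replace with the repository's real measurement (build time, latency,
    accuracy, ...); here we just pretend a metric file already exists.
    """
    return float(Path("metric.txt").read_text())

def try_candidate(name: str, apply_change: Callable[[], None],
                  baseline: float, goal: str = "maximize") -> bool:
    """Apply one proposed change on its own branch, measure, keep or revert."""
    branch = f"auto/{name}-{int(time.time())}"
    git("checkout", "-b", branch)
    apply_change()                               # LLM-proposed edit goes here
    git("add", "-A")
    git("commit", "-m", f"candidate: {name}")
    metric = measure_metric()
    improved = metric > baseline if goal == "maximize" else metric < baseline
    (LAB_DIR / f"{branch.replace('/', '_')}.json").write_text(
        json.dumps({"branch": branch, "metric": metric,
                    "baseline": baseline, "kept": improved}, indent=2))
    git("checkout", "-")                         # return to the previous branch
    if not improved:
        git("branch", "-D", branch)              # revert: drop the candidate branch
    return improved
```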
Executes authorized exploratory deep learning experiments like small-subset validations, batch sweeps, idle-GPU searches, and transfer-learning trials in research repos, ranking outputs in explore_outputs/.
- Accept either a legacy `variant_spec` flow or a higher-level `research_campaign` flow.
- Start from `current_research` in a durable form such as a branch, commit, checkpoint, run record, or already-trained local model state.
- Run `analyze-project` to produce `analysis_outputs/RESEARCH_MAP.md`, `CHANGE_MAP.md`, and `EVAL_CONTRACT.md`.
- Resolve sources into `sources/`; the built-in provider set is intentionally small and should be treated as bounded source resolution, not open-ended literature search.
- Start from `candidate_ideas`, then optionally expand the search space with a bounded idea-seed generation pass that writes `analysis_outputs/IDEA_SEEDS.json`.
- Run `env-and-assets-bootstrap` only when the environment or assets tied to `current_research` are still unclear.
- Use `explore-code` for bounded exploratory code adaptation.
- Write `analysis_outputs/ATOMIC_IDEA_MAP.md` and `analysis_outputs/ATOMIC_IDEA_MAP.json`; if no implementable atomic units can be derived, stop for a checkpoint instead of pretending the idea is ready.
- Write `analysis_outputs/IMPLEMENTATION_FIDELITY.md` and `analysis_outputs/IMPLEMENTATION_FIDELITY.json`; distinguish `directly_verified`, `heuristic`, and `not_checked`.
- Use `explore-run` for short-cycle trials, sweeps, and pre-execution candidate ranking.
- Use `minimal-run-and-audit` or `run-train` only when the exploratory plan needs real command execution.
- Write an `experiment_manifest` before wider execution and keep supporting changes mechanical and reversible.
- Write results to `explore_outputs/`; never present the result as trusted reproduction success or a verified novelty/SOTA claim.
- Rank candidates before execution by `cost`, `success_rate`, and `expected_gain` (see the sketch after this list).
- Use `selection_weights` in the variant spec when the researcher wants to rebalance those three factors.
- Respect `max_variants` and `max_short_cycle_runs`.
- Rank executed candidates by `status` first, then `primary_metric` and `metric_goal` when provided.
- Gate ideas on `single_variable_fit`, `interface_fit`, `patch_surface`, `dependency_drag`, `eval_risk`, and short-run feasibility before soft ranking.
- Score ideas on `novelty_estimate`, `groundedness`, `source_support_strength`, `interface_fit`, `patch_surface`, `dependency_drag`, `ablation_clarity`, and `implementation_story_clarity`, together with the existing upside/risk fields.
- `research_campaign` is the preferred input for the third scenario.
- A `research_campaign` spec covers `current_research`, `task_family`, `dataset`, `benchmark`, `evaluation_source`, `sota_reference`, `candidate_ideas`, `compute_budget`, `research_lookup`, `idea_policy`, `idea_generation`, `source_constraints`, `feasibility_policy`, `baseline_gate`, `execution_policy`, and `variant_spec`.
- Treat `evaluation_source` and `sota_reference` as frozen inputs for this campaign; do not claim they are globally complete.
- Use `current_research` to anchor the exploratory context.
- Use `variant_axes`, `subset_sizes`, and `short_run_steps` to describe the candidate matrix.
- Use `selection_weights` to tune the pre-execution balance between `cost`, `success_rate`, and `expected_gain`.
- Use `primary_metric` and `metric_goal` to control post-execution candidate ranking.
- See also: `current_research`, `explore-code`, `explore-run`.
- `external_provider` is the strongest source support; `parsed_locator` and `repo_local_extracted` are weaker support, and `seed_only` must not be treated as strong research evidence.
- Use `references/ai-research-explore-policy.md`, `references/research-campaign-spec.md`, `../../references/explore-variant-spec.md`, `scripts/orchestrate_explore.py`, and `scripts/write_outputs.py`.
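A minimal sketch of the pre-execution soft ranking over `cost`, `success_rate`, and `expected_gain`. Only the three factor names and `selection_weights` come from the spec; the scoring formula, the assumption that all estimates are pre-normalized to [0, 1], and the example numbers are illustrative assumptions.

```python
from typing import Dict, List, Optional

def soft_rank(candidates: List[Dict[str, float]],
              selection_weights: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Rank candidate variants before execution by a weighted score.

    Each candidate carries estimated cost, success_rate, and expected_gain
    (assumed pre-normalized to [0, 1]). Higher success_rate and expected_gain
    raise the score; higher cost lowers it. Weights default to an even split.
    """
    w = selection_weights or {"cost": 1 / 3, "success_rate": 1 / 3, "expected_gain": 1 / 3}

    def score(c: Dict[str, float]) -> float:
        return (w["success_rate"] * c["success_rate"]
                + w["expected_gain"] * c["expected_gain"]
                - w["cost"] * c["cost"])

    return sorted(candidates, key=score, reverse=True)

# Illustrative usage with made-up estimates for two candidate variants.
ranked = soft_rank(
    [{"cost": 0.8, "success_rate": 0.5, "expected_gain": 0.9},
     {"cost": 0.2, "success_rate": 0.7, "expected_gain": 0.4}],
    selection_weights={"cost": 0.2, "success_rate": 0.4, "expected_gain": 0.4},
)
```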