From geopol-sim
Backward-assess a past simulation run for accuracy. Use when the user wants to "grade a forecast", "score this run", "check how accurate the predictions were", "evaluate past predictions", or "see how the panel did". Fetches fresh news grounding for the elapsed horizons, runs an LLM grader against each prediction, and writes per-prediction scores plus aggregate calibration stats.
Install:

```bash
npx claudepluginhub danielrosehill/claude-code-plugins --plugin geopol-sim
```
This is the seed of a self-improving loop. Once a few runs are graded, the data can drive prompts like "this model has historically been overconfident at the 1m horizon; weight its forecasts accordingly".
Workflow:

1. Resolve the run to grade: a `reports/<timestamp>/` dir. Defaults to the runstore's "oldest run with elapsed horizons but no `grading.json`" if the runstore is configured.
2. Read `meta.json` for the run timestamp and compute which horizons (24h, 1w, 1m, etc.) have passed; a sketch of this check follows the schema below. If none have, stop and tell the user when the soonest horizon will be due.
3. Load the run's predictions (`synthesis.json` for Council, the equivalent for Forecaster).
4. Fetch fresh news grounding for the elapsed horizons, then grade each prediction as `hit` / `partial` / `miss` / `unverifiable`, plus a one-sentence justification quoting at least one grounding source.
5. Run the grader model (configured via env). Request structured JSON output with the per-prediction grade + justification.
6. Write `grading.json` next to the run dir. Schema:
```json
{
  "graded_at": "<UTC timestamp>",
  "grader_model": "<model id>",
  "horizons_graded": ["24h", "1w"],
  "predictions": [
    {
      "prediction_id": "<from synthesis>",
      "model": "<which council member made it>",
      "horizon": "1w",
      "grade": "partial",
      "justification": "...",
      "evidence_url": "..."
    }
  ]
}
```
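The horizon check in step 2 is plain date arithmetic. A minimal sketch, assuming `meta.json` stores a timezone-aware ISO-8601 `timestamp` field and that labels map to fixed durations (both the field name and the duration values are assumptions):

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumed label -> duration map; the skill's real horizon set may differ.
HORIZONS = {
    "24h": timedelta(hours=24),
    "1w": timedelta(weeks=1),
    "1m": timedelta(days=30),
}

def elapsed_horizons(run_dir: Path, now: datetime | None = None) -> list[str]:
    """Return horizon labels that have fully passed since the run was made."""
    meta = json.loads((run_dir / "meta.json").read_text())
    # Assumed field name; must be timezone-aware for the subtraction below.
    run_time = datetime.fromisoformat(meta["timestamp"])
    now = now or datetime.now(timezone.utc)
    return [label for label, delta in HORIZONS.items() if now - run_time >= delta]
```

An empty result means nothing is due yet, which is when the skill reports the soonest upcoming horizon instead of grading.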
Alongside `grading.json`, write a human-readable `grading-report.md`.
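One way to produce the report is a direct transform of `grading.json`. A sketch under that assumption; the table layout and wording here are illustrative, not the skill's prescribed format:

```python
import json
from collections import Counter
from pathlib import Path

def render_report(run_dir: Path) -> str:
    """Render grading.json (schema above) as a small markdown report."""
    g = json.loads((run_dir / "grading.json").read_text())
    counts = Counter(p["grade"] for p in g["predictions"])
    lines = [
        f"# Grading report ({g['graded_at']})",
        f"Grader: {g['grader_model']}. Horizons: {', '.join(g['horizons_graded'])}.",
        "",
        "| model | horizon | grade | justification |",
        "| --- | --- | --- | --- |",
    ]
    for p in g["predictions"]:
        just = p["justification"].replace("|", "\\|")  # keep the table intact
        lines.append(f"| {p['model']} | {p['horizon']} | {p['grade']} | {just} |")
    lines += ["", "Totals: " + ", ".join(f"{k}: {v}" for k, v in sorted(counts.items()))]
    return "\n".join(lines)
```

Writing it next to the grades keeps the pair self-describing: `(run_dir / "grading-report.md").write_text(render_report(run_dir))`.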
Also append to the `<runstore_root>/grading-aggregate.jsonl` file so per-model and calibration trends across all graded runs can be tracked over time.

Notes:

- `unverifiable` is a valid grade and should be preferred over guessing; predictions about private deliberations or unobservable events are common in geopolitics.
- `grading.json` and `grading-report.md` are written alongside the run's existing artifacts, never replacing them.
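Once a few runs are graded, the aggregate file supports exactly the trend queries the self-improving loop needs. A minimal sketch, assuming each JSONL line holds one graded prediction with `model` and `grade` fields and using one possible scoring (both the record shape and the scoring are assumptions, not specified here):

```python
import json
from collections import defaultdict
from pathlib import Path

def per_model_hit_rates(aggregate_path: Path) -> dict[str, float]:
    """Average a hit=1 / partial=0.5 / miss=0 score per model (one possible scoring)."""
    SCORES = {"hit": 1.0, "partial": 0.5, "miss": 0.0}
    totals: dict[str, list[float]] = defaultdict(list)
    for line in aggregate_path.read_text().splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec["grade"] in SCORES:  # 'unverifiable' is excluded rather than guessed at
            totals[rec["model"]].append(SCORES[rec["grade"]])
    return {model: sum(s) / len(s) for model, s in totals.items()}
```

A model whose 1m-horizon average sits well below its stated confidence is exactly the overconfidence signal the loop feeds back into future prompts.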