From replicate
Pushes and publishes AI models to Replicate with cog push or cog-safe-push for CI/CD and safe version releases. Useful when deploying a model, releasing versions, or setting up GitHub Actions for model releases.
Slash command: `/replicate:publish-models`
- Cog reference: <https://cog.run/llms.txt>
- cog push reference: <https://cog.run/cli#cog-push>

Prerequisites:

- A Cog model (use build-models if you don't have one yet).
- `cog login` against r8.im (or `echo $TOKEN | cog login --token-stdin`).
- A model created at replicate.com/{owner}/{name} via the API, web UI, or r8-model CLI.
- `REPLICATE_API_TOKEN` set in your environment.

cog push

The simplest path. Build and upload a new version:
```bash
cog push r8.im/owner/my-model
```

Or set `image: r8.im/owner/my-model` in `cog.yaml` and run `cog push` with no arguments:

```bash
cog push
```
Useful flags:

- `--separate-weights`: store weights in a separate layer. This gives faster cold boots and pushes for models with more than 1 GB of weights.
- `--x-fast`: faster pushes during iteration (skips some validation).
- `--secret id=hf,src=$HOME/.hf_token`: pass build-time secrets without baking them into image history.

`cog-safe-push` pushes to a private `-test` model first. It checks schema compatibility against the live version, runs prediction comparisons, and fuzzes inputs. This catches breaking changes before they reach users.
Install:

```bash
pip install git+https://github.com/replicate/cog-safe-push.git
```
Required env vars:

- `REPLICATE_API_TOKEN`
- `ANTHROPIC_API_KEY` (Claude judges output similarity for stochastic models)

Basic usage:

```bash
cog-safe-push --test-hardware=gpu-l40s owner/my-model
```
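This will:

1. Lint `predict.py` with ruff.
2. Create `owner/my-model-test` if it doesn't exist.
3. Push to the test model, then check schema compatibility, compare predictions, and fuzz inputs against the live `owner/my-model` version.
4. Push to `owner/my-model` if every check passes.

Drop a `cog-safe-push.yaml` in your project root (or `cog-safe-push-configs/<variant>.yaml` for multi-model repos). The example below shows all five test-case checker types: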
```yaml
model: owner/my-model
test_model: owner/my-model-test
test_hardware: gpu-l40s

predict:
  compare_outputs: false  # set false for stochastic models
  predict_timeout: 600
  test_cases:
    - inputs:
        prompt: "a serene mountain landscape"
      match_prompt: "a landscape photo of mountains"  # AI-judged via Claude
    - inputs:
        prompt: "a cat"
      match_url: "https://example.com/reference-cat.png"  # binary/image match
    - inputs:
        prompt: ""
      error_contains: "prompt cannot be empty"  # negative test
    - inputs:
        mode: "json"
      jq_query: '.confidence > 0.8 and .status == "success"'  # JSON output
    - inputs:
        prompt: "echo this"
      exact_string: "echo this"  # exact string match
  fuzz:
    fixed_inputs:
      seed: 42
    disabled_inputs:
      - debug
    iterations: 10
    prompt: "Generate creative and diverse prompts"

train:  # if your model has a trainer
  destination: owner/my-model-trained
  destination_hardware: gpu-l40s
  train_timeout: 1800
  test_cases:
    - inputs:
        input_images: "https://.../training.zip"
        steps: 10

deployment:  # auto-create or update on push
  name: my-model
  owner: owner
  hardware: gpu-l40s

parallel: 4
fast_push: false
ignore_schema_compatibility: false
official_model: owner/my-model  # for proxy/wrapper models, see below
```
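Test case checkers are mutually exclusive. Pick exactly one of `match_prompt`, `match_url`, `error_contains`, `jq_query`, or `exact_string` per case. Use `compare_outputs: false` for any stochastic model (diffusion, LLMs). The default, `true`, expects the test and live models to produce matching outputs, which breaks easily when outputs vary from run to run.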
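You can break the one-checker-per-case rule when editing a config by hand. The sketch below catches this before a push. It assumes the config has already been loaded into a Python dict, for example with a YAML parser that is not shown here. `validate_test_cases` is a hypothetical helper, not part of cog-safe-push. It checks only `predict.test_cases`, because the `train` example above uses no checker.

```python
from typing import Any

# The five checker keys listed above; a predict test case should use exactly one.
CHECKERS = ("match_prompt", "match_url", "error_contains", "jq_query", "exact_string")


def validate_test_cases(config: dict[str, Any]) -> list[str]:
    """Return one error message per predict test case without exactly one checker.

    Hypothetical helper: it only inspects config["predict"]["test_cases"].
    """
    errors: list[str] = []
    cases = config.get("predict", {}).get("test_cases", [])
    for i, case in enumerate(cases):
        used = [key for key in CHECKERS if key in case]
        if len(used) != 1:
            errors.append(
                f"predict.test_cases[{i}]: expected exactly one checker, "
                f"found {used or 'none'}"
            )
    return errors
```

Run it in a pre-commit hook or an early CI step, and fail if the returned list is non-empty.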
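There are two paths for GitHub Actions, depending on how much glue you want. You can write your own workflow around `cog-safe-push`, or you can call Replicate's reusable `model-ci-template` workflow.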
```yaml
# .github/workflows/push.yaml
name: Push to Replicate

on:
  workflow_dispatch:
    inputs:
      no_push:
        type: boolean
        default: false

jobs:
  push:
    runs-on: ubuntu-latest-4-cores  # builds need disk + cores
    steps:
      - uses: actions/checkout@v4
      - uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          docker-images: false
      - uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}
      - run: pip install git+https://github.com/replicate/cog-safe-push.git
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_TOKEN }}
        run: |
          cog-safe-push -vv ${{ inputs.no_push && '--no-push' || '' }}
```
Add a `concurrency:` block so PR builds cancel each other while main-branch pushes queue:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```
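The plain-Python function below mirrors the two concurrency expressions to show what they evaluate to. `concurrency_for` is a hypothetical name; GitHub evaluates the real expressions itself. Runs with the same `group` share one concurrency slot. Each pull request has its own ref (`refs/pull/<n>/merge`), so builds of the same PR cancel each other. Different PRs never interfere. On `main`, `cancel-in-progress` is false, so a new run waits instead of cancelling. GitHub keeps at most one pending run per group, so a third push to `main` replaces the second one that is still waiting.

```python
def concurrency_for(workflow: str, ref: str) -> tuple[str, bool]:
    """Mirror the concurrency block above for a given workflow name and git ref.

    group:              ${{ github.workflow }}-${{ github.ref }}
    cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
    """
    group = f"{workflow}-{ref}"
    cancel_in_progress = ref != "refs/heads/main"
    return group, cancel_in_progress
```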
For Replicate-style multi-model repos, drop in:
```yaml
# .github/workflows/ci.yaml
name: CI

on:
  pull_request: { branches: [main] }
  push: { branches: [main] }
  workflow_dispatch:
    inputs:
      models: { type: string, default: "all" }
      ignore_schema_checks: { type: boolean, default: false }
      cog_version: { type: string, default: "latest" }
      test_only: { type: boolean, default: false }

jobs:
  ci:
    uses: replicate/model-ci-template/.github/workflows/template.yaml@main
    with:
      trigger_type: ${{ github.event_name }}
      models: ${{ inputs.models || 'all' }}
      ignore_schema_checks: ${{ inputs.ignore_schema_checks || false }}
      cog_version: ${{ inputs.cog_version || 'latest' }}
      test_only: ${{ inputs.test_only || false }}
    secrets: inherit
```
The reusable workflow expects:

- `cog-safe-push-configs/<model>.yaml`: one per model variant.
- `script/select-model`: a bash file with `if/elif [[ "$MODEL" == "..." ]]` blocks listing valid model names.
- Repo secrets `COG_TOKEN`, `REPLICATE_API_TOKEN`, and `ANTHROPIC_API_KEY`.

This pattern comes from replicate/cog-flux: one repo holds N variants, and they are pushed in parallel.
```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      - id: set
        env:
          MODELS: ${{ inputs.models }}  # pass via env rather than inlining into the script
        run: |
          if [ "$MODELS" = "all" ]; then
            echo 'matrix={"model":["schnell","dev","krea-dev"]}' >> "$GITHUB_OUTPUT"
          else
            list=$(echo "$MODELS" | jq -Rc 'split(",")')
            echo "matrix={\"model\":$list}" >> "$GITHUB_OUTPUT"
          fi

  push:
    needs: prepare
    runs-on: ubuntu-latest-4-cores
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}
      - run: pip install git+https://github.com/replicate/cog-safe-push.git
      - run: ./script/select.sh ${{ matrix.model }}  # produces cog.yaml from a template
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_TOKEN }}
        run: cog-safe-push --config cog-safe-push-configs/${{ matrix.model }}.yaml -vv
```
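The `prepare` job only has to emit a JSON object that `fromJson` can turn into a matrix. The Python equivalent below shows that shape. `build_matrix` is a hypothetical name, and the variant list is copied from the workflow's `all` branch. It also shows a pitfall: `jq -Rc 'split(",")'` does not trim whitespace. An input like `dev, schnell` therefore yields a `" schnell"` variant, which then points at a config file that doesn't exist.

```python
import json

# Variants hard-coded in the "all" branch of the prepare step above.
ALL_MODELS = ["schnell", "dev", "krea-dev"]


def build_matrix(models_input: str) -> str:
    """Return the value the prepare job writes as `matrix=` to $GITHUB_OUTPUT."""
    if models_input == "all":
        models = ALL_MODELS
    else:
        # Same as jq's split(","): no trimming, so spaces stay in the names.
        models = models_input.split(",")
    # Compact separators match jq -c output.
    return json.dumps({"model": models}, separators=(",", ":"))
```

For example, `build_matrix("dev,schnell")` returns `{"model":["dev","schnell"]}`, which becomes two parallel `push` jobs.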
When you maintain a proxy that wraps a third-party API, you push to a private wrapper first, then update the public-facing official model card. Pattern from replicate/cog-official-template:
```bash
./script/write-api-key  # bake API key into config
cog-safe-push --config cog-safe-push-configs/${MODEL}.yaml -vv
./script/delete-api-key  # strip the key
cog-safe-push --push-official-model --config cog-safe-push-configs/${MODEL}.yaml -vv
```
Set official_model: owner/name in the config so --push-official-model knows where to publish.
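Run as a plain script, the sequence above can leave the key in the config if the first `cog-safe-push` fails partway. The sketch below runs the same four commands in order and puts the key deletion in a `finally` block. That guard is this sketch's addition, not part of cog-official-template. `publish_official` and the injectable `run` parameter are hypothetical. The scripts are assumed to read anything else they need (such as `MODEL`) from the inherited environment.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Default runner: raise CalledProcessError if the command exits non-zero."""
    subprocess.run(list(cmd), check=True)


def publish_official(model: str, run: Runner = run_checked) -> None:
    """Push the private wrapper, strip the API key, then publish the official model."""
    config = f"cog-safe-push-configs/{model}.yaml"
    run(["./script/write-api-key"])  # bake API key into config
    try:
        run(["cog-safe-push", "--config", config, "-vv"])
    finally:
        run(["./script/delete-api-key"])  # strip the key even if the push failed
    run(["cog-safe-push", "--push-official-model", "--config", config, "-vv"])
```

Passing a fake `run` (for example `calls.append`) lets you test the ordering without touching Replicate.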
Add a deployment block to cog-safe-push.yaml to create or update a Replicate deployment automatically on each push:
```yaml
deployment:
  name: my-model
  owner: owner
  hardware: gpu-l40s
```
Scaling defaults: CPU deployments scale 1-20 instances, GPU deployments scale 0-2. Adjust manually via the API or web UI when needed.
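The helper below encodes those defaults as a lookup. `default_scaling` is hypothetical. It assumes hardware names starting with `cpu` are CPU SKUs and treats everything else (such as `gpu-l40s`) as GPU. A GPU minimum of 0 means an idle deployment can scale down completely, so the next request pays a cold boot. Revisit that default for latency-sensitive models.

```python
def default_scaling(hardware: str) -> tuple[int, int]:
    """Return (min_instances, max_instances) using the defaults stated above.

    Assumption: names beginning with "cpu" are CPU hardware; all others are GPU.
    """
    if hardware.startswith("cpu"):
        return (1, 20)
    return (0, 2)
```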
Run an hourly canary that exercises the registry path. Pattern from replicate/cog-pagerduty-check:
```yaml
name: Hourly cog push check
on:
  schedule:
    - cron: "0 * * * *"
  workflow_dispatch:
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}
      - run: |
          # generate a tiny model with a unique uuid, push it, run a prediction
          # by digest, fail loudly if anything breaks.
          ./script/canary.sh
```
Worth doing for any production-critical model, especially when revenue depends on the registry being up.
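The first canary step generates a tiny model with a unique UUID. The sketch below covers only that step; pushing the model and predicting by digest are left to the canary script. `write_canary_model` is hypothetical, and the Python version in `cog.yaml` is an assumption. Baking a fresh UUID into `predict.py` changes the image contents on every run. As a result, each push produces a new image and exercises the registry upload path.

```python
import uuid
from pathlib import Path

# Minimal cog.yaml; python_version is an assumed choice.
COG_YAML = """build:
  python_version: "3.11"
predict: "predict.py:Predictor"
"""

# The only placeholder is {marker}; str.format fills it in per run.
PREDICT_TEMPLATE = '''from cog import BasePredictor


class Predictor(BasePredictor):
    def predict(self, text: str) -> str:
        # Echo the input plus a per-run marker baked in at generation time.
        return text + " {marker}"
'''


def write_canary_model(directory: Path) -> str:
    """Write cog.yaml and predict.py into `directory` and return the unique marker."""
    marker = uuid.uuid4().hex
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "cog.yaml").write_text(COG_YAML)
    (directory / "predict.py").write_text(PREDICT_TEMPLATE.format(marker=marker))
    return marker
```

The canary can later compare the prediction output against the returned marker to confirm it is running the image it just pushed.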
Best practices:

- Leave schema compatibility checks on; `--ignore-schema-compatibility` is the opt-out.
- Set `test_hardware` explicitly so test pushes are reproducible.
- Use `--no-push` for dry runs in PR CI. Do a full push on merge to main or on version tags.
- Set `compare_outputs: false` for stochastic models. Use `match_prompt:` for image/video outputs (VLM judgment), `match_url:` for binary outputs you control, `jq_query:` for JSON, and `error_contains:` for negative tests.
- Never commit `REPLICATE_API_TOKEN` or `ANTHROPIC_API_KEY`; use repo secrets.
- For models with more than 1 GB of weights, push with `--separate-weights`.
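Whatever wrapper picks the CLI flags can make the dry-run rule explicit. The sketch below assumes version tags are named `v*` (for example `v1.2.0`). `safe_push_args` is a hypothetical helper, fed from `github.event_name` and `github.ref`.

```python
def safe_push_args(event_name: str, ref: str) -> list[str]:
    """Return cog-safe-push arguments for a GitHub Actions trigger.

    Full push only for pushes to main or to v* tags; everything else is a dry run.
    """
    args = ["-vv"]
    is_main = event_name == "push" and ref == "refs/heads/main"
    is_version_tag = event_name == "push" and ref.startswith("refs/tags/v")
    if not (is_main or is_version_tag):
        args.append("--no-push")
    return args
```

Manual `workflow_dispatch` runs fall into the dry-run branch here. The push workflow earlier in this page leaves that choice to its `no_push` input instead.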