Publishes custom AI models to Replicate with `cog push` and `cog-safe-push`: schema checks, prediction tests, and input fuzzing validate each version, and GitHub Actions CI/CD automates safe releases.
```shell
npx claudepluginhub replicate/skills --plugin prompt-videos
```

This skill uses the workspace's default tool permissions.
- Cog reference: <https://cog.run/llms.txt>
- Packages and builds custom AI models with Cog for Replicate deployment: creates cog.yaml and predict.py, builds Docker images, handles GPU/CUDA setup, and ports Hugging Face models.
- Manages the Hugging Face Hub via CLI: download/upload models, datasets, spaces, and repos; handles auth, cache, buckets, jobs, webhooks, and inference endpoints. For HF ecosystem and AI/ML tasks.
- Deploys trained ML models to production via REST APIs, Docker containers, and Kubernetes clusters, with data validation, error handling, and performance monitoring.
Prerequisites:

- `cog push` reference: <https://cog.run/cli#cog-push>
- A Cog model that builds (build one first if you don't yet).
- Logged in: `cog login` against r8.im (or `echo $TOKEN | cog login --token-stdin`).
- The model created at replicate.com/{owner}/{name} via the API, web UI, or r8-model CLI.
- `REPLICATE_API_TOKEN` set in your environment.

## cog push

The simplest path. Build and upload a new version:
```shell
cog push r8.im/owner/my-model
```
Or set `image: r8.im/owner/my-model` in cog.yaml and run a bare:

```shell
cog push
```
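A minimal cog.yaml for that bare-push workflow might look like this — a sketch where the image name, Python version, and package pins are illustrative, not prescribed:

```yaml
# cog.yaml — minimal sketch; names and pins are placeholders
image: "r8.im/owner/my-model"
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.3.1"
predict: "predict.py:Predictor"
```

With `image:` set, every `cog push` and `cog build` in this directory targets the same registry path, so CI and local pushes stay consistent.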
Useful flags:

- `--separate-weights` — store weights in a separate layer; faster cold boots and pushes for models with > 1GB of weights.
- `--x-fast` — faster pushes during iteration (skips some validation).
- `--secret id=hf,src=$HOME/.hf_token` — pass build-time secrets without baking them into image history.

## cog-safe-push

cog-safe-push pushes to a private `-test` model first, checks schema compatibility against the live version, runs prediction comparisons, and fuzzes inputs. It catches breaking changes before they reach users.
Install:
```shell
pip install git+https://github.com/replicate/cog-safe-push.git
```
Required env vars:

- `REPLICATE_API_TOKEN`
- `ANTHROPIC_API_KEY` (Claude judges output similarity for stochastic models)

Basic usage:
```shell
cog-safe-push --test-hardware=gpu-l40s owner/my-model
```
This will:

1. Lint `predict.py` with ruff.
2. Create the test model `owner/my-model-test` if missing.
3. Push there and check schema compatibility against the live `owner/my-model` version.
4. On success, push to `owner/my-model`.

Drop a cog-safe-push.yaml in your project root (or cog-safe-push-configs/<variant>.yaml for multi-model repos). All five test-case checker types in one example:
```yaml
model: owner/my-model
test_model: owner/my-model-test
test_hardware: gpu-l40s
predict:
  compare_outputs: false  # set false for stochastic models
  predict_timeout: 600
  test_cases:
    - inputs:
        prompt: "a serene mountain landscape"
      match_prompt: "a landscape photo of mountains"  # AI-judged via Claude
    - inputs:
        prompt: "a cat"
      match_url: "https://example.com/reference-cat.png"  # binary/image match
    - inputs:
        prompt: ""
      error_contains: "prompt cannot be empty"  # negative test
    - inputs:
        mode: "json"
      jq_query: '.confidence > 0.8 and .status == "success"'  # JSON output
    - inputs:
        prompt: "echo this"
      exact_string: "echo this"  # exact string match
  fuzz:
    fixed_inputs:
      seed: 42
    disabled_inputs:
      - debug
    iterations: 10
    prompt: "Generate creative and diverse prompts"
train:  # if your model has a trainer
  destination: owner/my-model-trained
  destination_hardware: gpu-l40s
  train_timeout: 1800
  test_cases:
    - inputs:
        input_images: "https://.../training.zip"
        steps: 10
deployment:  # auto-create or update on push
  name: my-model
  owner: owner
  hardware: gpu-l40s
parallel: 4
fast_push: false
ignore_schema_compatibility: false
official_model: owner/my-model  # for proxy/wrapper models, see below
```
Test case checkers are mutually exclusive: pick exactly one of `match_prompt`, `match_url`, `error_contains`, `jq_query`, or `exact_string` per case. Use `compare_outputs: false` for any stochastic model (diffusion, LLMs); the default `true` is brittle there.
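For a typical diffusion model, that advice boils down to a small config — a sketch where the model names and prompts are placeholders:

```yaml
model: owner/sdxl-variant            # placeholder
test_model: owner/sdxl-variant-test  # placeholder
test_hardware: gpu-l40s
predict:
  compare_outputs: false             # diffusion outputs vary run to run
  test_cases:
    - inputs:
        prompt: "a red bicycle leaning against a wall"
      match_prompt: "a photo of a red bicycle"  # Claude judges similarity
```

A single `match_prompt` case plus disabled output comparison is usually enough to catch schema breaks and total output failures without flaking on randomness.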
## GitHub Actions

Two paths, depending on how much glue you want: a hand-rolled workflow, or Replicate's reusable model-CI template.
```yaml
# .github/workflows/push.yaml
name: Push to Replicate
on:
  workflow_dispatch:
    inputs:
      no_push:
        type: boolean
        default: false
jobs:
  push:
    runs-on: ubuntu-latest-4-cores  # builds need disk + cores
    steps:
      - uses: actions/checkout@v4
      - uses: jlumbroso/free-disk-space@v1.3.1
        with:
          tool-cache: false
          docker-images: false
      - uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}
      - run: pip install git+https://github.com/replicate/cog-safe-push.git
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_TOKEN }}
        run: |
          cog-safe-push -vv ${{ inputs.no_push && '--no-push' || '' }}
```
Add a concurrency: block so PR builds cancel each other while main-branch pushes queue:
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```
For Replicate-style multi-model repos, drop in:
```yaml
# .github/workflows/ci.yaml
name: CI
on:
  pull_request: { branches: [main] }
  push: { branches: [main] }
  workflow_dispatch:
    inputs:
      models: { type: string, default: "all" }
      ignore_schema_checks: { type: boolean, default: false }
      cog_version: { type: string, default: "latest" }
      test_only: { type: boolean, default: false }
jobs:
  ci:
    uses: replicate/model-ci-template/.github/workflows/template.yaml@main
    with:
      trigger_type: ${{ github.event_name }}
      models: ${{ inputs.models || 'all' }}
      ignore_schema_checks: ${{ inputs.ignore_schema_checks || false }}
      cog_version: ${{ inputs.cog_version || 'latest' }}
      test_only: ${{ inputs.test_only || false }}
    secrets: inherit
```
The reusable workflow expects:

- `cog-safe-push-configs/<model>.yaml` — one per model variant.
- `script/select-model` — a bash file with `if/elif [[ "$MODEL" == "..." ]]` blocks listing valid model names.
- Repo secrets: `COG_TOKEN`, `REPLICATE_API_TOKEN`, `ANTHROPIC_API_KEY`.

Pattern from replicate/cog-flux: one repo, N variants, push them in parallel.
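A minimal `script/select-model`-style dispatcher might look like this — the model names and template paths below are placeholders, not the real cog-flux layout:

```shell
# select-model sketch: map a model name to its cog.yaml template
select_model() {
  local model="$1"
  if [[ "$model" == "schnell" ]]; then
    echo "templates/cog-schnell.yaml"
  elif [[ "$model" == "dev" ]]; then
    echo "templates/cog-dev.yaml"
  elif [[ "$model" == "krea-dev" ]]; then
    echo "templates/cog-krea-dev.yaml"
  else
    echo "unknown model: $model" >&2
    return 1
  fi
}
# the real script would then do something like:
#   cp "$(select_model "$1")" cog.yaml
```

Failing loudly on unknown names matters: it turns a typo in the `models` workflow input into an immediate CI error instead of a push from a stale cog.yaml.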
```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      - id: set
        run: |
          if [ "${{ inputs.models }}" = "all" ]; then
            echo 'matrix={"model":["schnell","dev","krea-dev"]}' >> "$GITHUB_OUTPUT"
          else
            list=$(echo "${{ inputs.models }}" | jq -Rc 'split(",")')
            echo "matrix={\"model\":$list}" >> "$GITHUB_OUTPUT"
          fi
  push:
    needs: prepare
    runs-on: ubuntu-latest-4-cores
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: ./script/select.sh ${{ matrix.model }}  # produces cog.yaml from a template
      - run: cog-safe-push --config cog-safe-push-configs/${{ matrix.model }}.yaml -vv
```
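The matrix-building jq one-liner from the prepare job can be sanity-checked locally (this assumes jq is installed):

```shell
# replicate the workflow's matrix construction for a comma-separated input
models="schnell,dev"
list=$(echo "$models" | jq -Rc 'split(",")')
echo "matrix={\"model\":$list}"
# → matrix={"model":["schnell","dev"]}
```

`-R` reads the input as a raw string and `-c` emits compact JSON, which is exactly the shape `fromJson` expects in the `strategy.matrix` expression.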
When you maintain a proxy that wraps a third-party API, you push to a private wrapper first, then update the public-facing official model card. Pattern from replicate/cog-official-template:
```shell
./script/write-api-key   # bake API key into config
cog-safe-push --config cog-safe-push-configs/${MODEL}.yaml -vv
./script/delete-api-key  # strip the key
cog-safe-push --push-official-model --config cog-safe-push-configs/${MODEL}.yaml -vv
```
Set official_model: owner/name in the config so --push-official-model knows where to publish.
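The two helper scripts can be tiny — a sketch where `SOME_PROVIDER_API_KEY` and the `api_key:` config field are assumptions, not part of the real template:

```shell
# sketches of script/write-api-key and script/delete-api-key
# SOME_PROVIDER_API_KEY and the api_key: field are illustrative placeholders
write_api_key() {
  local cfg="$1"
  # append the provider key so the test push can call the upstream API
  printf 'api_key: %s\n' "$SOME_PROVIDER_API_KEY" >> "$cfg"
}

delete_api_key() {
  local cfg="$1"
  # remove the key line; -i.bak works on both GNU and BSD sed
  sed -i.bak '/^api_key:/d' "$cfg" && rm -f "$cfg.bak"
}
```

The point of the write/delete dance is that the key exists in the config only during the test push and is gone before `--push-official-model` publishes anything public.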
Add a deployment block to cog-safe-push.yaml to create or update a Replicate deployment automatically on each push:
```yaml
deployment:
  name: my-model
  owner: owner
  hardware: gpu-l40s
```
Scaling defaults: CPU deployments scale 1-20 instances, GPU deployments scale 0-2. Adjust manually via the API or web UI when needed.
Run an hourly canary that exercises the registry path. Pattern from replicate/cog-pagerduty-check:
```yaml
name: Hourly cog push check
on:
  schedule:
    - cron: "0 * * * *"
  workflow_dispatch:
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          # generate a tiny model with a unique uuid, push it, run a prediction
          # by digest, fail loudly if anything breaks.
          ./script/canary.sh
```
Worth doing for any production-critical model, especially when revenue depends on the registry being up.
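A skeleton of what such a canary script might do — only the name generator below is concrete; the push, predict, and alert steps are placeholders, not verified commands:

```shell
# canary sketch: derive a unique, registry-safe model name per run
canary_name() {
  echo "canary-$(date +%s)-$RANDOM"
}

# the real script would then (placeholders):
#   - cog push a trivial model to r8.im/owner/$(canary_name)
#   - run a prediction against the pushed version and check its output
#   - alert (e.g. via PagerDuty) and exit nonzero if any step fails
```

A fresh name per run ensures the canary exercises the full create-push-predict path rather than hitting caches from a previous hour.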
- Keep schema checks on; `--ignore-schema-compatibility` is the opt-out.
- Pin `test_hardware` so test pushes are reproducible.
- Use `--no-push` for dry runs in PR CI; full push on merge to main or on version tags.
- Set `compare_outputs: false` for stochastic models. Use `match_prompt:` for image/video outputs (VLM judgment), `match_url:` for binary outputs you control, `jq_query:` for JSON, `error_contains:` for negative tests.
- Never hardcode `REPLICATE_API_TOKEN` or `ANTHROPIC_API_KEY`. Use repo secrets.
- Push large weights with `--separate-weights`.