Skill

huggingface-lora-space-builder

Builds and publishes a Gradio demo on Hugging Face Spaces for a user-provided LoRA, selecting the right diffusers pipeline and designing a task-specific UI tailored to the LoRA's model card.

npx claudepluginhub huggingface/skills --plugin trl-training

Popularity

Stars

10,653

Forks

703

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/huggingface-skills:huggingface-lora-space-builder

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Build and publish a Gradio demo on Hugging Face Spaces that runs inference with a user-provided LoRA. Use whenever someone asks to create, generate, ship, or publish "a Space", "a demo", "a Gradio app", or "a playground" for a LoRA — whether the base model is Qwen-Image, Qwen-Image-Edit, LTX, or another diffusion model. Also use when someone describes a LoRA they trained or hosts on the Hub and...

Supporting Files

references/adapting-to-the-lora.md
references/base-models/ltx.md
references/base-models/qwen-image.md
references/creative-mode.md
references/tasks.md
references/zerogpu-and-publishing.md

SKILL.md

391 lines · ~8.2k tokens (exceeds 5k compaction limit)

Stats

Language: Python

Stars: 10,653

Forks: 703

Maintenance: Excellent

Last Commit: Jun 11, 2026

Gradio LoRA Space Builder

The output is a real, published Space (private by default) that the user can try in the browser, not a local script.

What "good" looks like for these demos

The demo should feel handcrafted for this specific LoRA, not a generic template with the LoRA bolted on. Two LoRAs that share a task can still need different demos: a pose-control video LoRA and an outpainting video LoRA both take video in and produce video out, but the inputs the user provides, the preprocessing, and the controls are completely different. Recognizing that is the central job here.

Concretely, a good demo:

Loads fast and runs fast — minimal model loading, sensible step count, no wasted computation per call.
Has a UI with exactly the controls this LoRA needs and nothing else. Excess sliders are a cost, not a feature.
Shows the user what's happening — progress, intermediate outputs where useful, the seed used, a clear error when input is missing.
Honors the LoRA's own recommendations from its model card: trigger words, recommended step count, recommended guidance scale, recommended LoRA scale, example inputs.
Is creative where creativity helps — interactive canvases, before/after sliders, side-by-side previews of intermediate processing — and plain where plainness is right.

Workflow

Work through these phases in order. Information gathered in one phase decides the next.

Gather the LoRA info needed to pick a pipeline and design a UI.
Pick the base pipeline and inference recipe.
Design the UI for this specific LoRA's task and inputs.
Write app.py, requirements.txt, and README.md together; show all three to the user for one batched approval.
Publish the Space (private).

Don't drip-feed questions across multiple turns. Batch them.

Phase 1 — Gather LoRA info

Required: a LoRA repo on the Hub (e.g. username/my-lora).

First, try to read the repo without a token. If it succeeds, the repo is public — proceed. If it fails with 401/403, the repo is private/gated and you need an authenticated session to read it. Don't immediately ask for a token. Check first whether the user is already authenticated.

from huggingface_hub import HfApi, get_token

cached_token = get_token()  # picks up HF_TOKEN env var or cached CLI login
username = None  # stays None when no valid cached token is found
if cached_token:
    try:
        info = HfApi().whoami(token=cached_token)
        username = info["name"]
        # info also has fine-grained token scope info if applicable
    except Exception:
        cached_token = None  # token exists but is invalid/expired

Then:

If a valid cached token exists and it can read the repo, use it. No prompt needed.
If no cached token, or the cached token can't read this private repo, ask the user for a token — once, with the explanation below.
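The decision order above (anonymous read first, then the cached token, then a single prompt) can be sketched as a small pure function. This is a minimal sketch, not the skill's own implementation: `choose_read_token` and `hub_reader` are hypothetical names, and the adapter assumes `HfApi.model_info` is an acceptable probe for "can this token read the repo". Passing `token=False` is how `huggingface_hub` is told to send no token, which matters here because `token=None` would silently reuse the cached login.

```python
from typing import Callable, Optional, Tuple, Union

Token = Union[str, bool]  # False means "send no token at all"


def choose_read_token(
    can_read: Callable[[Token], bool],
    cached_token: Optional[str],
) -> Tuple[str, Optional[Token]]:
    """Return (decision, token); decision is 'public', 'cached', or 'ask'."""
    if can_read(False):  # anonymous read worked, so the repo is public
        return ("public", False)
    if cached_token and can_read(cached_token):
        return ("cached", cached_token)
    return ("ask", None)  # only now is it worth prompting the user


def hub_reader(repo_id: str) -> Callable[[Token], bool]:
    """Hypothetical adapter around huggingface_hub, imported lazily."""
    def _can_read(token: Token) -> bool:
        from huggingface_hub import HfApi
        try:
            HfApi().model_info(repo_id, token=token)
            return True
        except Exception:  # 401/403/404 and network errors all count as unreadable
            return False
    return _can_read
```

In use, `choose_read_token(hub_reader("username/my-lora"), cached_token)` would return `"ask"` only when both cheaper options have failed, which keeps the token request to the one-time ask described above.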

When asking for a token (and only when you actually need to ask):

I need a Hugging Face access token with write scope (to read the LoRA if it's private/gated, and to publish the Space). Create one at https://huggingface.co/settings/tokens. Paste it here.

The same token will be reused for publishing in the final phase, so this is a one-time ask.

Then read what's in the repo:

List the repo files (huggingface_hub.HfApi().list_repo_files(repo_id)). Look for .safetensors, README.md, example images/videos, multiple checkpoints.
Fetch the model card (huggingface_hub.ModelCard.load(repo_id)). The data dict has structured fields; the text has the README body.
If multiple .safetensors files exist, pick the right one — see "Picking the LoRA weights file" in references/zerogpu-and-publishing.md. Briefly: README-recommended file wins, then pytorch_lora_weights.safetensors, then latest training checkpoint, otherwise ask.
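The priority order for the weights file can be sketched as follows. This is an illustration under stated assumptions: `pick_lora_weights` is a hypothetical helper, and "latest training checkpoint" is approximated by the last run of digits in each filename, which only holds for repos that name checkpoints by step count. When that pattern does not apply to every candidate, the function returns `None`, meaning "ask the user".

```python
import re
from typing import Optional, Sequence

DEFAULT_NAME = "pytorch_lora_weights.safetensors"


def pick_lora_weights(
    files: Sequence[str],
    readme_recommended: Optional[str] = None,
) -> Optional[str]:
    """Pick a weights file, or return None when the user should be asked."""
    candidates = [f for f in files if f.endswith(".safetensors")]
    if len(candidates) == 1:
        return candidates[0]
    if readme_recommended in candidates:  # README-recommended file wins
        return readme_recommended
    if DEFAULT_NAME in candidates:
        return DEFAULT_NAME
    # Assumption: checkpoints embed a step number, e.g. "lora-000002000.safetensors".
    stepped = []
    for name in candidates:
        stem = name.rsplit("/", 1)[-1][: -len(".safetensors")]
        digits = re.findall(r"\d+", stem)
        if digits:
            stepped.append((int(digits[-1]), name))
    if stepped and len(stepped) == len(candidates):
        return max(stepped)[1]  # highest step count
    return None  # ambiguous or empty: ask
```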

From the model card, try to determine:

Base model — the base_model field, or text mentions in the README. Usually present. Use it to pick the pipeline reference file (see Phase 2).
Task — pipeline_tag if set, otherwise inferred from the base model and README text. The five tasks this skill handles: text-to-image, image-to-image, text-to-video, image-to-video, video-to-video.
Trigger words — often called "trigger word", "instance prompt", "activation word"; sometimes embedded in example prompts.
Recommended inference recipe — step count, guidance scale, true CFG scale, LoRA scale, resolution. Many LoRA cards include a Python snippet; trust its parameters (steps, guidance, CFG, LoRA scale, dtype). For loading mechanics, see adapting-to-the-lora.md — prefer pipe.load_lora_weights(...) over whatever loading approach the snippet uses.
Example prompts and example media — use these as Gradio examples in the UI.
Sub-task / specific use case — for image edits and video LoRAs, "what does this LoRA actually do" matters as much as the task category. A relighting LoRA, a face-swap LoRA, and a style LoRA all might be image-to-image, but the UI for each is different.
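As a rough sketch of the structured part of that extraction, the function below reads a model card's metadata dict and reports which fields still need to come from the README text or the user. `summarize_card` is a hypothetical helper, and treating `instance_prompt` as the trigger-word field is an assumption that holds for some training scripts but not all cards.

```python
from typing import Any, Dict, Mapping

TASKS = {
    "text-to-image", "image-to-image",
    "text-to-video", "image-to-video", "video-to-video",
}


def summarize_card(data: Mapping[str, Any]) -> Dict[str, Any]:
    """Pull base model, task, and trigger word out of card metadata."""
    base = data.get("base_model")
    if isinstance(base, (list, tuple)):  # base_model may be a list of repos
        base = base[0] if base else None
    task = data.get("pipeline_tag")
    if task not in TASKS:  # unset, or a task this skill does not handle
        task = None
    trigger = data.get("instance_prompt") or None  # assumed field name
    missing = [k for k, v in (("base_model", base), ("task", task)) if v is None]
    return {"base_model": base, "task": task, "trigger_word": trigger, "missing": missing}
```

With `huggingface_hub`, the input could come from `ModelCard.load(repo_id).data.to_dict()`. Anything listed under `missing` is a candidate for the single batched question, after checking the README body first.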

When something can't be inferred, ask the user — once, in a single batched message. Format the question to make answering trivial. For task category, list the five options as a numbered choice. For sub-task, give a one-line description ("what does this LoRA do? e.g. 'relight portraits', 'apply manga style', 'extend videos to wider aspect ratios'"). Don't ask if you can already infer it confidently from the base model or README.

If the model card has nothing helpful at all — no base model, no task, no example — surface that clearly: "The model card has no usable info. I'll need you to tell me: (1) base model, (2) what this LoRA does, (3) recommended step count and guidance scale if you know them."

Phase 2 — Pick the base pipeline

Two things to decide here: which reference file to load, and which pipeline class to use. They're not the same question — a base-model family file (e.g. qwen-image.md) covers multiple variants, and variants in the same family don't always share a pipeline class. Get this wrong and the Space loads but produces wrong output, or fails at startup.

Step 1 — Load the reference file for this base model family.

references/base-models/qwen-image.md — covers Qwen-Image and Qwen-Image-Edit family (text-to-image and image-to-image).
references/base-models/ltx.md — covers LTX family (text-to-video, image-to-video, video-to-video, including IC-LoRAs).

If the base model isn't in one of these files, this skill doesn't have first-class support yet. Tell the user, and ask whether they want to proceed by analogy (use the closest model's recipe and adjust) or stop. Don't guess silently.

Step 2 — Verify the pipeline class against the base model's own card. This step is mandatory, not optional.

A new base model variant might use the same pipeline class with a different repo path, or a new pipeline class entirely. Don't trust the reference file's table alone — it's best-effort and can lag a recent release. Verify before committing:

from huggingface_hub import ModelCard
base_card = ModelCard.load(base_model_id)
# Read base_card.text — find the diffusers inference snippet, note the pipeline class it imports.

The class imported in the base model card's diffusers snippet is the source of truth. Real examples where this matters:

Qwen-Image-Edit uses QwenImageEditPipeline. Qwen-Image-Edit-2509 and Qwen-Image-Edit-2511 use QwenImageEditPlusPipeline — different class, different default parameters, takes a list of images instead of one. A LoRA targeting 2511 loaded onto QwenImageEditPipeline produces broken output.
LTX-Video uses LTXPipeline/LTXImageToVideoPipeline/LTXConditionPipeline. LTX-2 uses LTX2Pipeline from a different module path. LTX-2.3 sometimes needs a native pipeline outside diffusers.

If the base model card has no diffusers snippet at all, fall back to the reference file's table — and tell the user you're falling back, in case they know something the table doesn't.

The cost of this verification is one Hub fetch and a few seconds of reading. The cost of skipping it is the failure mode described in the examples above: a "working" Space that's quietly using the wrong class.
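A first pass at the verification can be mechanical: scan the base model card's text for single-line `from diffusers import ...` statements and list the names ending in `Pipeline`. The helper name `find_pipeline_classes` is hypothetical, and this sketch deliberately ignores parenthesized multi-line imports, so an empty result means "read the card by hand", not "no snippet exists".

```python
import re
from typing import List

_IMPORT_RE = re.compile(
    r"^\s*from\s+diffusers(?:\.[\w.]+)?\s+import\s+(.+)$",
    re.MULTILINE,
)


def find_pipeline_classes(card_text: str) -> List[str]:
    """Return pipeline class names imported from diffusers, in order of appearance."""
    found: List[str] = []
    for match in _IMPORT_RE.finditer(card_text):
        for name in re.split(r"[,\s()]+", match.group(1)):
            if name.endswith("Pipeline") and name not in found:
                found.append(name)
    return found
```

Feeding it `base_card.text` gives a short list to compare against the reference file's table. A generic class such as `DiffusionPipeline` may show up too, in which case the specific class still has to be confirmed from the card.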

Step 3 — Diffusers vs native pipeline. Default to diffusers when the base model has a diffusers pipeline class. That's the case for Qwen-Image and Qwen-Image-Edit and most of LTX. Some LTX variants (notably LTX-2.3 with certain IC-LoRAs) need a native pipeline; the LTX reference says when. Diffusers gives standard load_lora_weights / set_adapters semantics; the native path needs LoRA-specific glue.

Phase 3 — Design the UI for this LoRA

Don't reach for a template. Reason from the LoRA's task and inputs to a UI.

Read references/tasks.md for the per-task baseline UI patterns (what the standard inputs/outputs look like for T2I, I2I, T2V, I2V, V2V).

Then read references/adapting-to-the-lora.md, which is about thinking through what this specific LoRA needs — beyond the task category. That file is the most important one in this skill. The same task can need very different UIs: a pose-control LTX LoRA needs a video input and a pose-extraction preview; an outpaint LTX LoRA needs an aspect-ratio picker and a black-margin preview; a relighting Flux LoRA needs an image and a brush canvas for indicating where to add light. None of those reduce to "the V2V template" or "the I2I template".

Self-check before writing the UI. Write one sentence describing what a user does with this Space in 10 seconds. If that sentence doesn't distinguish this LoRA from any other LoRA of the same task, the UI isn't shaped enough yet.

Examples that pass the self-check:

"Upload a video, pick a target aspect ratio, click Generate; the model fills the empty margins."
"Draw colored brush strokes where you want light, pick an illumination style, click Generate; the model relights the photo."
"Upload a video of someone moving and an image of a different character; the model produces a video of the character doing the motion."

Examples that fail:

"Type a prompt and click generate." (Generic T2I — say more.)
"Upload an image and an instruction." (Generic edit — what kind of edit?)

Gradio component freshness. Gradio's component set evolves. Before defaulting to plain components, consider whether something newer fits better — for example gr.ImageSlider for before/after on edit LoRAs, gr.BrowserState for persistent prefs, @gr.render for UIs that change based on input. If you're unsure whether a component exists or what its signature is, web-fetch the current Gradio docs at https://www.gradio.app/docs rather than guessing.

When stock and Hub custom components aren't enough — creative mode. If the LoRA's natural input is a shape no Gradio component (built-in or on the Hub) expresses well — point sets, strokes, trajectories, multi-region annotations with metadata, 3D rotation gizmos, timeline scrubbers, anything where the user manipulates a thing on top of media — drop down to custom HTML/JS via gr.HTML. See references/creative-mode.md for the Gradio primitives (gr.HTML, head= injection, elem_id addressing, the two JS↔Python state-sync approaches), the discipline around defining a JSON wire format, and the pitfalls. Don't reach for creative mode just because it would be cool — reach for it when the LoRA's input shape demands it. And don't skip the Hub custom components rung above (e.g. gradio_image_annotation) before going fully bespoke.

gr.Examples for media-input Spaces. When no fitting example media is available from the model's own repo, pull from the shared input pools — split by modality so the HF dataset viewer can render proper thumbnails: images at linoyts/repo-to-space-example-inputs, videos at linoyts/repo-to-space-example-videos. Both are CC0 with categories + natural-language caption metadata and the same filter/rank recipe in each dataset README. Pick 2–3 that fit the task, preprocess to the shapes the model expects, and bake the copies into the Space. Set cache_examples=True, cache_mode="lazy" so the first click caches without running examples at build time (see references/zerogpu-and-publishing.md).

Phase 4 — Write the Space files

Before writing, tell the user concretely what's about to happen — name the actual files. Not "I'll write the three files" but something like:

"Now I'll write the three files needed to publish a Space: app.py (the Gradio demo and inference code), requirements.txt (Python dependencies), and README.md (Space configuration including ZeroGPU hardware setting). Then I'll show all three for your review before publishing."

This anchors the user in what's being produced. Don't say "three files" without naming them — it's vague and signals lack of commitment to the deliverable.

The three files are tightly coupled: requirements.txt is determined by what app.py imports, and the README.md YAML frontmatter sets the SDK version, hardware, and Space title that have to match. Write them together, then show all three to the user for approval in one batched message before publishing.

Read references/zerogpu-and-publishing.md for the ZeroGPU rules. The non-obvious ones:

Models go on cuda at module level (not lazy-loaded inside the GPU function). ZeroGPU has a CUDA emulation that makes this work pre-allocation, and module-level placement is significantly faster than deferred placement.
The function that runs inference is decorated with @spaces.GPU(duration=...). Pick a duration appropriate for the task — short for image generation, longer for video.
Don't use torch.compile — it's incompatible with ZeroGPU's process model.

`app.py`

Compose from the pieces decided in Phases 1–3. Don't paste from a template. Each section should be there because it's needed:

Imports — gradio as gr, torch, spaces, the pipeline class, anything the preprocessing needs.
Constants — LORA_REPO, BASE_MODEL, recommended step count, guidance, LoRA scale, trigger word.
Module-level model load — pipeline from_pretrained, .to("cuda"), load_lora_weights. If the LoRA repo is private, pass token=os.environ["HF_TOKEN"].
Preprocessing functions (if any) — pose extraction, padding, mask building, etc. CPU code can run at module level; GPU code needs to be inside a @spaces.GPU function.
The inference function — decorated with @spaces.GPU(duration=...). Validates inputs, applies trigger word, builds the pipeline kwargs, returns outputs.
The Gradio Blocks — the UI from Phase 3, wired to the inference function.

Common things to get right:

Return the actually-used seed alongside the result so the user can reproduce.
gr.Progress(track_tqdm=True) on the inference function surfaces diffusers' internal progress bar.
Validate inputs — raise gr.Error("Please upload an image first.") when a required input is missing, rather than letting the pipeline fail with a cryptic error.
On gr.Examples, use cache_examples=True, cache_mode="lazy" — plain cache_examples=True runs examples at build time and fails on ZeroGPU; lazy mode defers caching to the first user click.
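Three of these points (reporting the seed actually used, applying the trigger word, and failing early on missing input) are plain Python and can be sketched without Gradio. All three helper names are hypothetical. In a real `app.py` the `ValueError` below would be `gr.Error` so the message reaches the UI.

```python
import random
from typing import Any, Optional

MAX_SEED = 2**31 - 1


def resolve_seed(seed: Optional[int], randomize: bool) -> int:
    """Return the seed that will actually be used, so it can be shown to the user."""
    if randomize or seed is None:
        return random.randint(0, MAX_SEED)
    return int(seed)


def apply_trigger(prompt: str, trigger: Optional[str]) -> str:
    """Prepend the trigger word unless the prompt already contains it."""
    prompt = prompt.strip()
    if not trigger or trigger.lower() in prompt.lower():
        return prompt
    return f"{trigger}, {prompt}" if prompt else trigger


def require(value: Any, message: str) -> Any:
    """Fail early with a readable message instead of a cryptic pipeline error."""
    if value is None or (isinstance(value, str) and not value.strip()):
        raise ValueError(message)  # gr.Error(message) inside a Space
    return value
```

The inference function would call `require(image, "Please upload an image first.")` before touching the pipeline, and return `resolve_seed(...)`'s result alongside the output.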

`requirements.txt`

Don't ship a fixed minimal list and hope for the best. The "minimal" list works for plain T2I LoRAs and breaks the moment the base model has a vision-language text encoder, video output, or any non-trivial preprocessing. Derive requirements.txt from what the Space actually needs, in this order:

Every top-level non-stdlib import in app.py. If app.py does import cv2, requirements.txt has opencv-python. If it does from controlnet_aux import OpenposeDetector, requirements.txt has controlnet-aux. Walk the imports mechanically. (Note the exclusions in the next paragraph — some imports are runtime built-ins and don't need to be listed.)
What the base-model reference's "Required dependencies" subsection says. Each base-model file lists the non-obvious extras the pipeline pulls in — torchvision for Qwen-Image (Qwen 2.5-VL text encoder), imageio[ffmpeg] for LTX (video export), etc. Include all of them. These are the deps that aren't picked up from imports because the pipeline's components import them transitively at load time.
What the LoRA's own model card explicitly mentions installing. If the LoRA README has its own pip install block, lift the deps from there.
The diffusers/ML stack: diffusers, transformers, accelerate, peft, safetensors. Default to plain (unpinned). Switch diffusers to git+https://github.com/huggingface/diffusers if the base-model reference says the model needs it (recent releases often do — Qwen-Image-Edit-2511 is a current example).

What not to list in requirements.txt:

gradio — controlled by the sdk_version: field in README.md's YAML frontmatter, not by requirements.txt. Listing it in requirements is at best ignored, at worst causes a version conflict with the SDK. Set the version in the README only.
torch — provided by the Space runtime. Only add if you need a specific version pinned (rare, and usually a sign something else is wrong).
spaces — provided by the Space runtime. Only add if you need a specific version pinned.
huggingface_hub — provided by the Space runtime. Only add if you need a specific version pinned.

These four come pre-installed in the ZeroGPU container. Listing them anyway is the kind of "include rather than skip" instinct that's right for non-baseline deps but wrong for baseline ones, because pinning conflicts with the runtime's managed versions.

Bias for everything else: include rather than skip when uncertain. A package the Space doesn't actually use causes a slightly slower build. A missing required package causes a startup-time crash that's much harder for the user to diagnose. These costs aren't symmetric — the test failure that prompted this rule was exactly the second kind.

But two specific deps are not safe to add reflexively because they routinely cause more problems than they solve on ZeroGPU:

xformers — pinned to specific torch versions, frequent source of conflicts. The ZeroGPU runtime ships torch 2.8+, so any pinned xformers version must support that. Additional gotcha on Blackwell: xformers' FA3 dispatch mis-gates the hardware (FA3 kernels are Hopper-only at sm_90a, but the dispatcher gates on device_capability >= (9, 0), which also matches Blackwell) and crashes at kernel launch with CUDA invalid argument. If a Space using xformers attention hits this, disable FA3 dispatch at module load:
```
try:
    from xformers.ops.fmha import _set_use_fa3
    _set_use_fa3(False)
except Exception:
    pass
```
Only include xformers if app.py actually uses it.
flash-attn — needs a build step, often fails to install. Same torch 2.8+ alignment caveat as xformers. Only include if app.py actually uses it.

Pin other versions only when you have a reason (e.g. a known incompatibility, or matching a recipe from the model card).

`README.md`

Spaces are configured by the YAML frontmatter at the top of README.md. This frontmatter is what selects ZeroGPU.

---
title: <human-readable title>
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: <current Gradio version>
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: <one short line for the Space tile, ~60 chars max>
models:
  - <base model repo>
  - <lora repo>
---

# <title>

A short description with links to the LoRA and base model.

Key fields:

sdk: gradio — required for ZeroGPU.
sdk_version — match the Gradio version you wrote against. Look up the current version (pip index versions gradio, or check https://www.gradio.app) rather than guessing.
hardware: zero-a10g — the legacy string for ZeroGPU. The actual hardware is NVIDIA RTX Pro 6000 Blackwell, but the identifier is zero-a10g. ZeroGPU is available to PRO, Team, and Enterprise accounts; if the user isn't subscribed, the Space will fall back to CPU. Mention this if you suspect they aren't on PRO.
models: — list base and LoRA repos. This enables Hub caching and discovery.
short_description — appears on the Space tile. Keep it short (~60 characters or less). The Hub's YAML validator rejects long values with a 400 from https://huggingface.co/api/validate-yaml, which surfaces as an HfHubHTTPError during create_repo or upload_file. The exact server-side limit isn't documented and may change, so target the visible-tile-length range rather than pushing right up to a cap. If you do hit the 400, the fix is almost always to shorten this field. One sentence describing what the Space does is plenty — the README body below the YAML is where you put longer prose.
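One way to avoid the `short_description` 400 is to check the length before the README is ever written. The sketch below renders the frontmatter shown above and refuses over-long descriptions. `render_frontmatter` is a hypothetical helper, and the 60-character constant is the conservative target from the text, not the server's actual (undocumented) limit. String values are quoted with `json.dumps`, which produces valid YAML double-quoted scalars.

```python
import json
from typing import Sequence

SHORT_DESCRIPTION_TARGET = 60  # conservative target, not the server-side cap


def render_frontmatter(
    title: str,
    sdk_version: str,
    short_description: str,
    models: Sequence[str],
    emoji: str = "🎨",
    color_from: str = "pink",
    color_to: str = "purple",
    hardware: str = "zero-a10g",
) -> str:
    """Render the README.md YAML frontmatter for a Gradio Space."""
    if len(short_description) > SHORT_DESCRIPTION_TARGET:
        raise ValueError(
            f"short_description is {len(short_description)} chars; "
            f"keep it near {SHORT_DESCRIPTION_TARGET} or fewer"
        )

    def q(value: str) -> str:
        return json.dumps(str(value), ensure_ascii=False)

    lines = [
        "---",
        f"title: {q(title)}",
        f"emoji: {emoji}",
        f"colorFrom: {color_from}",
        f"colorTo: {color_to}",
        "sdk: gradio",
        f"sdk_version: {q(sdk_version)}",  # quoted so it is never parsed as a number
        "app_file: app.py",
        "pinned: false",
        f"hardware: {hardware}",
        f"short_description: {q(short_description)}",
        "models:",
        *[f"  - {m}" for m in models],
        "---",
    ]
    return "\n".join(lines) + "\n"
```

The `sdk_version` argument still has to be looked up rather than guessed, as described above.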

Single batched approval — order of operations matters

The discipline here is write all three files first, then show them all together in one message. Not "write app.py → talk about it → write requirements → talk about it → write README → talk about it." That rhythm produces three approval moments even if you don't explicitly ask for approval, because the user is being asked to react after each file.

Concretely:

Write app.py, requirements.txt, and README.md in succession with no intervening prose. No commentary between files. No "Now I'll write the next one." No description of what each file does as you produce it. Just the three files, back to back.
Then, in a single message, ask for approval covering all three at once. Something like: "Here's the Space — app.py (N lines), requirements.txt, and README.md. Review and confirm to publish, or tell me what to change."
The user responds once, covering whatever they want changed across any of the three files.

What to avoid:

Walking through app.py's structure or design choices after writing it but before writing the others. Save commentary for either the pre-writing announcement (Phase 4 opening) or the single approval message after all three exist.
Asking "ready for the next one?" or "want me to continue with requirements?" — those are implicit per-file approvals.
Showing one file inline and offering to "show the next when you're ready" — same trap.
Treating any of the three files as optional or as a follow-up. They are produced together as one deliverable.

If the user interrupts after seeing the first file with feedback or a question, that's fine — engage with it — but the rule still applies: the next time you produce code, produce all remaining files together, not one at a time.

Phase 5 — Publish the Space

Use the authenticated session from Phase 1. Default to private, so the user can vet the Space before flipping it public. Confirm the target username with the user before creating: "I'll publish to {username}/{space_name} — confirm?"

from huggingface_hub import HfApi, SpaceHardware

api = HfApi(token=hf_token)
username = api.whoami()["name"]
repo_id = f"{username}/{space_name}"

api.create_repo(
    repo_id=repo_id,
    repo_type="space",
    space_sdk="gradio",
    space_hardware=SpaceHardware.ZERO_A10G,
    private=True,
    exist_ok=True,
)

# Upload files
for path in ["app.py", "requirements.txt", "README.md"]:
    api.upload_file(path_or_fileobj=path, path_in_repo=path,
                    repo_id=repo_id, repo_type="space")

If the LoRA repo itself is private/gated, the Space needs the token at runtime to download the LoRA. Set it as a Space secret:

api.add_space_secret(repo_id=repo_id, key="HF_TOKEN", value=hf_token)

…and in app.py, load the LoRA with token=os.environ["HF_TOKEN"].

After upload, run the smoke-test below before sharing — the build runs asynchronously and silent failures (wrong weight_name, missing dep, wrong pipeline class) only surface at first inference. Once the smoke-test passes, share the Space URL (https://huggingface.co/spaces/{repo_id}) and tell the user the Space is private — they'll need to be logged in to view it. Note that the build takes a few minutes; the logs are at https://huggingface.co/spaces/{repo_id}/logs/container if anything fails.

Publish-time failures (before the build starts):

HfHubHTTPError: 400 Bad Request from https://huggingface.co/api/validate-yaml during create_repo or upload_file. The README YAML failed server-side validation. By far the most common cause is a short_description that's too long; sometimes a stray field or malformed value. Fix: shorten short_description to ~60 characters and retry. If shortening doesn't fix it, look for typos in field names or invalid values (e.g. unsupported colors in colorFrom/colorTo, an invalid hardware string).
403 on create_repo with space_hardware="zero-a10g": user isn't on PRO/Team/Enterprise, so they can't request ZeroGPU at creation time. Fix: retry create_repo without space_hardware, leave hardware: zero-a10g in the README YAML — the Space gets created on CPU. The user can then either upgrade to PRO (auto-promotes to ZeroGPU) or apply for a community GPU grant (request via the Space's hardware settings).
401/403 on upload_file: token doesn't have write scope. Fix: ask the user for a write-scoped token.
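The 403 fallback can be folded into the creation call itself. The following is a sketch under assumptions: `create_space_with_fallback` is a hypothetical wrapper, `api` is expected to behave like `HfApi`, and the status code is read from `err.response.status_code`, which is where `HfHubHTTPError` carries it. Any other error is re-raised untouched.

```python
from typing import Any


def create_space_with_fallback(api: Any, repo_id: str, hardware: str = "zero-a10g") -> bool:
    """Create a private Gradio Space; return True if the hardware request was accepted."""
    common = dict(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="gradio",
        private=True,
        exist_ok=True,
    )
    try:
        api.create_repo(space_hardware=hardware, **common)
        return True
    except Exception as err:
        status = getattr(getattr(err, "response", None), "status_code", None)
        if status != 403:
            raise  # not the "no ZeroGPU entitlement" case
    # 403: retry without hardware; the README YAML still says hardware: zero-a10g.
    api.create_repo(**common)
    return False
```

A `False` return is the cue to tell the user the Space was created on CPU and to mention the PRO upgrade or community GPU grant options.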

Common build failures (after the build starts):

LoRA weight_name mismatch in load_lora_weights → check the actual filename via list_repo_files.
Base model is gated and the token wasn't set as a Space secret.
ZeroGPU not allocated (user not on PRO) → Space falls back to CPU and is unusably slow.
Diffusers version doesn't recognize the pipeline class → pin to git diffusers in requirements.txt.
Missing dependency at module load → see requirements.txt derivation rules above; the most common case is a transitive dep like torchvision for Qwen-Image's text encoder.

If a build fails, offer to read the logs and propose a fix.

Phase 6 — Smoke-test the Space

Before declaring the Space done and handing the URL to the user, exercise it once end-to-end. Several failure modes (wrong weight_name, wrong pipeline class, missing transitive dep, gated-base-model token issue) build cleanly and only surface at first inference. The gradio Python package ships a CLI that does exactly this — gradio info returns the endpoint signature, gradio predict runs an actual inference. Both ship with the gradio pip dependency the Space already needs, so they're available in any environment where this skill ran.

Step 1 — Wait for the build. create_repo returns immediately, but the container image is still building. Poll HfApi().get_space_runtime(repo_id).stage until it reaches RUNNING:

import time
from huggingface_hub import HfApi

api = HfApi(token=hf_token)
while True:
    stage = api.get_space_runtime(repo_id).stage
    if stage == "RUNNING":
        break
    if stage in {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}:
        raise RuntimeError(f"Build failed: {stage}. Logs: https://huggingface.co/spaces/{repo_id}/logs/container")
    time.sleep(15)

If the build fails, fetch the container logs (https://huggingface.co/spaces/{repo_id}/logs/container), read the traceback, and propose a fix. Don't run gradio info against a Space that isn't running — it'll hang or 503.
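The polling loop above has no upper bound, so a Space that settles in any stage other than RUNNING or one of the three error stages would be polled forever. A bounded variant is sketched below. `wait_until_running` is a hypothetical helper, the 15-minute default is an arbitrary assumption, and the stage getter is injected so the logic can be exercised without the Hub.

```python
import time
from typing import Callable

FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}


def wait_until_running(
    get_stage: Callable[[], str],
    timeout_s: float = 900,
    poll_s: float = 15,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until the Space is RUNNING; raise on a failed stage or on timeout."""
    deadline = clock() + timeout_s
    while True:
        stage = get_stage()
        if stage == "RUNNING":
            return
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Build failed: {stage}")
        if clock() >= deadline:
            raise TimeoutError(f"Space still in stage {stage} after {timeout_s}s")
        sleep(poll_s)
```

Against the Hub this would be called as `wait_until_running(lambda: api.get_space_runtime(repo_id).stage)`.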

Step 2 — Verify the endpoint signature. gradio info {repo_id} --token {hf_token} returns the exposed endpoints and their parameter types. Read the output and confirm: (a) the endpoint exists (default is /predict, but Blocks Spaces often have a custom name from the Python function name), (b) the parameters in order match what app.py declares, (c) file-typed params show "type": "filepath" as expected. If any of this is off, the user-facing UI may still appear correct but API calls will fail — fix and re-upload.

Step 3 — Run one real inference. Pick the lightest viable input — the simplest example from the LoRA card, or one of the gr.Examples entries. Pass --token for private Spaces. For file inputs, the payload uses {"path": "...", "meta": {"_type": "gradio.FileData"}}.

# Text-to-image:
gradio predict {repo_id} /predict '{"prompt": "...", "aspect_ratio": "1:1", ...}' --token $HF_TOKEN

# Image-to-image (file input):
gradio predict {repo_id} /predict '{"input_image": {"path": "/tmp/sample.jpg", "meta": {"_type": "gradio.FileData"}}, "prompt": "..."}' --token $HF_TOKEN

If you don't have a local sample image for I2I, lift one from the LoRA repo (hf_hub_download(repo_id, filename="example.png")) or the base model card.
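Hand-writing the nested JSON for file inputs is error-prone, so the payload can be built in Python and shell-quoted. This sketch only reproduces the command shape shown above. `file_data` and `predict_command` are hypothetical helpers, and the parameter names in any payload must match what `gradio info` reports for the Space.

```python
import json
import shlex
from typing import Any, Dict


def file_data(path: str) -> Dict[str, Any]:
    """Wrap a local path in the file-input payload shape described above."""
    return {"path": path, "meta": {"_type": "gradio.FileData"}}


def predict_command(repo_id: str, endpoint: str, payload: Dict[str, Any],
                    token_var: str = "HF_TOKEN") -> str:
    """Build a shell-safe `gradio predict` command line; the token stays in an env var."""
    parts = [
        "gradio", "predict",
        shlex.quote(repo_id),
        shlex.quote(endpoint),
        shlex.quote(json.dumps(payload)),
        "--token", f"${token_var}",
    ]
    return " ".join(parts)
```

For an image-to-image Space, `predict_command(repo_id, "/predict", {"input_image": file_data("/tmp/sample.jpg"), "prompt": "..."})` yields a command equivalent to the second example above.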

Caveat for creative-mode Spaces. gradio info and gradio predict only exercise the Python endpoint — they tell you nothing about whether custom JS in a gr.HTML widget works. If the Space uses creative mode (see references/creative-mode.md), after the API smoke-test passes, open the Space URL in a browser and verify the interaction once before sharing. Server-side green plus broken JS is the most common failure mode for these.

Step 4 — Interpret the result.

Returns successfully and the output looks plausible → done. Share the URL.
HTTPError 503 / "Space is sleeping" → the Space spun down between steps 1 and 3. Wake it (api.restart_space(repo_id)) and retry.
Inference error mentioning weight_name / safetensors → the LoRA filename in app.py doesn't match the actual file in the LoRA repo. Re-check list_repo_files, fix weight_name=, re-upload app.py.
Inference error mentioning a missing pipeline class or attribute → diffusers version too old. Switch requirements.txt to git+https://github.com/huggingface/diffusers and re-upload.
ImportError at module load → missing dep. Add it to requirements.txt and re-upload. The runtime logs (/logs/run) name the missing package.
OOM → reduce default resolution or step count, or pick a smaller base variant.
Timeout / hangs → bump @spaces.GPU(duration=...) and re-upload.
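
If the smoke-test is scripted, the list above can be encoded as a first-pass lookup. This is only a sketch: `triage` and `TRIAGE_RULES` are hypothetical names, and the marker substrings are illustrative assumptions rather than exact error text, so a miss should be read as "go and read the logs".

```python
# Ordered (markers, suggested fix) pairs mirroring the list above.
# The marker strings are assumptions for illustration; real messages vary.
TRIAGE_RULES = [
    (("503", "sleeping"),
     "Wake the Space with api.restart_space(repo_id) and retry."),
    (("weight_name", "safetensors"),
     "Re-check list_repo_files, fix weight_name=, re-upload app.py."),
    # Checked before the generic missing-module rule, because a missing
    # pipeline class also surfaces as an ImportError.
    (("cannot import name", "has no attribute"),
     "Switch diffusers to the git install in requirements.txt and re-upload."),
    (("no module named", "modulenotfounderror"),
     "Add the missing package to requirements.txt and re-upload."),
    (("out of memory",),
     "Reduce default resolution or step count, or pick a smaller base variant."),
    (("timeout", "timed out"),
     "Increase @spaces.GPU(duration=...) and re-upload."),
]


def triage(error_text: str) -> str:
    """Return the first matching suggestion, or a fallback when nothing matches."""
    lowered = error_text.lower()
    for markers, fix in TRIAGE_RULES:
        if any(marker in lowered for marker in markers):
            return fix
    return "No rule matched; read the runtime logs before changing anything."


print(triage("torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB"))
```

Rule order matters here: the first match wins, so the more specific rules sit above the generic ones.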

The smoke-test exists to convert these from "user discovers it and reports back" to "you discover it and fix it before sharing." Don't skip it because the build went green — green-build-broken-inference is the most common failure mode for Spaces with a non-trivial pipeline.

What to avoid

A generic "one demo for all LoRAs" template. The whole point of this skill is to tailor.
Lazy-loading the model inside the GPU function. Slow on ZeroGPU, and hides startup errors until first request.
torch.compile. Not supported on ZeroGPU.
cache_examples=True without cache_mode="lazy" on ZeroGPU.
Uploading the LoRA weights into the Space repo. Pull from the LoRA's own Hub repo at runtime.
Asking for the HF token only at the end, then discovering the LoRA was private all along and you couldn't read the model card.
Exposing every diffusers knob. Pick the 1–3 controls that matter for this LoRA.
Long preambles in the chat reply once the Space is published. The Space URL is the deliverable; keep the wrap-up brief.

Stats

Language: Python

Stars: 10,653

Forks: 703

Maintenance: Excellent

Last Commit: Jun 11, 2026

Gradio LoRA Space Builder

The output is a real, published Space (private by default) that the user can try in the browser, not a local script.

What "good" looks like for these demos

Concretely, a good demo:

Loads fast and runs fast — minimal model loading, sensible step count, no wasted computation per call.
Has a UI with exactly the controls this LoRA needs and nothing else. Excess sliders are a cost, not a feature.
Shows the user what's happening — progress, intermediate outputs where useful, the seed used, a clear error when input is missing.
Honors the LoRA's own recommendations from its model card: trigger words, recommended step count, recommended guidance scale, recommended LoRA scale, example inputs.
Is creative where creativity helps — interactive canvases, before/after sliders, side-by-side previews of intermediate processing — and plain where plainness is right.

Workflow

Work through these phases in order. Information gathered in one phase decides the next.

Gather the LoRA info needed to pick a pipeline and design a UI.
Pick the base pipeline and inference recipe.
Design the UI for this specific LoRA's task and inputs.
Write app.py, requirements.txt, and README.md together; show all three to the user for one batched approval.
Publish the Space (private).
Smoke-test the published Space before sharing the URL.

Don't drip-feed questions across multiple turns. Batch them.

Phase 1 — Gather LoRA info

Required: a LoRA repo on the Hub (e.g. username/my-lora).

Before asking the user for anything, check whether a usable token is already cached:

from huggingface_hub import HfApi, get_token

cached_token = get_token()  # picks up HF_TOKEN env var or cached CLI login
if cached_token:
    try:
        info = HfApi().whoami(token=cached_token)
        username = info["name"]
        # info also has fine-grained token scope info if applicable
    except Exception:
        cached_token = None  # token exists but is invalid/expired

Then:

If a valid cached token exists and it can read the repo, use it. No prompt needed.
If no cached token, or the cached token can't read this private repo, ask the user for a token — once, with the explanation below.

When asking for a token (and only when you actually need to ask):

I need a Hugging Face access token with write scope (to read the LoRA if it's private/gated, and to publish the Space). Create one at https://huggingface.co/settings/tokens. Paste it here.

The same token will be reused for publishing in the final phase, so this is a one-time ask.

Then read what's in the repo:

List the repo files (huggingface_hub.HfApi().list_repo_files(repo_id)). Look for .safetensors, README.md, example images/videos, multiple checkpoints.
Fetch the model card (huggingface_hub.ModelCard.load(repo_id)). The data dict has structured fields; the text has the README body.
If multiple .safetensors files exist, pick the right one — see "Picking the LoRA weights file" in references/zerogpu-and-publishing.md. Briefly: README-recommended file wins, then pytorch_lora_weights.safetensors, then latest training checkpoint, otherwise ask.
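
The weights-file priority described in the last item can be sketched as a pure function over the output of `list_repo_files`. The name `pick_lora_weights` is hypothetical, and the rule for recognising a "training checkpoint" (a trailing step number in the filename) is an assumption of this sketch; the authoritative rules live in references/zerogpu-and-publishing.md.

```python
import re
from typing import Optional

DEFAULT_NAME = "pytorch_lora_weights.safetensors"


def pick_lora_weights(files: list[str],
                      readme_recommended: Optional[str] = None) -> Optional[str]:
    """Return the weights filename to load, or None when the user should be asked."""
    candidates = [name for name in files if name.endswith(".safetensors")]
    if len(candidates) == 1:
        return candidates[0]
    if readme_recommended in candidates:  # README-recommended file wins
        return readme_recommended
    if DEFAULT_NAME in candidates:
        return DEFAULT_NAME
    # Assumption: checkpoints end in a step number, e.g. "lora-000500.safetensors".
    stepped = []
    for name in candidates:
        match = re.search(r"(\d+)\.safetensors$", name)
        if match:
            stepped.append((int(match.group(1)), name))
    # Only trust "latest checkpoint" when every candidate looks like a checkpoint.
    if stepped and len(stepped) == len(candidates):
        return max(stepped)[1]
    return None  # ambiguous: ask the user


print(pick_lora_weights(["README.md", "lora-000250.safetensors", "lora-000500.safetensors"]))
```

Returning `None` rather than guessing keeps the "otherwise ask" branch explicit for the caller.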

From the model card, try to determine:

Base model — the base_model field, or text mentions in the README. Usually present. Use it to pick the pipeline reference file (see Phase 2).
Task — pipeline_tag if set, otherwise inferred from the base model and README text. The five tasks this skill handles: text-to-image, image-to-image, text-to-video, image-to-video, video-to-video.
Trigger words — often called "trigger word", "instance prompt", "activation word"; sometimes embedded in example prompts.
Recommended inference recipe — step count, guidance scale, true CFG scale, LoRA scale, resolution. Many LoRA cards include a Python snippet; trust its parameters (steps, guidance, CFG, LoRA scale, dtype). For loading mechanics, see adapting-to-the-lora.md — prefer pipe.load_lora_weights(...) over whatever loading approach the snippet uses.
Example prompts and example media — use these as Gradio examples in the UI.
Sub-task / specific use case — for image edits and video LoRAs, "what does this LoRA actually do" matters as much as the task category. A relighting LoRA, a face-swap LoRA, and a style LoRA all might be image-to-image, but the UI for each is different.
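
The structured part of this checklist can be pulled from the card's metadata dict before reading the README body. A rough sketch follows: `summarize_card` is a hypothetical helper, and reading the trigger word from an `instance_prompt` key is an assumption (some training scripts write it, many cards do not). Anything it cannot find still has to come from the README text.

```python
from typing import Any

# The five tasks this skill handles, as listed above.
TASKS = {"text-to-image", "image-to-image", "text-to-video",
         "image-to-video", "video-to-video"}


def summarize_card(data: dict[str, Any]) -> dict[str, Any]:
    """Extract base model, task and trigger word from a model card metadata dict."""
    base = data.get("base_model")
    if isinstance(base, list):  # base_model may be a single id or a list of ids
        base = base[0] if base else None
    task = data.get("pipeline_tag")
    if task not in TASKS:
        task = None  # infer from the base model and README instead
    trigger = data.get("instance_prompt")  # assumption: not every card has this key
    missing = [name for name, value in (("base_model", base), ("task", task))
               if value is None]
    return {"base_model": base, "task": task, "trigger_word": trigger, "missing": missing}


card = {"base_model": ["Qwen/Qwen-Image"], "pipeline_tag": "text-to-image",
        "instance_prompt": "mystyle"}
print(summarize_card(card))
```

With `huggingface_hub`, the input would typically be `ModelCard.load(repo_id).data.to_dict()`; the `missing` list tells you which fields to hunt for in the README body.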

Phase 2 — Pick the base pipeline

Step 1 — Load the reference file for this base model family.

references/base-models/qwen-image.md — covers Qwen-Image and Qwen-Image-Edit family (text-to-image and image-to-image).
references/base-models/ltx.md — covers LTX family (text-to-video, image-to-video, video-to-video, including IC-LoRAs).

Step 2 — Verify the pipeline class against the base model's own card. This step is mandatory, not optional.

from huggingface_hub import ModelCard
base_card = ModelCard.load(base_model_id)
# Read base_card.text — find the diffusers inference snippet, note the pipeline class it imports.

The class imported in the base model card's diffusers snippet is the source of truth. Real examples where this matters:

Qwen-Image-Edit uses QwenImageEditPipeline. Qwen-Image-Edit-2509 and Qwen-Image-Edit-2511 use QwenImageEditPlusPipeline — different class, different default parameters, takes a list of images instead of one. A LoRA targeting 2511 loaded onto QwenImageEditPipeline produces broken output.
LTX-Video uses LTXPipeline/LTXImageToVideoPipeline/LTXConditionPipeline. LTX-2 uses LTX2Pipeline from a different module path. LTX-2.3 sometimes needs a native pipeline outside diffusers.

If the base model card has no diffusers snippet at all, fall back to the reference file's table — and tell the user you're falling back, in case they know something the table doesn't.
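
Finding the pipeline class in `base_card.text` can be partly automated. The sketch below assumes the card's snippet uses an ordinary `from diffusers import ...` statement; `pipeline_classes` is a hypothetical helper, and an empty result corresponds to the "no diffusers snippet" fallback described above. It does not replace reading the snippet.

```python
import re

# Matches "from diffusers[.submodule] import A, B" and the parenthesised multi-line form.
IMPORT_RE = re.compile(
    r"from\s+diffusers(?:\.[\w.]+)?\s+import\s+(?:\(([^)]*)\)|([^\n]+))"
)


def pipeline_classes(card_text: str) -> list[str]:
    """Return the *Pipeline names imported from diffusers, in order of appearance."""
    found: list[str] = []
    for match in IMPORT_RE.finditer(card_text):
        imported = match.group(1) or match.group(2) or ""
        for part in imported.split(","):
            # Drop trailing comments and "as alias" suffixes.
            name = part.split("#")[0].strip().split(" as ")[0].strip()
            if name.endswith("Pipeline") and name not in found:
                found.append(name)
    return found


sample = (
    "import torch\n"
    "from diffusers import QwenImageEditPlusPipeline\n"
    "pipe = QwenImageEditPlusPipeline.from_pretrained(BASE)\n"
)
print(pipeline_classes(sample))
```

If the function returns more than one class, the card probably documents several modes, and the LoRA's task decides which one applies.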

Phase 3 — Design the UI for this LoRA

Don't reach for a template. Reason from the LoRA's task and inputs to a UI.

Read references/tasks.md for the per-task baseline UI patterns (what the standard inputs/outputs look like for T2I, I2I, T2V, I2V, V2V).

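Self-check: can you describe the interaction in one sentence that is specific to this LoRA, naming what the user does and what the model produces? Examples that pass: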

"Upload a video, pick a target aspect ratio, click Generate; the model fills the empty margins."
"Draw colored brush strokes where you want light, pick an illumination style, click Generate; the model relights the photo."
"Upload a video of someone moving and an image of a different character; the model produces a video of the character doing the motion."

Examples that fail:

"Type a prompt and click generate." (Generic T2I — say more.)
"Upload an image and an instruction." (Generic edit — what kind of edit?)

Phase 4 — Write the Space files

Before writing, tell the user concretely what's about to happen — name the actual files. Not "I'll write the three files" but something like:

"Now I'll write the three files needed to publish a Space: app.py (the Gradio demo and inference code), requirements.txt (Python dependencies), and README.md (Space configuration including ZeroGPU hardware setting). Then I'll show all three for your review before publishing."

This anchors the user in what's being produced. Don't say "three files" without naming them — it's vague and signals lack of commitment to the deliverable.

Read references/zerogpu-and-publishing.md for the ZeroGPU rules. The non-obvious ones:

Models go on cuda at module level (not lazy-loaded inside the GPU function). ZeroGPU has a CUDA emulation that makes this work pre-allocation, and module-level placement is significantly faster than deferred placement.
The function that runs inference is decorated with @spaces.GPU(duration=...). Pick a duration appropriate for the task — short for image generation, longer for video.
Don't use torch.compile — it's incompatible with ZeroGPU's process model.

`app.py`

Compose from the pieces decided in Phases 1–3. Don't paste from a template. Each section should be there because it's needed:

Imports — gradio as gr, torch, spaces, the pipeline class, anything the preprocessing needs.
Constants — LORA_REPO, BASE_MODEL, recommended step count, guidance, LoRA scale, trigger word.
Module-level model load — pipeline from_pretrained, .to("cuda"), load_lora_weights. If the LoRA repo is private, pass token=os.environ["HF_TOKEN"].
Preprocessing functions (if any) — pose extraction, padding, mask building, etc. CPU code can run at module level; GPU code needs to be inside a @spaces.GPU function.
The inference function — decorated with @spaces.GPU(duration=...). Validates inputs, applies trigger word, builds the pipeline kwargs, returns outputs.
The Gradio Blocks — the UI from Phase 3, wired to the inference function.

Common things to get right:

Return the actually-used seed alongside the result so the user can reproduce.
gr.Progress(track_tqdm=True) on the inference function surfaces diffusers' internal progress bar.
Validate inputs — raise gr.Error("Please upload an image first.") when a required input is missing, rather than letting the pipeline fail with a cryptic error.
On gr.Examples, use cache_examples=True, cache_mode="lazy" — plain cache_examples=True runs examples at build time and fails on ZeroGPU; lazy mode defers caching to the first user click.
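
The seed, trigger-word and validation points above are plain Python and can be written independently of the pipeline. A minimal sketch, with hypothetical helper names and a placeholder trigger word; in a real app.py the `ValueError` would be `gr.Error`, and the returned dict would feed the pipeline call inside the `@spaces.GPU` function.

```python
import random
from typing import Any, Optional

MAX_SEED = 2**32 - 1
TRIGGER_WORD = "mystyle"  # hypothetical; take the real one from the LoRA card


def resolve_seed(seed: Optional[int], randomize: bool) -> int:
    """Return the seed that will actually be used, so the UI can display it."""
    if randomize or seed is None:
        return random.randint(0, MAX_SEED)
    return int(seed)


def build_prompt(prompt: str, trigger: str = TRIGGER_WORD) -> str:
    """Prepend the trigger word unless the user already typed it."""
    prompt = prompt.strip()
    if trigger and trigger.lower() not in prompt.lower():
        return f"{trigger}, {prompt}" if prompt else trigger
    return prompt


def prepare_call(prompt: str, image: Any, seed: Optional[int], randomize: bool) -> dict:
    """Validate inputs and assemble call arguments (stand-in for the top of infer())."""
    if image is None:
        # In app.py: raise gr.Error("Please upload an image first.")
        raise ValueError("Please upload an image first.")
    return {"prompt": build_prompt(prompt), "image": image,
            "seed": resolve_seed(seed, randomize)}


print(prepare_call("a cat on a sofa", image="sample.jpg", seed=42, randomize=False))
```

Resolving the seed before the pipeline call is what makes "return the actually-used seed" possible: the same value goes to the generator and back to the UI.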

`requirements.txt`

Derive the list from four sources:

Every top-level non-stdlib import in app.py. If app.py does import cv2, requirements.txt has opencv-python. If it does from controlnet_aux import OpenposeDetector, requirements.txt has controlnet-aux. Walk the imports mechanically. (Note the exclusions listed after these four sources: some imports are provided by the Space runtime and don't need to be listed.)
What the base-model reference's "Required dependencies" subsection says. Each base-model file lists the non-obvious extras the pipeline pulls in — torchvision for Qwen-Image (Qwen 2.5-VL text encoder), imageio[ffmpeg] for LTX (video export), etc. Include all of them. These are the deps that aren't picked up from imports because the pipeline's components import them transitively at load time.
What the LoRA's own model card explicitly mentions installing. If the LoRA README has its own pip install block, lift the deps from there.
The diffusers/ML stack: diffusers, transformers, accelerate, peft, safetensors. Default to plain (unpinned). Switch diffusers to git+https://github.com/huggingface/diffusers if the base-model reference says the model needs it (recent releases often do — Qwen-Image-Edit-2511 is a current example).

What not to list in requirements.txt:

gradio — controlled by the sdk_version: field in README.md's YAML frontmatter, not by requirements.txt. Listing it in requirements is at best ignored, at worst causes a version conflict with the SDK. Set the version in the README only.
torch — provided by the Space runtime. Only add if you need a specific version pinned (rare, and usually a sign something else is wrong).
spaces — provided by the Space runtime. Only add if you need a specific version pinned.
huggingface_hub — provided by the Space runtime. Only add if you need a specific version pinned.

Two further deps should not be added reflexively, because they routinely cause more problems than they solve on ZeroGPU:

xformers — pinned to specific torch versions, frequent source of conflicts. The ZeroGPU runtime ships torch 2.8+, so any pinned xformers version must support that. Additional gotcha on Blackwell: xformers' FA3 dispatch mis-gates the hardware (FA3 kernels are Hopper-only at sm_90a, but the dispatcher gates on device_capability >= (9, 0), which also matches Blackwell) and crashes at kernel launch with CUDA invalid argument. If a Space using xformers attention hits this, disable FA3 dispatch at module load:
```
try:
    from xformers.ops.fmha import _set_use_fa3
    _set_use_fa3(False)
except Exception:
    pass
```
Only include xformers if app.py actually uses it.
flash-attn — needs a build step, often fails to install. Same torch 2.8+ alignment caveat as xformers. Only include if app.py actually uses it.

Pin other versions only when you have a reason (e.g. a known incompatibility, or matching a recipe from the model card).

`README.md`

Spaces are configured by the YAML frontmatter at the top of README.md. This frontmatter is what selects ZeroGPU.

---
title: <human-readable title>
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: <current Gradio version>
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: <one short line for the Space tile, ~60 chars max>
models:
  - <base model repo>
  - <lora repo>
---

# <title>

A short description with links to the LoRA and base model.

Key fields:

sdk: gradio — required for ZeroGPU.
sdk_version — match the Gradio version you wrote against. Look up the current version (pip index versions gradio, or check https://www.gradio.app) rather than guessing.
hardware: zero-a10g — the legacy string for ZeroGPU. The actual hardware is NVIDIA RTX Pro 6000 Blackwell, but the identifier is zero-a10g. ZeroGPU is available to PRO, Team, and Enterprise accounts; if the user isn't subscribed, the Space will fall back to CPU. Mention this if you suspect they aren't on PRO.
models: — list base and LoRA repos. This enables Hub caching and discovery.
short_description — appears on the Space tile. Keep it short (~60 characters or less). The Hub's YAML validator rejects long values with a 400 from https://huggingface.co/api/validate-yaml, which surfaces as an HfHubHTTPError during create_repo or upload_file. The exact server-side limit isn't documented and may change, so target the visible-tile-length range rather than pushing right up to a cap. If you do hit the 400, the fix is almost always to shorten this field. One sentence describing what the Space does is plenty — the README body below the YAML is where you put longer prose.
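
Because an over-long short_description only fails at upload time, it can be worth checking it locally while generating the frontmatter. A sketch under stated assumptions: `build_frontmatter` is a hypothetical helper, the 60-character threshold is the target suggested above rather than a documented server limit, and the version string in the usage line is a placeholder to be replaced with the looked-up Gradio version.

```python
import json

SHORT_DESCRIPTION_TARGET = 60  # guidance from the text above, not a documented limit


def yaml_str(value: str) -> str:
    """Quote a string as a YAML double-quoted scalar (JSON string syntax)."""
    return json.dumps(value, ensure_ascii=False)


def build_frontmatter(title: str, short_description: str, sdk_version: str,
                      base_model: str, lora_repo: str) -> str:
    """Render the README frontmatter, refusing an over-long short_description."""
    if len(short_description) > SHORT_DESCRIPTION_TARGET:
        raise ValueError(
            f"short_description is {len(short_description)} chars; "
            f"keep it at or under {SHORT_DESCRIPTION_TARGET}."
        )
    lines = [
        "---",
        f"title: {yaml_str(title)}",
        "emoji: 🎨",
        "colorFrom: pink",
        "colorTo: purple",
        "sdk: gradio",
        f"sdk_version: {yaml_str(sdk_version)}",
        "app_file: app.py",
        "pinned: false",
        "hardware: zero-a10g",
        f"short_description: {yaml_str(short_description)}",
        "models:",
        f"  - {base_model}",
        f"  - {lora_repo}",
        "---",
    ]
    return "\n".join(lines) + "\n"


# "0.0.0" is a placeholder; look up the current Gradio version instead.
print(build_frontmatter("Relight LoRA Demo", "Relight photos with brush strokes",
                        "0.0.0", "org/base-model", "username/my-lora"))
```

Quoting the free-text fields keeps a title containing a colon from producing invalid YAML.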

Single batched approval — order of operations matters

Concretely:

Write app.py, requirements.txt, and README.md in succession with no intervening prose. No commentary between files. No "Now I'll write the next one." No description of what each file does as you produce it. Just the three files, back to back.
Then, in a single message, ask for approval covering all three at once. Something like: "Here's the Space — app.py (N lines), requirements.txt, and README.md. Review and confirm to publish, or tell me what to change."
The user responds once, covering whatever they want changed across any of the three files.

What to avoid:

Walking through app.py's structure or design choices after writing it but before writing the others. Save commentary for either the pre-writing announcement (Phase 4 opening) or the single approval message after all three exist.
Asking "ready for the next one?" or "want me to continue with requirements?" — those are implicit per-file approvals.
Showing one file inline and offering to "show the next when you're ready" — same trap.
Treating any of the three files as optional or as a follow-up. They are produced together as one deliverable.

Phase 5 — Publish the Space

from huggingface_hub import HfApi, SpaceHardware

api = HfApi(token=hf_token)
username = api.whoami()["name"]
repo_id = f"{username}/{space_name}"

api.create_repo(
    repo_id=repo_id,
    repo_type="space",
    space_sdk="gradio",
    space_hardware=SpaceHardware.ZERO_A10G,
    private=True,
    exist_ok=True,
)

# Upload files
for path in ["app.py", "requirements.txt", "README.md"]:
    api.upload_file(path_or_fileobj=path, path_in_repo=path,
                    repo_id=repo_id, repo_type="space")

If the LoRA repo itself is private/gated, the Space needs the token at runtime to download the LoRA. Set it as a Space secret:

api.add_space_secret(repo_id=repo_id, key="HF_TOKEN", value=hf_token)

…and in app.py, load the LoRA with token=os.environ["HF_TOKEN"].

Publish-time failures (before the build starts):

HfHubHTTPError: 400 Bad Request from https://huggingface.co/api/validate-yaml during create_repo or upload_file. The README YAML failed server-side validation. By far the most common cause is a short_description that's too long; sometimes a stray field or malformed value. Fix: shorten short_description to ~60 characters and retry. If shortening doesn't fix it, look for typos in field names or invalid values (e.g. unsupported colors in colorFrom/colorTo, an invalid hardware string).
403 on create_repo with space_hardware="zero-a10g": user isn't on PRO/Team/Enterprise, so they can't request ZeroGPU at creation time. Fix: retry create_repo without space_hardware, leave hardware: zero-a10g in the README YAML — the Space gets created on CPU. The user can then either upgrade to PRO (auto-promotes to ZeroGPU) or apply for a community GPU grant (request via the Space's hardware settings).
401/403 on upload_file: token doesn't have write scope. Fix: ask the user for a write-scoped token.

Common build failures (after the build starts):

LoRA weight_name mismatch in load_lora_weights → check the actual filename via list_repo_files.
Base model is gated and the token wasn't set as a Space secret → set HF_TOKEN with add_space_secret as shown above.
ZeroGPU not allocated (user not on PRO) → Space falls back to CPU and is unusably slow.
Diffusers version doesn't recognize the pipeline class → pin to git diffusers in requirements.txt.
Missing dependency at module load → see requirements.txt derivation rules above; the most common case is a transitive dep like torchvision for Qwen-Image's text encoder.

If a build fails, offer to read the logs and propose a fix.

Phase 6 — Smoke-test the Space

Step 1 — Wait for the build. create_repo returns immediately, but the container image is still building. Poll HfApi().get_space_runtime(repo_id).stage until it reaches RUNNING:

import time
from huggingface_hub import HfApi
api = HfApi(token=hf_token)
while True:
    stage = api.get_space_runtime(repo_id).stage
    if stage == "RUNNING": break
    if stage in {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}:
        raise RuntimeError(f"Build failed: {stage}. Logs: https://huggingface.co/spaces/{repo_id}/logs/container")
    time.sleep(15)
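
The loop above polls forever if the Space never leaves a building stage. A bounded variant might look like the sketch below: `wait_until_running` is a hypothetical helper, the 30-minute default is an arbitrary assumption, and the stage getter is injected so the logic can be exercised without network access.

```python
import time

FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}


def wait_until_running(get_stage, timeout_s: float = 1800, interval_s: float = 15,
                       sleep=time.sleep, clock=time.monotonic) -> None:
    """Poll get_stage() until RUNNING; raise on a failed stage or when time runs out."""
    deadline = clock() + timeout_s
    while True:
        stage = get_stage()
        if stage == "RUNNING":
            return
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Build failed: {stage}")
        if clock() >= deadline:
            raise TimeoutError(f"Space still in stage {stage} after {timeout_s}s")
        sleep(interval_s)


# Offline demo with a scripted sequence of stages and a no-op sleep.
stages = iter(["BUILDING", "BUILDING", "RUNNING"])
wait_until_running(lambda: next(stages), sleep=lambda seconds: None)
print("Space is running")
```

Against a real Space the getter would be something like `lambda: api.get_space_runtime(repo_id).stage`, using the `api` and `repo_id` from the block above.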
