Help us improve
Share bugs, ideas, or general feedback.
From opengradient-plugin
Generates idiomatic Python code for OpenGradient SDK: verified LLM inference (OpenAI/Anthropic/Google models), chat completions, streaming, tool calling, on-chain ONNX inference, LangChain agents, model hub operations, digital twins.
npx claudepluginhub opengradient/claude-plugins --plugin opengradient-pluginHow this skill is triggered — by the user, by Claude, or both
Slash command
/opengradient-plugin:opengradient-sdkThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an expert on the **OpenGradient Python SDK** (`opengradient`). Help the user write correct, idiomatic code using the SDK.
Routes LLM requests to OpenAI, Grok/xAI, Groq, DeepSeek, or OpenRouter using SwiftOpenAI-CLI agent mode with auto-setup and API key checks.
Provides production-ready patterns for LLM apps including RAG pipelines, chunking strategies, vector DB selection, embedding models, and AI agent architectures. Use for designing RAG systems, agents, and LLMOps.
Builds LLM-powered applications with Claude API, Anthropic SDKs, or Agent SDK. Detects project language (Python, TypeScript, Java, Go, Ruby, C#, PHP) and provides language-specific docs and defaults.
Share bugs, ideas, or general feedback.
You are an expert on the OpenGradient Python SDK (opengradient). Help the user write correct, idiomatic code using the SDK.
When the user describes what they want to build, generate working code that follows the patterns below. Always prefer the simplest approach that satisfies the requirements.
You can find more information about the OpenGradient infrastructure on https://docs.opengradient.ai.
This guide was written for OpenGradient SDK version 0.9.4, make sure to install this version.
# Requires Python >=3.11
pip install opengradient==0.9.4
OpenGradient is a decentralized AI inference platform. The SDK provides:
Each service is instantiated separately — there is no single init() function:
import opengradient as og
# LLM inference (requires Base Sepolia private key with OPG tokens)
llm = og.LLM(private_key="0x...")
# On-chain ONNX inference (requires OpenGradient testnet private key)
alpha = og.Alpha(private_key="0x...")
# Model Hub (requires email/password auth)
hub = og.ModelHub(email="...", password="...")
# Digital twins (requires twins API key)
twins = og.Twins(api_key="...")
Before the first LLM call, approve OPG token spending (idempotent — skips if allowance is sufficient):
llm.ensure_opg_approval(min_allowance=5)
The ensure_opg_approval method accepts:
min_allowance (float): Minimum OPG allowance required.approve_amount (float, optional): Amount to approve. Defaults to 2 * min_allowance.Returns a Permit2ApprovalResult with allowance_before, allowance_after, and tx_hash (None if no approval was needed).
Users must acquire $OPG tokens on Base Sepolia in their wallet in order to pay for inferences via x402. If the user owns no $OPG, you they can request via our faucet.
og.TEE_LLM)| Provider | Models |
|---|---|
| OpenAI | GPT_4_1_2025_04_14, O4_MINI, GPT_5, GPT_5_MINI, GPT_5_2 |
| Anthropic | CLAUDE_SONNET_4_5, CLAUDE_SONNET_4_6, CLAUDE_HAIKU_4_5, CLAUDE_OPUS_4_5, CLAUDE_OPUS_4_6 |
GEMINI_2_5_FLASH, GEMINI_2_5_PRO, GEMINI_2_5_FLASH_LITE, GEMINI_3_PRO, GEMINI_3_FLASH | |
| xAI | GROK_4, GROK_4_FAST, GROK_4_1_FAST, GROK_4_1_FAST_NON_REASONING |
og.x402SettlementMode)PRIVATE — Payment only, no data on-chain (maximum privacy)BATCH_HASHED — Aggregated into Merkle tree (most cost-efficient, default)INDIVIDUAL_FULL — Full input/output recorded on-chain (maximum transparency)IMPORTANT: llm.chat() and llm.completion() are async methods. Use await inside an async function or asyncio.run() for top-level calls.
import asyncio
import opengradient as og
llm = og.LLM(private_key="0x...")
llm.ensure_opg_approval(min_allowance=5)
result = asyncio.run(llm.chat(
model=og.TEE_LLM.GEMINI_2_5_FLASH,
messages=[{"role": "user", "content": "Hello!"}],
max_tokens=300,
temperature=0.0,
))
print(result.chat_output["content"])
import asyncio
import opengradient as og
async def stream_example():
llm = og.LLM(private_key="0x...")
llm.ensure_opg_approval(min_allowance=5)
stream = await llm.chat(
model=og.TEE_LLM.GPT_5,
messages=[{"role": "user", "content": "Explain quantum computing"}],
max_tokens=500,
stream=True,
)
async for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
asyncio.run(stream_example())
import asyncio
import opengradient as og
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
},
}]
async def tool_example():
llm = og.LLM(private_key="0x...")
llm.ensure_opg_approval(min_allowance=5)
result = await llm.chat(
model=og.TEE_LLM.GPT_5,
messages=[{"role": "user", "content": "Weather in NYC?"}],
tools=tools,
max_tokens=200,
)
if result.finish_reason == "tool_calls":
for tc in result.chat_output["tool_calls"]:
print(f"Call: {tc['function']['name']}({tc['function']['arguments']})")
asyncio.run(tool_example())
import asyncio
import opengradient as og
async def agent_loop(user_query, tools, max_iterations=5):
llm = og.LLM(private_key="0x...")
llm.ensure_opg_approval(min_allowance=5)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": user_query},
]
for _ in range(max_iterations):
result = await llm.chat(
model=og.TEE_LLM.GPT_5,
messages=messages,
tools=tools,
tool_choice="auto",
)
if result.finish_reason == "tool_calls":
messages.append(result.chat_output)
for tc in result.chat_output["tool_calls"]:
tool_result = execute_tool(tc["function"]["name"], tc["function"]["arguments"])
messages.append({
"role": "tool",
"tool_call_id": tc["id"],
"content": tool_result,
})
else:
return result.chat_output["content"]
import asyncio
import opengradient as og
llm = og.LLM(private_key="0x...")
llm.ensure_opg_approval(min_allowance=5)
result = asyncio.run(llm.completion(
model=og.TEE_LLM.GPT_5,
prompt="The capital of France is",
max_tokens=50,
temperature=0.0,
))
print(result.completion_output)
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
import opengradient as og
llm = og.agents.langchain_adapter(
private_key="0x...",
model_cid=og.TEE_LLM.GPT_5,
max_tokens=300,
)
@tool
def lookup(query: str) -> str:
"""Look up information."""
return "result"
agent = create_react_agent(llm, [lookup])
result = agent.invoke({"messages": [("user", "Find info about X")]})
print(result["messages"][-1].content)
import opengradient as og
alpha = og.Alpha(private_key="0x...")
result = alpha.infer(
model_cid="QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ",
inference_mode=og.InferenceMode.VANILLA,
model_input={"input": [1.0, 2.0, 3.0]},
)
print(result.model_output)
print(result.transaction_hash)
Digital twins are digital clones of people. You can create your own digital twin or browse existing ones on https://twin.fun. In order to chat to a twin, you or the developer needs to get the twin's unique ID.
import opengradient as og
twins = og.Twins(api_key="your-key")
result = twins.chat(
twin_id="0x1abd463fd6244be4a1dc0f69e0b70cd5",
model=og.TEE_LLM.GROK_4_1_FAST_NON_REASONING,
messages=[{"role": "user", "content": "What do you think about AI?"}],
max_tokens=1000,
)
print(result.chat_output["content"])
import opengradient as og
hub = og.ModelHub(email="you@example.com", password="...")
repo = hub.create_model(
model_name="my-model",
model_desc="A prediction model",
version="1.0.0",
)
upload = hub.upload(
model_path="./model.onnx",
model_name=repo.name,
version=repo.initialVersion,
)
print(f"Model CID: {upload.modelCid}")
TextGenerationOutput: chat_output (dict), completion_output (str), finish_reason, transaction_hash, payment_hash, tee_signature, tee_timestamp, tee_id, tee_endpoint, tee_payment_addressTextGenerationStream: async iterable of StreamChunk objects (use async for)StreamChunk: choices[0].delta.content, choices[0].delta.tool_calls, usage (final chunk only), is_final, tee_signature, tee_timestampInferenceResult: model_output (dict of np.ndarray), transaction_hashModelRepository: name, initialVersionFileUploadResult: modelCid, sizellm.chat() Full Signatureasync def chat(
self,
model: TEE_LLM,
messages: List[Dict],
max_tokens: int = 100,
stop_sequence: Optional[List[str]] = None,
temperature: float = 0.0,
tools: Optional[List[Dict]] = None,
tool_choice: Optional[str] = None,
x402_settlement_mode: x402SettlementMode = x402SettlementMode.BATCH_HASHED,
stream: bool = False,
) -> Union[TextGenerationOutput, AsyncGenerator[StreamChunk, None]]:
llm.ensure_opg_approval(min_allowance=...) before the first LLM inference.llm.chat() and llm.completion() are async — use await or asyncio.run().finish_reason: "stop" / "length" = text response, "tool_calls" = function calls.async for and check chunk.choices[0].delta.content is not None before printing.result.chat_output as the assistant message, then append each tool result with role: "tool" and matching tool_call_id.