From dataverse
Bulk reads, multi-page iteration, and analytics over Dataverse data via Python SDK and Web API. Supports filtering, SQL queries, aggregation, pandas workflows.
How this skill is triggered — by the user, by Claude, or both
Slash command
/dataverse:dv-queryThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **This skill uses Python exclusively.** Do not use Node.js, JavaScript, or any other language for Dataverse scripting. See the overview skill's Hard Rules.
This skill uses Python exclusively. Do not use Node.js, JavaScript, or any other language for Dataverse scripting. See the overview skill's Hard Rules.
All reads use the SDK — not urllib, requests, or raw HTTP. This is the same rule as dv-data's SDK-First Rule, applied to reads. If you find yourself writing urllib.request or get_token() for a query, STOP — the SDK handles it. The only exceptions are $apply aggregation and N:N $expand, documented below.
When the user asks a question about their data, pick the approach by what they're asking, not by which API you know:
| User asks... | Approach | Why |
|---|---|---|
| "show me open tickets" / simple filter | MCP read_query (if available) or client.records.get() with $filter | Small result, no aggregation |
| "how many X" / simple count | MCP read_query or client.records.get() with count=True | Single number |
| Single-table aggregation (most/sum/avg/top-N) | $apply server-side aggregation (raw Web API) | One HTTP call, returns only grouped results |
| Cross-table aggregation | client.dataframe.get() with minimal $select + pd.merge() | Server can't join; pandas merge is fast with minimal columns |
| "show me X with related Y" / resolve lookups | client.records.get() with $expand or QueryBuilder (b8+) | Lookup resolution |
| "export this data" / bulk extract | client.dataframe.get() with select= | Direct to DataFrame → CSV |
| "load into notebook" / interactive analysis | client.dataframe.get() or QueryBuilder .to_dataframe() (b8+) | pandas native |
| "find duplicates" / complex filter | client.records.get() with $filter or QueryBuilder (b8+) | SDK handles pagination |
| Simple filtered read (<5K rows) | client.query.sql() | Lightweight SQL SELECT with WHERE, ORDER BY, TOP |
Key principle: Let the server do the work. For single-table aggregation, use $apply — it runs server-side and returns only grouped results. For cross-table questions, use client.dataframe.get() with minimal $select on each table, then pd.merge() — the merge itself is sub-second; the bottleneck is network transfer, which $select minimizes.
Always query the live Dataverse environment. Do not query local copies, cached files, or source databases when the user expects results from Dataverse. The data in Dataverse is the source of truth.
client.query.sql()client.query.sql() uses the Dataverse Web API ?sql= parameter — a limited SQL subset (same limitations as MCP read_query). It does NOT support GROUP BY, JOINs, HAVING, DISTINCT, or subqueries. Results are capped at ~5,000 rows.
When to use: Fast filtered reads on tables with <5K rows. For these, it's significantly faster (~2-6s) than page iteration or DataFrames because it's a single HTTP call.
# Fast filtered read on small tables (<5K rows)
results = client.query.sql(
"SELECT TOP 100 name, estimatedvalue "
"FROM opportunity "
"WHERE statecode = 0 "
"ORDER BY estimatedvalue DESC"
)
for r in results:
print(f"{r['name']}: ${r.get('estimatedvalue', 0):,.0f}")
Do NOT use for: Tables >5K rows (results silently truncated), aggregation (no GROUP BY), or cross-table queries (no JOINs). Use $apply for single-table aggregation and client.dataframe.get() + pd.merge() for cross-table.
| Need | Use instead |
|---|---|
| Create, update, delete records | dv-data |
| Create tables, columns, relationships | dv-metadata |
| Export or deploy solutions | dv-solution |
import os, sys
sys.path.insert(0, os.path.join(os.getcwd(), "scripts"))
from auth import get_client
# get_client sets a plugin attribution context on the User-Agent header.
# Do not modify the context value — it is a closed schema for server-side
# telemetry (app/skill/agent). Never include secrets or PII.
client = get_client("dv-query")
get_client(skill) handles auth, environment URL, and plugin attribution (User-Agent tagging). See scripts/auth.py. For scripts that run to completion, wrap the returned client in a with statement for automatic connection cleanup.
Getting this wrong causes 400 errors.
| Property type | Convention | Example | When used |
|---|---|---|---|
| Structural (columns) | LogicalName — always lowercase | new_name, new_priority | $select, $filter, $orderby |
| Navigation (lookups) | Navigation Property Name — case-sensitive, matches $metadata | new_AccountId | $expand |
parentaccountid, ownerid): lowercase$metadata SchemaName (e.g., new_AccountId)client.records.get() is the primary read method — works on all SDK versions (b6+). It returns a page iterator for multi-record queries and a single Record for by-GUID fetch. Always use select= to limit columns.
for page in client.records.get(
"new_ticket",
select=["new_name", "new_priority", "new_status"],
filter="new_status eq 100000000",
orderby=["new_name asc"],
top=50,
):
for r in page:
print(r["new_name"], r["new_priority"])
client.records.get() returns a page iterator — always iterate pages and then records within each page. Each record is a Record object that supports dict-like access: r["column"], r.get("column"), r.keys(). Do not use r.data.get() — use r.get() directly.
record = client.records.get("new_ticket", "<record-guid>",
select=["new_name", "new_priority", "new_status"])
print(record["new_name"])
To show display names instead of GUIDs, request the formatted value annotation via include_annotations:
for page in client.records.get("opportunity",
select=["name", "estimatedvalue", "_parentaccountid_value"],
include_annotations="OData.Community.Display.V1.FormattedValue",
):
for r in page:
account_name = r.get("_parentaccountid_value@OData.Community.Display.V1.FormattedValue")
print(f"{r['name']} — {account_name}")
You MUST pass include_annotations — without it, the Prefer: odata.include-annotations header is not sent and formatted values are not in the response. Use "*" for all annotations or the specific annotation name above.
Formatted values are available for lookup, choice, status, and owner fields.
for page in client.records.get("opportunity",
select=["name", "estimatedvalue"],
expand=["parentaccountid($select=name)"], # nested $select avoids fetching all account columns
):
for r in page:
account = r.get("parentaccountid") or {}
print(f"{r['name']} — {account.get('name', 'Unknown')}")
Always use nested $select inside $expand — without it, Dataverse returns every column on the related entity, which wastes bandwidth and memory.
for page in client.records.get(
"new_ticket",
select=["new_name", "new_priority", "new_status"],
expand=["new_CustomerId($select=new_name)", "new_AgentId($select=new_name)"], # nested $select + case-sensitive nav props
):
for r in page:
customer = r.get("new_CustomerId") or {}
agent = r.get("new_AgentId") or {}
print(f"{r['new_name']} | {customer.get('new_name','')} | {agent.get('new_name','')}")
expanduses the Navigation Property Name (new_CustomerId), not the lowercase logical name (new_customerid). Using lowercase causes a 400 error.
For aggregations and many-to-many expansion, the SDK doesn't have direct support — use raw Web API. See references/web-api-advanced.md for full code samples.
Quick reference:
$expand on N:N relationships: GET /<entitySet>?$expand=<n:n_nav>($select=...) — single page only; follow @odata.nextLink for >5,000 results.$apply for aggregations: runs server-side, returns grouped results in one call. Patterns: groupby((col),aggregate(metric with sum as total)), aggregate($count as count), aggregate(amount with average as avg). 50K source-record limit.$apply only works within one entity set. Use client.dataframe.get(entity, select=[...]) per table → pd.merge() → groupby(). Always pass select=; without it transfers 10-20× more data.Available in PowerPlatform-Dataverse-Client b8+. Chainable builder for complex queries that would be awkward as a single OData URL or FetchXML string. Full reference and examples in references/querybuilder.md.
For interactive querying in notebooks (auth + DataverseClient + DataFrame display), see references/jupyter-setup.md.
| Status | Cause | Fix |
|---|---|---|
| 400 | Wrong field casing in $select/$filter (must be lowercase LogicalName) or $expand (must be case-sensitive Navigation Property Name) | Verify names via EntityDefinitions(LogicalName='...')/Attributes |
| 400 | Unsupported SQL in MCP read_query or client.query.sql() (DISTINCT, HAVING, subqueries, OFFSET, JOINs, GROUP BY) | Use $apply for single-table aggregation, or client.dataframe.get() + pandas for cross-table |
| 404 | Table logical name not found | Check spelling — use client.tables.get("<name>") to verify |
| 429 | Rate limited | SDK retries automatically; reduce page size or add delays between pages |
For HttpError handling in SDK scripts, see the error handling pattern in dv-data.
.py files — curly quotes and em dashes cause SyntaxError on Windows.python -c for multiline code — write a .py file instead.str(uuid.uuid4()), not shell backtick substitution.npx claudepluginhub anthropics/claude-plugins-official --plugin dataverseCRUD and bulk data operations on Dataverse tables using the Python SDK: create, update, delete, upsert, CSV import, foreign-key loads, AI sample data.
Populates Dataverse tables with sample records via OData API for testing and demoing Power Pages sites. Handles prerequisites, table discovery from manifest or API, and ordered insertions.
Explores Bauplan lakehouse data using Python SDK: inspect namespaces, tables, schemas, samples, profiling queries; export results to files. Read-only, phased execution produces summary.md.