From palantir-pack
Optimize Palantir Foundry costs through compute tuning, incremental builds, and usage monitoring. Use when analyzing Foundry compute costs, reducing API usage, or implementing cost monitoring for Foundry workloads. Trigger with phrases like "palantir cost", "foundry billing", "reduce foundry costs", "foundry pricing", "foundry expensive".
```shell
npx claudepluginhub flight505/skill-forge --plugin palantir-pack
```
Optimize Foundry compute and API costs through incremental transforms, right-sized Spark profiles, efficient pagination, and usage monitoring.
| Cost Category | Driver | Optimization |
|---|---|---|
| Compute | Full rebuilds of large transforms | Use @incremental() |
| Compute | Oversized Spark profiles | Right-size @configure profiles |
| Storage | Redundant dataset snapshots | Configure retention policies |
| API | High-frequency polling | Use webhooks instead |
| API | Small page sizes | Use max page_size (500) |
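The "max page_size" row above amounts to paging through a list endpoint in the fewest calls possible. A minimal sketch, assuming a generic list callable that accepts `page_size` and `page_token` and returns an object with `data` and `next_page_token` attributes (these names are illustrative, not a specific SDK's):

```python
def fetch_all(list_page, page_size=500):
    """Page through a list endpoint using the maximum page size.

    `list_page` is any callable taking (page_size, page_token) and
    returning an object with `.data` and `.next_page_token` attributes.
    Fetching 500 rows per call instead of 100 cuts API calls 5x.
    """
    token = None
    while True:
        page = list_page(page_size=page_size, page_token=token)
        yield from page.data
        token = page.next_page_token
        if token is None:
            break
```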
```python
from transforms.api import transform_df, Input, Output, incremental

# BEFORE: full rebuild every run (expensive for large datasets)
@transform_df(Output("/out"), data=Input("/in"))
def expensive(data):
    return data.filter(data.status == "active")

# AFTER: only processes new/changed rows
@incremental()
@transform_df(Output("/out"), data=Input("/in"))
def cheap(data):
    return data.filter(data.status == "active")
```
```python
from transforms.api import configure, transform_df, transform_polars, Input, Output

# DON'T: default profile for everything.
# DO: match the profile to the actual data size.

# Small data (< 1 GB): lightweight transform (no Spark)
@transform_polars(Output("/out"), data=Input("/small_table"))
def small_job(data):
    return data.filter(data["status"] == "active")

# Medium data (1-50 GB): the default profile is fine
@transform_df(Output("/out"), data=Input("/medium_table"))
def medium_job(data):
    return data.select("id", "name")

# Large data (50 GB+): explicit large profile
@configure(profile=["DRIVER_MEMORY_LARGE"])
@transform_df(Output("/out"), data=Input("/big_table"))
def large_job(data):
    return data.groupBy("region").count()
```
```python
import time

# EXPENSIVE: polling every 30 seconds
while True:
    result = client.ontologies.OntologyObject.list(
        ontology="co", object_type="Order", page_size=100,
    )
    process_new_orders(result.data)
    time.sleep(30)  # 2,880 API calls/day!

# CHEAP: webhook-driven (0 polling API calls).
# Register a webhook for ontology.object.created events;
# see the palantir-webhooks-events skill.
```
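The "2,880 calls/day" figure is simple arithmetic, and the same formula shows how much a longer interval saves if webhooks are not an option:

```python
SECONDS_PER_DAY = 86_400

def polling_calls_per_day(interval_seconds: int) -> int:
    """API calls per day for a fixed polling interval."""
    return SECONDS_PER_DAY // interval_seconds

# 30-second polling -> 2,880 calls/day, matching the comment above;
# backing off to 5 minutes drops that to 288.
```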
```python
def log_api_usage(response):
    """Log rate-limit headers to track usage patterns."""
    remaining = response.headers.get("X-RateLimit-Remaining", "?")
    limit = response.headers.get("X-RateLimit-Limit", "?")
    print(f"API usage: {remaining}/{limit} remaining")
```
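Beyond logging, the same headers can drive a back-off decision. A sketch (the 10% threshold and function names are illustrative, not part of any Foundry API):

```python
def remaining_ratio(headers):
    """Fraction of the rate-limit budget still available, or None if unknown."""
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        limit = int(headers["X-RateLimit-Limit"])
    except (KeyError, ValueError):
        return None
    return remaining / limit if limit else None

def should_back_off(headers, threshold=0.1):
    """True when less than `threshold` of the budget remains: a cue to slow down."""
    ratio = remaining_ratio(headers)
    return ratio is not None and ratio < threshold
```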
| Optimization | Risk | Mitigation |
|---|---|---|
| Incremental | Missed data on schema change | Schedule periodic full rebuild |
| Polars (no Spark) | Data too large for memory | Fall back to Spark for > 1GB |
| Aggressive caching | Stale data | Set TTL matching business requirements |
| Webhook-only | Missed events | Periodic reconciliation job |
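The "set TTL matching business requirements" mitigation above can be sketched as a small in-process TTL cache (illustrative, not a Foundry API: any caching layer with a bounded entry age works the same way):

```python
import time

class TTLCache:
    """Cache whose entries expire after `ttl_seconds`, bounding staleness."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get(self, key, fetch):
        """Return the cached value, calling `fetch` only once the TTL lapses."""
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]
        value = fetch()
        self._store[key] = (value, now)
        return value
```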
For reference architecture, see palantir-reference-architecture.