From palantir-pack
Optimizes Palantir Foundry compute/API costs using incremental transforms, Spark profile tuning, pagination, webhooks, and usage monitoring.
```bash
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin palantir-pack
```
Optimize Foundry compute and API costs through incremental transforms, right-sized Spark profiles, efficient pagination, and usage monitoring.
| Cost Category | Driver | Optimization |
|---|---|---|
| Compute | Full rebuilds of large transforms | Use @incremental() |
| Compute | Oversized Spark profiles | Right-size @configure profiles |
| Storage | Redundant dataset snapshots | Configure retention policies |
| API | High-frequency polling | Use webhooks instead |
| API | Small page sizes | Use max page_size (500) |
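
Pagination savings scale with call volume: scanning 10,000 objects at page_size=100 takes 100 requests, while the 500 maximum takes 20. A minimal sketch, assuming the same client as the polling example below and that the list response exposes a next_page_token field accepted back as a page_token parameter (both names are assumptions):

```python
def list_all_orders(client):
    """Scan every Order with the fewest API calls (pagination fields assumed)."""
    page = client.ontologies.OntologyObject.list(
        ontology="co", object_type="Order", page_size=500,  # documented max
    )
    yield from page.data
    while page.next_page_token:  # next_page_token/page_token are assumptions
        page = client.ontologies.OntologyObject.list(
            ontology="co", object_type="Order", page_size=500,
            page_token=page.next_page_token,
        )
        yield from page.data
```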
```python
from transforms.api import transform_df, Input, Output, incremental

# BEFORE: Full rebuild every run (expensive for large datasets)
@transform_df(Output("/out"), data=Input("/in"))
def expensive(data):
    return data.filter(data.status == "active")

# AFTER: Only processes new/changed rows
@incremental()
@transform_df(Output("/out"), data=Input("/in"))
def cheap(data):
    return data.filter(data.status == "active")
```
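
The main risk, covered in the table at the end, is missing data after a schema change. incremental() also takes a semantic_version argument; bumping it forces the next build to run as a full snapshot, which is one way to recover:

```python
# Bump semantic_version after a schema change to force one full rebuild;
# subsequent runs are incremental again.
@incremental(semantic_version=2)  # was 1
@transform_df(Output("/out"), data=Input("/in"))
def cheap(data):
    return data.filter(data.status == "active")
```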
```python
from transforms.api import configure

# DON'T: Default profile for everything
# DO: Match profile to actual data size

# Small data (< 1GB) — use lightweight transforms (no Spark)
from transforms.api import lightweight, transform_polars

@lightweight()
@transform_polars(Output("/out"), data=Input("/small_table"))
def small_job(data):
    return data.filter(data["status"] == "active")

# Medium data (1-50GB) — default profile is fine
@transform_df(Output("/out"), data=Input("/medium_table"))
def medium_job(data):
    return data.select("id", "name")

# Large data (50GB+) — explicit large profile
@configure(profile=["DRIVER_MEMORY_LARGE"])
@transform_df(Output("/out"), data=Input("/big_table"))
def large_job(data):
    return data.groupBy("region").count()
```
```python
# EXPENSIVE: Polling every 30 seconds
import time

while True:
    result = client.ontologies.OntologyObject.list(
        ontology="co", object_type="Order", page_size=100,
    )
    process_new_orders(result.data)
    time.sleep(30)  # 2,880 API calls/day!

# CHEAP: Webhook-driven (0 polling API calls)
# Register webhook for ontology.object.created events
# See palantir-webhooks-events skill
```
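
The receiving side depends on the event schema documented in that skill. Purely as an illustration, a hypothetical Flask receiver might look like this; the /foundry-events path, the eventType and object payload fields, and process_new_orders are all assumptions:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/foundry-events", methods=["POST"])
def handle_event():
    event = request.get_json()
    # Field names below are hypothetical; see palantir-webhooks-events
    if event.get("eventType") == "ontology.object.created":
        process_new_orders([event.get("object")])
    return "", 204
```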
```python
def log_api_usage(response):
    """Log rate limit headers to track usage patterns."""
    remaining = response.headers.get("X-RateLimit-Remaining", "?")
    limit = response.headers.get("X-RateLimit-Limit", "?")
    print(f"API usage: {remaining}/{limit} remaining")
```
| Optimization | Risk | Mitigation |
|---|---|---|
| Incremental | Missed data on schema change | Schedule periodic full rebuild |
| Polars (no Spark) | Data too large for memory | Fall back to Spark for > 1GB |
| Aggressive caching | Stale data | Set TTL matching business requirements (see sketch below) |
| Webhook-only | Missed events | Periodic reconciliation job |
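
For the caching row, a minimal client-side TTL cache sketch; the 300-second TTL and the fetch_order helper are placeholders to replace with your own values:

```python
import time

_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 300  # placeholder; derive from business staleness tolerance

def cached_get(key, fetch):
    """Return a fresh cached value, or call fetch() and cache the result."""
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    value = fetch()
    _cache[key] = (time.time(), value)
    return value

# Usage (fetch_order is hypothetical):
# order = cached_get("order:42", lambda: fetch_order(client, "42"))
```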
For reference architecture, see palantir-reference-architecture.