From palantir-pack
Optimizes Palantir Foundry compute/API costs using incremental transforms, Spark profile tuning, pagination, webhooks, and usage monitoring.
```bash
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin palantir-pack
```
Optimize Foundry compute and API costs through incremental transforms, right-sized Spark profiles, efficient pagination, and usage monitoring.
| Cost Category | Driver | Optimization |
|---|---|---|
| Compute | Full rebuilds of large transforms | Use @incremental() |
| Compute | Oversized Spark profiles | Right-size @configure profiles |
| Storage | Redundant dataset snapshots | Configure retention policies |
| API | High-frequency polling | Use webhooks instead |
| API | Small page sizes | Use max page_size (500) |
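
Pagination savings scale with call volume: scanning 10,000 objects at page_size=100 takes 100 requests, while the 500 maximum takes 20. A minimal sketch, assuming the same client as the polling example below and that the list response exposes a next_page_token field accepted back as a page_token parameter (both names are assumptions):

```python
def list_all_orders(client):
    """Scan every Order with the fewest API calls (pagination fields assumed)."""
    page = client.ontologies.OntologyObject.list(
        ontology="co", object_type="Order", page_size=500,  # documented max
    )
    yield from page.data
    while page.next_page_token:  # next_page_token/page_token are assumptions
        page = client.ontologies.OntologyObject.list(
            ontology="co", object_type="Order", page_size=500,
            page_token=page.next_page_token,
        )
        yield from page.data
```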
```python
from transforms.api import transform_df, Input, Output, incremental

# BEFORE: Full rebuild every run (expensive for large datasets)
@transform_df(Output("/out"), data=Input("/in"))
def expensive(data):
    return data.filter(data.status == "active")

# AFTER: Only processes new/changed rows
@incremental()
@transform_df(Output("/out"), data=Input("/in"))
def cheap(data):
    return data.filter(data.status == "active")
```
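
The main risk, covered in the table at the end, is missing data after a schema change. incremental() also takes a semantic_version argument; bumping it forces the next build to run as a full snapshot, which is one way to recover:

```python
# Bump semantic_version after a schema change to force one full rebuild;
# subsequent runs are incremental again.
@incremental(semantic_version=2)  # was 1
@transform_df(Output("/out"), data=Input("/in"))
def cheap(data):
    return data.filter(data.status == "active")
```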
```python
from transforms.api import configure

# DON'T: Default profile for everything
# DO: Match profile to actual data size

# Small data (< 1GB) — use lightweight transforms (no Spark)
from transforms.api import lightweight, transform_polars

@lightweight()
@transform_polars(Output("/out"), data=Input("/small_table"))
def small_job(data):
    return data.filter(data["status"] == "active")

# Medium data (1-50GB) — default profile is fine
@transform_df(Output("/out"), data=Input("/medium_table"))
def medium_job(data):
    return data.select("id", "name")

# Large data (50GB+) — explicit large profile
@configure(profile=["DRIVER_MEMORY_LARGE"])
@transform_df(Output("/out"), data=Input("/big_table"))
def large_job(data):
    return data.groupBy("region").count()
```
```python
# EXPENSIVE: Polling every 30 seconds
import time

while True:
    result = client.ontologies.OntologyObject.list(
        ontology="co", object_type="Order", page_size=100,
    )
    process_new_orders(result.data)
    time.sleep(30)  # 2,880 API calls/day!

# CHEAP: Webhook-driven (0 polling API calls)
# Register webhook for ontology.object.created events
# See palantir-webhooks-events skill
```
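
The receiving side depends on the event schema documented in that skill. Purely as an illustration, a hypothetical Flask receiver might look like this; the /foundry-events path, the eventType and object payload fields, and process_new_orders are all assumptions:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/foundry-events", methods=["POST"])
def handle_event():
    event = request.get_json()
    # Field names below are hypothetical; see palantir-webhooks-events
    if event.get("eventType") == "ontology.object.created":
        process_new_orders([event.get("object")])
    return "", 204
```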
```python
def log_api_usage(response):
    """Log rate limit headers to track usage patterns."""
    remaining = response.headers.get("X-RateLimit-Remaining", "?")
    limit = response.headers.get("X-RateLimit-Limit", "?")
    print(f"API usage: {remaining}/{limit} remaining")
```
| Optimization | Risk | Mitigation |
|---|---|---|
| Incremental | Missed data on schema change | Schedule periodic full rebuild |
| Polars (no Spark) | Data too large for memory | Fall back to Spark for > 1GB |
| Aggressive caching | Stale data | Set TTL matching business requirements (see sketch below) |
| Webhook-only | Missed events | Periodic reconciliation job |
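
For the caching row, a minimal client-side TTL cache sketch; the 300-second TTL and the fetch_order helper are placeholders to replace with your own values:

```python
import time

_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 300  # placeholder; derive from business staleness tolerance

def cached_get(key, fetch):
    """Return a fresh cached value, or call fetch() and cache the result."""
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    value = fetch()
    _cache[key] = (time.time(), value)
    return value

# Usage (fetch_order is hypothetical):
# order = cached_get("order:42", lambda: fetch_order(client, "42"))
```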
For reference architecture, see palantir-reference-architecture.