Federates BigQuery to remote Iceberg catalogs (Databricks Unity, AWS Glue) for cross-cloud querying via Lakehouse. Use to query Databricks or S3 data from GCP engines.
Slash command: /dak:federate-lakehouse-catalog
This skill describes how to set up a federated catalog in BigQuery to query data in remote catalogs, such as Databricks Unity Catalog or the AWS Glue Data Catalog (for data in AWS), over the public internet.
Before running any commands, the agent MUST collect the following information from the user:
Only proceed to the next steps once this information is confirmed.
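A minimal bash sketch for confirming that every value has been collected before any command runs. It assumes each placeholder used in this skill's commands (for example <PROJECT_ID>, <REGION>, <CATALOG_NAME>) has been stored in a shell variable of the same name. The function name require_inputs is hypothetical.

```bash
# Hypothetical preflight helper (bash): each argument is the NAME of a shell
# variable that should hold a value collected from the user.
require_inputs() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example calls (variable names mirror this skill's placeholders):
#   Databricks Unity Catalog:
#     require_inputs PROJECT_ID REGION CATALOG_NAME SECRET_NAME \
#       UNITY_INSTANCE_NAME UNITY_CATALOG_NAME
#   AWS Glue:
#     require_inputs PROJECT_ID REGION CATALOG_NAME AWS_ACCOUNT_ID \
#       AWS_REGION AWS_ROLE_NAME
```

A non-zero return means at least one value is still missing and the user should be asked for it before continuing.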
Verify that the required Google Cloud APIs are enabled for the project:
gcloud services list --enabled \
--project="<PROJECT_ID>" \
--filter="config.name=biglake.googleapis.com"
If the API is not enabled, explicitly ask the user for permission to enable it. Do NOT proceed without their confirmation.
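Here is a scripted, read-only version of this check, sketched in bash. It assumes gcloud is installed and authenticated, and it filters `gcloud services list --enabled` on config.name. It never enables anything itself. The function name api_enabled is hypothetical.

```bash
# Hypothetical helper: return 0 if API is enabled in PROJECT, 1 otherwise.
# Read-only: enabling an API still requires the user's explicit permission.
api_enabled() {
  local project="$1" api="$2" found
  found="$(gcloud services list --enabled \
    --project="${project}" \
    --filter="config.name=${api}" \
    --format="value(config.name)")"
  [ "${found}" = "${api}" ]
}

# Example:
#   if ! api_enabled "${PROJECT_ID}" biglake.googleapis.com; then
#     echo "biglake.googleapis.com is not enabled; ask the user before running:"
#     echo "  gcloud services enable biglake.googleapis.com --project=${PROJECT_ID}"
#   fi
```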
For Databricks Unity Catalog, store the Databricks client ID and client secret
in Secret Manager. Ensure the secretmanager.googleapis.com API is enabled. The
secret MUST be in the same region as your Lakehouse catalog.
credentials.json:
{
"client_id": "<CLIENT_ID>",
"client_secret": "<CLIENT_SECRET>"
}
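You do not have to hand-edit credentials.json; the file can be generated with owner-only permissions. This bash sketch assumes the client ID and secret contain no double quotes or backslashes. It does no JSON escaping and refuses such values. The function name is hypothetical. Delete the file once the secret has been created.

```bash
# Hypothetical helper: write the Databricks credentials JSON to OUT (mode 600).
write_databricks_credentials() {
  local client_id="$1" client_secret="$2" out="${3:-credentials.json}"
  case "${client_id}${client_secret}" in
    *'"'* | *'\'*)
      # No JSON escaping is done, so refuse values that would need it.
      echo "refusing to write: value needs JSON escaping" >&2
      return 1
      ;;
  esac
  # Subshell so the restrictive umask does not leak into the caller's shell.
  (
    umask 077
    printf '{\n  "client_id": "%s",\n  "client_secret": "%s"\n}\n' \
      "${client_id}" "${client_secret}" > "${out}"
  )
}

# Example:
#   write_databricks_credentials "${CLIENT_ID}" "${CLIENT_SECRET}" credentials.json
#   ...run gcloud secrets create ... --data-file=credentials.json...
#   rm -f credentials.json
```

The umask only applies when the file is newly created, so remove any stale credentials.json first.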
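Regional secrets are served from a regional endpoint, so point gcloud at it, then create the secret:
gcloud config set api_endpoint_overrides/secretmanager https://secretmanager.<REGION>.rep.googleapis.com/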
gcloud secrets create <SECRET_NAME> \
--location="<REGION>" \
--project="<PROJECT_ID>" \
--data-file=credentials.json
Create a BigLake Iceberg catalog of type federated pointing to Databricks.
gcloud alpha biglake iceberg catalogs create <CATALOG_NAME> \
--project="<PROJECT_ID>" \
--primary-location="<REGION>" \
--catalog-type="federated" \
--federated-catalog-type="unity" \
--secret-name="projects/<PROJECT_ID>/locations/<REGION>/secrets/<SECRET_NAME>" \
--unity-instance-name="<UNITY_INSTANCE_NAME>" \
--unity-catalog-name="<UNITY_CATALOG_NAME>" \
--refresh-interval="300s"
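The --secret-name value is the full resource path of the secret, and that secret must live in the same region as the catalog. A small bash helper can build the path and fail fast on a region mismatch. The helper's name and argument order are hypothetical.

```bash
# Hypothetical helper: print projects/P/locations/R/secrets/S, but only when
# the secret's region matches the catalog's primary location.
secret_resource_name() {
  local project="$1" secret_region="$2" secret="$3" catalog_region="$4"
  if [ "${secret_region}" != "${catalog_region}" ]; then
    echo "secret region ${secret_region} != catalog region ${catalog_region}" >&2
    return 1
  fi
  printf 'projects/%s/locations/%s/secrets/%s\n' \
    "${project}" "${secret_region}" "${secret}"
}

# Example (SECRET_REGION is a hypothetical variable tracking the secret's region):
#   --secret-name="$(secret_resource_name "${PROJECT_ID}" "${SECRET_REGION}" \
#     "${SECRET_NAME}" "${REGION}")"
```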
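Grant the service account that Lakehouse created for the catalog permission to read the secret. First look up the service account with the describe command, then use the returned value as <SERVICE_ACCOUNT_EMAIL> in the IAM policy binding.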
gcloud alpha biglake iceberg catalogs describe <CATALOG_NAME> \
--project="<PROJECT_ID>" \
--location="<REGION>" \
--format="value(biglake-service-account-id)"
gcloud secrets add-iam-policy-binding <SECRET_NAME> \
--project="<PROJECT_ID>" \
--location="<REGION>" \
--member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" \
--role="roles/secretmanager.secretAccessor"
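The two commands above can be chained so the service account is captured rather than copied by hand. This bash sketch reuses exactly the flags shown above. It assumes the describe output is the service account email that --member expects. The function name is hypothetical.

```bash
# Hypothetical helper: look up the catalog's service account, then grant it
# Secret Accessor on the secret (same flags as the two commands above).
grant_secret_access() {
  local project="$1" region="$2" catalog="$3" secret="$4" sa
  sa="$(gcloud alpha biglake iceberg catalogs describe "${catalog}" \
    --project="${project}" \
    --location="${region}" \
    --format="value(biglake-service-account-id)")"
  if [ -z "${sa}" ]; then
    echo "no service account found on catalog ${catalog}" >&2
    return 1
  fi
  gcloud secrets add-iam-policy-binding "${secret}" \
    --project="${project}" \
    --location="${region}" \
    --member="serviceAccount:${sa}" \
    --role="roles/secretmanager.secretAccessor"
}

# Example:
#   grant_secret_access "${PROJECT_ID}" "${REGION}" "${CATALOG_NAME}" "${SECRET_NAME}"
```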
For AWS Glue, Lakehouse provisions the catalog's Google service account ID only after the catalog is created. Create the AWS IAM role with a placeholder trust policy first, and replace the placeholders once the ID is known.
trust_policy.json:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "accounts.google.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"accounts.google.com:aud": ["PLACEHOLDER_VALUE"],
"accounts.google.com:sub": ["PLACEHOLDER_VALUE"]
}
}
}
]
}
aws iam create-role \
--role-name <AWS_ROLE_NAME> \
--assume-role-policy-document file://trust_policy.json \
--max-session-duration 43200
Attach a policy that allows Lakehouse to read from Glue and S3.
[!IMPORTANT] Safe IAM Scoping: The example below uses placeholders and wildcard patterns for illustration. You MUST consult with the user to scope the
Resource ARNs to their specific catalog, database, and S3 buckets. Do NOT blindly apply wildcard permissions.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueRead",
"Effect": "Allow",
"Action": [
"glue:GetCatalog",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables"
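],
"Resource": [
"arn:aws:glue:<AWS_REGION>:<AWS_ACCOUNT_ID>:catalog",
"arn:aws:glue:<AWS_REGION>:<AWS_ACCOUNT_ID>:database/<DATABASE_NAME>",
"arn:aws:glue:<AWS_REGION>:<AWS_ACCOUNT_ID>:table/<DATABASE_NAME>/*"
]
},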
{
"Sid": "S3Read",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::<SPECIFIC_BUCKET>",
"arn:aws:s3:::<SPECIFIC_BUCKET>/*"
]
}
]
}
Attach this permissions policy to your IAM role.
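One way to attach the policy is as an inline role policy with `aws iam put-role-policy`. This bash sketch assumes the policy above was saved as permissions_policy.json. The policy name LakehouseGlueS3Read and the function name are hypothetical. A customer-managed policy attached with `aws iam attach-role-policy` would also work.

```bash
# Hypothetical helper: attach POLICY_FILE to ROLE as an inline policy.
attach_read_policy() {
  local role="$1" policy_file="${2:-permissions_policy.json}"
  local policy_name="${3:-LakehouseGlueS3Read}"
  if [ ! -f "${policy_file}" ]; then
    echo "policy file not found: ${policy_file}" >&2
    return 1
  fi
  aws iam put-role-policy \
    --role-name "${role}" \
    --policy-name "${policy_name}" \
    --policy-document "file://${policy_file}"
}

# Example:
#   attach_read_policy "${AWS_ROLE_NAME}" permissions_policy.json
```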
When creating an AWS Glue federated catalog, --glue-warehouse MUST be set to
your 12-digit AWS account ID (not an S3 bucket URI). Best practice: create the
catalog without a refresh interval, so that metadata synchronization does not
fail prematurely while the AWS trust relationship is still propagating.
gcloud alpha biglake iceberg catalogs create <CATALOG_NAME> \
--project="<PROJECT_ID>" \
--primary-location="<REGION>" \
--catalog-type="federated" \
--federated-catalog-type="glue" \
--glue-warehouse="<AWS_ACCOUNT_ID>" \
--glue-aws-region="<AWS_REGION>" \
--glue-aws-role-arn="arn:aws:iam::<AWS_ACCOUNT_ID>:role/<AWS_ROLE_NAME>"
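Because --glue-warehouse must be the 12-digit account ID rather than an S3 URI, a quick guard before running the create command catches the most common mistake. This is a POSIX shell sketch with hypothetical helper names.

```bash
# Hypothetical guard: succeed only for exactly 12 ASCII digits.
is_aws_account_id() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;  # empty, or contains a non-digit (e.g. "s3://...")
  esac
  [ "${#1}" -eq 12 ]
}

# Build the value for --glue-aws-role-arn from a validated account ID.
glue_role_arn() {
  is_aws_account_id "$1" || return 1
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Example:
#   is_aws_account_id "${AWS_ACCOUNT_ID}" || echo "--glue-warehouse needs a 12-digit account ID" >&2
```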
Extract the biglake-service-account-id from the created catalog, and update
your AWS IAM role's trust policy to replace PLACEHOLDER_VALUE in the aud and
sub conditions with this Google Service Agent ID.
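Here is a bash sketch of this update. It substitutes the ID into the placeholder trust_policy.json and pushes the result with `aws iam update-assume-role-policy`. The function names are hypothetical. The sketch assumes the ID contains no `/`, `&` or backslash characters, which sed would otherwise interpret.

```bash
# Hypothetical helper: copy TEMPLATE to OUT with every PLACEHOLDER_VALUE
# replaced by SA_ID.
render_trust_policy() {
  local template="$1" sa_id="$2" out="$3"
  case "${sa_id}" in
    '' | */* | *'&'* | *'\'*)
      # These characters would break (or be interpreted by) the sed replacement.
      echo "unsupported service account ID: '${sa_id}'" >&2
      return 1
      ;;
  esac
  if ! grep -q 'PLACEHOLDER_VALUE' "${template}"; then
    echo "no PLACEHOLDER_VALUE found in ${template}" >&2
    return 1
  fi
  sed "s/PLACEHOLDER_VALUE/${sa_id}/g" "${template}" > "${out}"
}

# Hypothetical helper: replace ROLE's trust policy with POLICY_FILE.
update_trust_policy() {
  local role="$1" policy_file="$2"
  aws iam update-assume-role-policy \
    --role-name "${role}" \
    --policy-document "file://${policy_file}"
}

# Example flow:
#   SA_ID="$(gcloud alpha biglake iceberg catalogs describe "${CATALOG_NAME}" \
#     --project="${PROJECT_ID}" --location="${REGION}" \
#     --format="value(biglake-service-account-id)")"
#   render_trust_policy trust_policy.json "${SA_ID}" trust_policy.rendered.json
#   update_trust_policy "${AWS_ROLE_NAME}" trust_policy.rendered.json
```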
Update the catalog to activate background refresh once the trust policy is updated.
gcloud alpha biglake iceberg catalogs update <CATALOG_NAME> \
--project="<PROJECT_ID>" \
--refresh-interval="300s"
Once set up, you can query the tables via BigQuery.
SELECT * FROM `<PROJECT_ID>.<CATALOG_NAME>.<NAMESPACE>.<TABLE_NAME>` LIMIT 10;
[!IMPORTANT] Regional Isolation: The Secret Manager secret and the Lakehouse catalog MUST be created in the exact same region.
[!TIP] Region Pairing Best Practice: When setting up the federated catalog, choose a GCP region with "Low Latency Dedicated" or "Partner CCI" connectivity to your AWS region for the best performance when federating large datasets across clouds. Examples of optimal pairings:
- AWS us-east-1 (N. Virginia) pairs best with GCP us-east4 (Ashburn, VA).
- AWS us-west-2 (Oregon) pairs best with GCP us-west1 (The Dalles, OR).
- AWS eu-west-2 (London) pairs best with GCP europe-west2 (London).
- AWS eu-central-1 (Frankfurt) pairs best with GCP europe-west3 (Frankfurt).
For the exhaustive list of mappings, read the full capabilities table at: https://docs.cloud.google.com/lakehouse/docs/regions-capabilities-cross-cloud-lakehouse
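The four example pairings above can be encoded as a lookup, so the agent proposes a region instead of guessing. This bash sketch (hypothetical function name) deliberately covers only those four pairs and leaves everything else to the capabilities table.

```bash
# Hypothetical lookup of the example AWS -> GCP region pairings.
suggest_gcp_region() {
  case "$1" in
    us-east-1)    echo "us-east4" ;;      # N. Virginia -> Ashburn, VA
    us-west-2)    echo "us-west1" ;;      # Oregon -> The Dalles, OR
    eu-west-2)    echo "europe-west2" ;;  # London -> London
    eu-central-1) echo "europe-west3" ;;  # Frankfurt -> Frankfurt
    *)
      echo "no example pairing for $1; see the capabilities table" >&2
      return 1
      ;;
  esac
}
```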
[!IMPORTANT] BigQuery Query Location: When querying the federated catalog via BigQuery, you MUST ensure the query runs in the same region as the catalog (e.g., us-east4). If using the bq CLI, use the --location flag.
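Combining the example query with the location rule gives a bash sketch that runs the query with bq, pinned to the catalog's region. --location and --use_legacy_sql are standard bq flags. The function name and argument order are hypothetical.

```bash
# Hypothetical helper: run SELECT * ... LIMIT n against a federated table,
# forcing the job to run in the catalog's region.
query_federated_table() {
  local region="$1" project="$2" catalog="$3" namespace="$4" table="$5"
  local limit="${6:-10}"
  bq --location="${region}" query \
    --use_legacy_sql=false \
    "SELECT * FROM \`${project}.${catalog}.${namespace}.${table}\` LIMIT ${limit}"
}

# Example:
#   query_federated_table us-east4 "${PROJECT_ID}" "${CATALOG_NAME}" my_namespace my_table
```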
After completing the setup, the agent MUST validate that the federation is working and propose next steps to the user.
Validate the Connection:
Attempt to list the namespaces or tables in the newly federated catalog
using the bq CLI or BigQuery API. For example:
bq ls --location="<REGION>" <PROJECT_ID>.<CATALOG_NAME>
If the command returns a list of namespaces/schemas, the federation is successful.
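The validation step can be scripted so that a non-zero exit code or empty output counts as failure. This bash sketch reuses the bq ls command above. The function name is hypothetical.

```bash
# Hypothetical helper: list namespaces in the federated catalog; fail on a
# bq error or on empty output.
validate_federation() {
  local region="$1" project="$2" catalog="$3" out
  if ! out="$(bq ls --location="${region}" "${project}.${catalog}")"; then
    echo "bq ls failed for ${project}.${catalog}" >&2
    return 1
  fi
  if [ -z "${out}" ]; then
    echo "no namespaces returned for ${project}.${catalog}" >&2
    return 1
  fi
  printf '%s\n' "${out}"
}

# Example:
#   validate_federation "${REGION}" "${PROJECT_ID}" "${CATALOG_NAME}" || echo "see Troubleshooting"
```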
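Troubleshooting:
- Verify that the AWS IAM role's trust policy uses the catalog's biglake-service-account-id and that the GCP and AWS regions match your configuration.
- Verify that the catalog's service account has roles/secretmanager.secretAccessor on the secret.
Explore and Propose: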