From dak
Searches and inspects GCP data assets via Dataplex catalog, BigQuery, Spanner, and BigLake. Use when you have a topic but lack specific resource IDs.
Slash command: /dak:discovering-gcp-data-assets
If the asset was created or mentioned earlier in the same conversation, then proceed with that asset instead of searching. Skip steps 2, 3, and 4.
Dataplex Lookup Context provides the richest metadata for data assets. You MUST prioritize using it for all Google Cloud assets, even if you already know their IDs.
If the asset is in the bigquery-public-data project, Dataplex Lookup Context will fail. You MUST skip Steps 3 and 4 and inspect the table directly using the bq CLI or BigQuery MCP tools instead.

You MUST use the Dataplex search command to discover assets and retrieve their full projects/... entry names. This step is required even if you already know the asset's short ID (e.g., my_dataset.my_table), because Step 4 strictly requires the full entry name.
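The routing described so far can be summarized in a short Python sketch. The function name choose_path and its return labels are hypothetical and only illustrate the decision order stated above; they are not part of any Dataplex or BigQuery API.

```python
PUBLIC_DATA_PROJECT = "bigquery-public-data"


def choose_path(project_id: str, mentioned_earlier: bool) -> str:
    """Return an illustrative label for which path to take."""
    if mentioned_earlier:
        # The asset is already known from this conversation: no search needed.
        return "use_known_asset"
    if project_id == PUBLIC_DATA_PROJECT:
        # Lookup Context fails for this project: use bq or BigQuery MCP tools.
        return "inspect_directly"
    # Default: Dataplex search (Step 3), then Lookup Context (Step 4).
    return "search_then_lookup"


print(choose_path("bigquery-public-data", mentioned_earlier=False))
print(choose_path("my-project", mentioned_earlier=False))
```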
[!IMPORTANT]
The --project parameter MUST ALWAYS be provided. This project_id is used to attribute the search only and does NOT restrict the search scope. The project must have the Dataplex API enabled, and the user must have the dataplex.entries.get permission.
Use semantic search when the user describes the meaning or intent of the data (e.g., "Find Q4 product sales data").
Use the search_entries MCP tool
OR
gcloud dataplex entries search "<NATURAL_LANGUAGE_QUERY>" \
--project="<PROJECT_ID>" \
--semantic-search \
--limit=50
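As a rough sketch, the semantic search invocation could be assembled programmatically as shown here. The helper name semantic_search_command and the example project ID my-project are assumptions for illustration; the flags are the ones shown in the command above.

```python
import shlex
from typing import List


def semantic_search_command(query: str, project_id: str, limit: int = 50) -> List[str]:
    """Build the argv for a semantic Dataplex search (hypothetical helper)."""
    if not project_id:
        # The --project parameter must always be provided.
        raise ValueError("project_id is required")
    return [
        "gcloud", "dataplex", "entries", "search", query,
        f"--project={project_id}",
        "--semantic-search",
        f"--limit={limit}",
    ]


cmd = semantic_search_command("Find Q4 product sales data", "my-project")
# shlex.join quotes the natural-language query so it stays one shell argument.
print(shlex.join(cmd))
```

Building an argument list rather than a single string keeps the natural-language query intact even when it contains spaces or quotes.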
Use keyword search for exact keyword matches or technical strings (e.g., name:order_v2).
Query syntax rules:

- Logical operators (AND, OR) MUST be UPPERCASE.
- In semantic mode, use the plural prefix labels. for label filters (e.g., labels.env=prod).
- In keyword mode, use the singular prefix label. for label filters (e.g., label.env=prod).
- Use | for OR and , for AND within parentheses to shorten queries (e.g., projectid:(prod|staging) or column:(id,name)).
- Use : for token/substring matches (e.g., name:sales).
- Use = for exact matches. REQUIRED for system, type, and location.
- Add a parent filter if the project or dataset is known (e.g., parent:projects/<PROJECT_ID>).

Supported qualifiers:

- name:x: Substring/token match on resource ID.
- displayname:x: Substring/token match on display name.
- projectid:x: Substring/token match on GCP project ID.
- parent:x: Substring match on hierarchical path (e.g., projects/my-proj).
- location=x: Exact match on location (e.g., us-central1, us).
- column:x: Substring/token match on column names in the schema.
- system=x: Exact match on source system. Common values: bigquery, storage, biglake, cloud_sql, cloud_spanner, cloud_bigtable, pubsub.
- type=x: Exact match on entry type (e.g., bigquery-table, storage-bucket, storage-folder).
- labels.key=value: (Semantic Mode ONLY) Exact match on a label.
- label.key=value: (Keyword Mode ONLY) Exact match on a label.
- createtime[>|<|=]x: Match assets created after, before, or on the date x (YYYY-MM-DD).
- fully_qualified_name=x: Exact match on the FQN (e.g., bigquery:project.dataset.table).

[!TIP]
Dataplex search results rely on metadata being ingested into the Universal Catalog (often via Discovery Scans). If an asset is missing from search, it may not be indexed.
- Fallback 1: Try searching by the fully_qualified_name qualifier.
- Fallback 2: Use native tools (e.g., bq show, gcloud storage) or specific skills for that asset type if you already know the ID.
gcloud dataplex entries search "<KEYWORD_SEARCH_QUERY>" \
--project="<PROJECT_ID>" \
--limit=50
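To make the keyword syntax concrete, here is a minimal Python sketch that composes a query string from filters. The function build_keyword_query and its two-dictionary interface are hypothetical; the sketch only encodes rules stated in this document (uppercase AND, : for token matches, = for exact matches, | for OR inside parentheses) and does not claim to cover the full Dataplex query grammar.

```python
from typing import Dict, List, Union

# Qualifiers that, per the rules above, require "=" (exact match).
EXACT_REQUIRED = {"system", "type", "location"}


def build_keyword_query(
    token: Dict[str, Union[str, List[str]]],
    exact: Dict[str, str],
) -> str:
    """Join token (:) and exact (=) filters with an uppercase AND."""
    clauses = []
    for key, value in token.items():
        if key in EXACT_REQUIRED:
            raise ValueError(f"{key} requires '=' (exact match)")
        if isinstance(value, list):
            # Shorthand OR group, e.g. projectid:(prod|staging)
            value = "(" + "|".join(value) + ")"
        clauses.append(f"{key}:{value}")
    for key, value in exact.items():
        clauses.append(f"{key}={value}")
    return " AND ".join(clauses)


query = build_keyword_query(
    token={"name": "sales", "projectid": ["prod", "staging"]},
    exact={"system": "bigquery", "type": "bigquery-table", "label.env": "prod"},
)
print(query)
```

The resulting string would be passed as the positional query argument of the keyword search command.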
[!IMPORTANT]
Handling Search Results and Avoiding Loops:
- No Results: If the search returns no entries:
  - Variation Rule: You may try AT MOST 3 variations of the search query (e.g., switching AND/OR clauses, adding/removing parent:, removing projectid: or location:, trying fully_qualified_name=).
  - Stop Rule: If after 3 attempts no results are found, STOP and inform the user. Ask for clarification, specifically the Dataplex full entry name if known, or identifiers such as project ID, dataset ID, or instance ID to help narrow the search. Example: "I couldn't find any tables matching that description after several attempts. If you know the Dataplex full entry name (projects/...), please provide it. Otherwise, please provide any identifiers you know, such as project, dataset, or instance name, to help locate the asset."
- Multiple Results:
  - If more than 10 results are returned, state that many matches were found. Show the names of the first 5 entries and ask for clarification.
  - If 2-10 results are returned and you cannot definitively choose, list them and ask the user.
- Single Result: Proceed to Step 4 with the full entry name.
- Avoid Infinite Loops: You MUST NOT re-run identical or near-identical queries. If Dataplex fails to return the expected asset, prioritize asking the user for the exact resource ID or using Fallback 2 (Native Tools).
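The result-handling rules above can be read as a small decision function. The following Python sketch is one possible encoding under stated assumptions: the name next_action, the action labels, and the variations_tried counter (the number of query variations already attempted after the initial search) are all hypothetical.

```python
from typing import List, Optional, Tuple

MAX_VARIATIONS = 3  # at most 3 query variations before stopping
MANY_RESULTS = 10   # more than this counts as "many matches"
PREVIEW_COUNT = 5   # names to show when there are many matches


def next_action(
    entries: List[str],
    variations_tried: int,
    chosen: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Map a search result list to (action label, entries to act on or show)."""
    if not entries:
        if variations_tried < MAX_VARIATIONS:
            return ("retry_with_variation", [])
        return ("stop_and_ask_user", [])
    if len(entries) == 1:
        return ("lookup_context", entries)
    if len(entries) > MANY_RESULTS:
        # Too many matches: show only the first few and ask for clarification.
        return ("ask_user", entries[:PREVIEW_COUNT])
    if chosen is not None and chosen in entries:
        # 2-10 results and one could be chosen definitively.
        return ("lookup_context", [chosen])
    return ("ask_user", entries)


many = [f"entry-{i}" for i in range(12)]
print(next_action([], variations_tried=3))
print(next_action(many, variations_tried=0))
```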
Criteria: Once candidate assets are returned, proceed to Step 4 using the full entry names from the search results.
You MUST use the Lookup Context command to fetch schema and deep metadata for the relevant results obtained from Step 3.
[!IMPORTANT]
The --resources parameter MUST be the full name (starting with projects/) returned by the search result. Passing short table IDs, GCS URIs, or fully qualified bigquery: prefixes is PROHIBITED and will fail.
Use the lookup_context MCP tool
OR
gcloud dataplex context lookup --resources="<FULL_ENTRY_NAME>"
Completion Criteria: The command returns the detailed schema and business context.
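A guard like the one below could catch a wrong identifier before the lookup is attempted. This is a minimal Python sketch: the helper name lookup_command is hypothetical, and the sample entry name is only an illustrative placeholder for whatever full name the search step returned.

```python
from typing import List


def lookup_command(entry_name: str) -> List[str]:
    """Build the Lookup Context argv, rejecting anything but a full entry name."""
    if not entry_name.startswith("projects/"):
        # Short table IDs, GCS URIs, and bigquery: prefixes are prohibited.
        raise ValueError(f"expected a full entry name starting with 'projects/': {entry_name!r}")
    return ["gcloud", "dataplex", "context", "lookup", f"--resources={entry_name}"]


# Placeholder entry name for illustration only; use the value from the search result.
sample = "projects/my-proj/locations/us/entryGroups/@bigquery/entries/my-entry"
print(lookup_command(sample))

for bad in ("my_dataset.my_table", "gs://my-bucket/path", "bigquery:my-proj.my_dataset.my_table"):
    try:
        lookup_command(bad)
    except ValueError as err:
        print("rejected:", err)
```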
- [...] projects/).
- [...] bigquery-public-data) that has not fully synchronized its metadata with the Dataplex Universal Catalog. While the entry appears in search, context lookup is unavailable.
- [...] bq CLI).
- [...] bq show) also returns "Not Found", STOP. Do not restart the Dataplex discovery loop. Specifically ask the user to verify the project ID and table ID.

If you find yourself repeatedly searching for the same asset:

- [...] bq show").

- [...] --project <PROJECT_ID> arguments were not provided
- [...] --project <PROJECT_ID> arguments does not have the Dataplex API enabled or the user is missing necessary IAM permissions.