Skill

discovering-gcp-data-assets

From dak

Searches and inspects GCP data assets via Dataplex catalog, BigQuery, Spanner, and BigLake. Use when you have a topic but lack specific resource IDs.

GCP

BigQuery

data-engineering

Popularity

Stars

110

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/dak:discovering-gcp-data-assets

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

If the asset was created or mentioned earlier in the same conversation, then

SKILL.md

241 lines · ~2.5k tokens

Stats

LanguageTypeScript

Stars110

Forks17

MaintenanceExcellent

Last CommitJun 17, 2026

Actions

View Source View Plugin View on GitHub View README

Instructions

Step 1: Prioritize Assets from the Conversation

If the asset was created or mentioned earlier in the same conversation, then proceed with that asset instead of searching. Skip steps 2, 3, and 4.

Step 2: Handle Public Datasets or Proceed to Search

Dataplex Lookup Context provides the richest metadata for data assets. You MUST prioritize using it for all Google Cloud assets, even if you already know their IDs.

Public Datasets (Direct Inspection): If the requested asset belongs to the bigquery-public-data project, Dataplex Lookup Context will fail. You MUST skip Steps 3 and 4 and inspect the table directly using the bq CLI or BigQuery MCP tools instead.
All Other Assets (Proceed to Step 3): For all other BigQuery, Cloud Storage, Spanner, BigLake Iceberg or general GCP data assets (whether their IDs are known or missing), you MUST proceed to Step 3 to search the Dataplex catalog and obtain their full Entry Name.

Step 3: Execute Discovery Search

You MUST use the Dataplex search command to discover assets and retrieve their full projects/... entry names. This step is required even if you already know the asset's short ID (e.g., my_dataset.my_table), because Step 4 strictly requires the full entry name.

[!IMPORTANT]

The --project parameter MUST ALWAYS be provided. This project_id is used to attribute the search only and does NOT restrict the search scope. The project must have the dataplex API enabled and user must have the dataplex.entries.get permissions.

A. Semantic Search (Natural Language Intent)

Use this when the user describes the meaning or intent of the data (e.g., "Find Q4 product sales data").

Use the search_entries MCP tool

gcloud dataplex entries search "<NATURAL_LANGUAGE_QUERY>" \
  --project="<PROJECT_ID>" \
  --semantic-search \
  --limit=50

B. Keyword Search (Technical Strings)

Use this for exact keyword matches or technical strings (e.g., name:order_v2).

Search Query Rules (MANDATORY)

Mode-Specific Syntax:
- Semantic Search: Logical operators (AND, OR) MUST be UPPERCASE. Use plural labels. for label filters (e.g., labels.env=prod).
- Keyword Search: Operators are case-insensitive. Use singular label. for label filters (e.g., label.env=prod).
Abbreviated Logic: Use | for OR and , for AND within parentheses to shorten queries (e.g., projectid:(prod|staging) or column:(id,name)).
Exact vs. Token Match:
- Use : for token/substring matches (e.g., name:sales).
- Use = for exact matches. REQUIRED for system, type, and location.
Singular Keywords: When performing keyword search, ALWAYS convert plurals to singular (e.g., "product" NOT "products"). Semantic search handles singular/plural variations and synonyms automatically.
Scope Restriction: You SHOULD restrict the search scope using a parent filter if the project or dataset is known (e.g., parent:projects/<PROJECT_ID>).

Dataplex Search Syntax Reference

name:x: Substring/token match on resource ID.
displayname:x: Substring/token match on display name.
projectid:x: Substring/token match on GCP project ID.
parent:x: Substring match on hierarchical path (e.g., projects/my-proj).
location=x: Exact match on location (e.g., us-central1, us).
column:x: Substring/token match on column names in the schema.
system=x: Exact match on source system. Common values: bigquery, storage, biglake, cloud_sql, cloud_spanner, cloud_bigtable, pubsub.
type=x: Exact match on entry type (e.g., bigquery-table, storage-bucket, storage-folder).
labels.key=value: (Semantic Mode ONLY) Exact match on a label.
label.key=value: (Keyword Mode ONLY) Exact match on a label.
createtime[>|<|=]x: Match assets created after/before date YYYY-MM-DD.
fully_qualified_name=x: Exact match on the FQN (e.g., bigquery:project.dataset.table).

[!TIP]

Dataplex search results rely on metadata being ingested into the Universal Catalog (often via Discovery Scans). If an asset is missing from search, it may not be indexed. - Fallback 1: Try searching by the fully_qualified_name qualifier. - Fallback 2: Use native tools (e.g., bq show, gcloud storage) or specific skills for that asset type if you already know the ID.

gcloud dataplex entries search "<KEYWORD_SEARCH_QUERY>" \
  --project="<PROJECT_ID>" \
  --limit=50

[!IMPORTANT]

Handling Search Results and Avoiding Loops:

No Results: If the search returns no entries:

Variation Rule: You may try AT MOST 3 variations of the search query (e.g., switching AND/OR clauses, adding/removing parent:, removing projectid: or location:, trying fully_qualified_name=).

Stop Rule: If after 3 attempts no results are found, STOP and inform the user. Ask for clarification, specifically the Dataplex full entry name if known, or identifiers such as project ID, dataset ID, or instance ID to help narrow the search. Example: "I couldn't find any tables matching that description after several attempts. If you know the Dataplex full entry name (projects/...), please provide it. Otherwise, please provide any identifiers you know, such as project, dataset, or instance name, to help locate the asset."

Multiple Results:

If more than 10 results are returned, state that many matches were found. Show the names of the first 5 entries and ask for clarification.

If 2-10 results are returned and you cannot definitively choose, list them and ask the user.

Single Result: Proceed to Step 3 with the full entry name.

Avoid Infinite Loops: MUST NOT re-run identical or near-identical queries. If Dataplex fails to return the expected asset, prioritize asking the user for the exact resource ID or using Fallback 2 (Native Tools).

Criteria: Once candidate assets are returned, proceed to Step 4 using the full entry names from the search results.

Step 4: Lookup Context

You MUST use the Lookup Context command to fetch schema and deep metadata for the relevant results obtained from Step 3.

[!IMPORTANT]

The --resources parameter MUST be the full name (starting with projects/) returned by the search result. Passing short table IDs, GCS URIs, or fully qualified bigquery: prefixes is PROHIBITED and will fail.

Command Execution

Use the lookup_context MCP tool

gcloud dataplex context lookup --resources="<FULL_ENTRY_NAME>"

Completion Criteria: The command returns the detailed schema and business context.

Troubleshooting

Context Lookup Fails or "Resource not found"

Cause: Short table names were used improperly.
Fix: Ensure you use the correct entry name format from the search results (starting with projects/).

Search Returns No Results

Cause: Plural terms in keyword search or lack of scoping.
Fix: Switch to singular keywords. For semantic search, try more descriptive natural language.

Context Lookup Fails with "NOT_FOUND" (despite correct format)

Cause: The table belongs to a project (e.g., bigquery-public-data) that has not fully synchronized its metadata with the Dataplex Universal Catalog. While the entry appears in search, context lookup is unavailable.
Fix: Fall back to direct inspection using native tools (e.g., bq CLI).
Stop Rule: If the native tool (e.g., bq show) also returns "Not Found", STOP. Do not restart the Dataplex discovery loop. Specifically ask the user to verify the project ID and table ID.

Breaking the Research Loop

If you find yourself repeatedly searching for the same asset:

STOP.
State what you have tried (e.g., "I tried searching Dataplex with X and Y, and checked bq show").
Ask the user for the exact project, dataset, and table ID.

Search Fails with "--project: Must be specified."

Cause: --project <PROJECT_ID> arguments were not provided
Fix: Provide a project which will be used to authorize and attribute the search request.

Search Fails with "PERMISSION_DENIED"

Cause: The project_id provided in the --project <PROJECT_ID> arguments does not have the Dataplex API enabled or the user is missing necessary IAM permissions.
Fix: Ask the user if they have a project which has the Dataplex API enabled with the dataplex.entries.get permission

discovering-gcp-data-assets

Popularity

Invocation

Context Preview

SKILL.md

discovering-gcp-data-assets

Popularity

Invocation

Context Preview

SKILL.md

Instructions

Step 1: Prioritize Assets from the Conversation

Step 2: Handle Public Datasets or Proceed to Search

Step 3: Execute Discovery Search

A. Semantic Search (Natural Language Intent)

B. Keyword Search (Technical Strings)

Search Query Rules (MANDATORY)

Dataplex Search Syntax Reference

Step 4: Lookup Context

Command Execution

Troubleshooting

Context Lookup Fails or "Resource not found"

Search Returns No Results

Context Lookup Fails with "NOT_FOUND" (despite correct format)

Breaking the Research Loop

Search Fails with "--project: Must be specified."

Search Fails with "PERMISSION_DENIED"

Similar Skills

Instructions

Step 1: Prioritize Assets from the Conversation

Step 2: Handle Public Datasets or Proceed to Search

Step 3: Execute Discovery Search

A. Semantic Search (Natural Language Intent)

B. Keyword Search (Technical Strings)

Search Query Rules (MANDATORY)

Dataplex Search Syntax Reference

Step 4: Lookup Context

Command Execution

Troubleshooting

Context Lookup Fails or "Resource not found"

Search Returns No Results

Context Lookup Fails with "NOT_FOUND" (despite correct format)

Breaking the Research Loop

Search Fails with "--project: Must be specified."

Search Fails with "PERMISSION_DENIED"

Similar Skills