From data-agent-kit-starter-pack
Guides Jupyter notebooks for data analysis, exploration, visualization, and BigQuery queries via %%bqsql magic. Covers best practices for execution, validation, data cleaning, plotting, and ML workflows. Activates for multi-step data tasks or notebook requests.
npx claudepluginhub gemini-cli-extensions/data-agent-kit-starter-pack --plugin data-agent-kit-starter-pack

This skill uses the workspace's default tool permissions.
Before choosing to use a notebook, evaluate the task complexity using these heuristics.
Use a notebook if you meet at least one of these criteria:
Do NOT use a notebook ONLY if:
Golden Rule of Data Storytelling: If any analytical insight, trend, or comparison is involved, favor a notebook and a visualization. A notebook is the "standard" environment for our developer workflow; do not avoid it because of "overhead".
[!IMPORTANT] Agent execution rules: Your behavior MUST depend on whether the `notebook_execute_cell` tool is available in your current context:
- If `notebook_execute_cell` IS available: You MUST follow the incremental, step-by-step GENERATE CELL -> EXECUTE CELL -> VALIDATE OUTPUT flow. Generate ONE cell, execute it, then verify the output. If the output is data (e.g. a dataframe), you MUST inspect it to confirm the logic is correct before generating the next step. Batch generation of an entire notebook is strictly prohibited because error propagation in notebooks is expensive to fix.
- If `notebook_execute_cell` is NOT available: You MUST generate the complete notebook and request user execution.
- Use @skill:discovering-gcp-data-assets or BigQuery list tools to find the correct `project.dataset.table` before writing ANY code. If the table ID is missing, ask the user.
- Keep each query next to its output (e.g. a `%%bqsql` magic cell followed immediately by a Python visualization cell for those results). Use descriptive markdown cells to separate and document different logical sections.

Notebooks run in specific Kernels (execution backends). You MUST ensure the kernel's Python environment contains the necessary libraries (bigframes, ipykernel, etc.).
Use @skill:managing-python-dependencies to verify whether a virtual environment exists. If not, create one. Ensure ipykernel is installed in that environment. Install any other relevant libraries.

[!IMPORTANT]
HARD STOP on kernel failure: If a cell execution returns "no active kernel" or any kernel-not-found error, you MUST stop immediately. Do NOT scaffold, generate, or insert any further cells. Inform the user which kernel is needed (e.g., PySpark / Dataproc Serverless) and wait for explicit confirmation that a kernel is active before proceeding with notebook execution.
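The hard-stop rule amounts to string-matching the execution error before doing anything else. A minimal sketch of such a guard (the `is_kernel_failure` helper and the marker strings are ours, illustrating the rule rather than reproducing the kit's implementation):

```python
# Hypothetical guard: stop scaffolding further cells as soon as the
# execution backend reports that no kernel is active.
KERNEL_ERROR_MARKERS = ("no active kernel", "kernel not found")

def is_kernel_failure(error_message: str) -> bool:
    """Return True if the error text indicates a missing/inactive kernel."""
    msg = error_message.lower()
    return any(marker in msg for marker in KERNEL_ERROR_MARKERS)

print(is_kernel_failure("RuntimeError: No active kernel"))  # True -> hard stop
print(is_kernel_failure("NameError: name 'df' is not defined"))  # False
```

On a `True` result the agent should inform the user which kernel is needed and wait, rather than retrying or inserting cells.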
Before installing any python libraries, you MUST use
@skill:managing-python-dependencies to detect how python dependencies are
managed in the project.
Since these are often ephemeral or managed by GCP:
- Before adding a `%pip install` cell, run `%pip list` or `import <package>` to confirm the package is not already present. Managed runtimes (Dataproc Serverless, Colab) pre-install many common packages. Only install what is confirmed missing.
- Use `%pip install <package>` in the first cell if a package is confirmed missing and it's the only way to modify the runtime.
- When in doubt about the kernel type or preferred installation method, ask the user for clarification.
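The "confirm before installing" check can be done without shelling out to pip at all. A sketch using only the standard library (the `needs_install` helper name is ours, not part of the kit):

```python
import importlib.util

def needs_install(package: str) -> bool:
    """Return True if `package` is not importable in the current kernel."""
    return importlib.util.find_spec(package) is None

# An agent would only emit a %pip install cell when this returns True.
print(needs_install("json"))                # False: json ships with Python
print(needs_install("surely_missing_pkg"))  # True: not a real package
```

This avoids polluting the first cell with an unnecessary `%pip install` on managed runtimes that already bundle the package.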
Guidelines for performing exploratory data analysis, data cleaning, and visualization in notebooks.
The notebook should read like a story. While you have flexibility (e.g., multiple visualizations for one data cell, or data cells building on each other), aim for this general flow:
1. Title (Markdown cell, e.g. `# Retention Analysis`)
2. Section header (Markdown cell, e.g. `## Exploring User Retention`)
3. Data cell (e.g. `%%bqsql` magics)
4. Validation cell (e.g. `df.head()` or assert sanity checks)
5. Visualization cell (e.g. `df.plot()`)

Repeat steps 2-5 for each new sub-topic or insight. You can have multiple Data cells before a Visualization, or multiple Visualizations from one Data cell. The key is to keep them grouped logically and separated by Markdown headers.

6. Final Summary (Markdown cell)
Next Steps: After the notebook has been successfully executed and verified, and the summary is complete, notify the user and propose next step suggestions.
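The Validation step can be as small as a couple of assertions on the query result before anything is plotted. A minimal sketch (the `validate_rates` helper and the sample values are hypothetical, and plain Python is used so the idea stands independent of BigFrames):

```python
# Hypothetical validation cell: sanity-check retention rates returned by a
# %%bqsql query before the visualization cell consumes them.
def validate_rates(rates):
    assert rates, "query returned no rows"
    assert all(0.0 <= r <= 1.0 for r in rates), "rate outside [0, 1]"
    return True

print(validate_rates([0.42, 0.35, 0.28]))  # True: safe to plot
```

Catching an empty or out-of-range result here is what keeps a wrong query from silently propagating into the charts and summary.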
Refer to the following resources for guidance on specific notebook topics:
Use the BigFrames `%%bqsql` magic for BigQuery SQL queries. These cells support native BigQuery SQL execution and export results to BigFrames dataframes.
[!IMPORTANT]
- Unless specified by the user, always use SQL for querying BigQuery.
- DO NOT use the standard BigQuery Python client library (`google.cloud.bigquery`) or `pandas.read_gbq`.
- Mandatory dataframe export: Always provide a dataframe name, e.g. `%%bqsql <df_name>`. This makes it easy to use results in follow-up Python cells.
- Verify that `bigframes` version 2.38.0 or above is installed in the notebook runtime environment. If it is missing, ask the user if they would like you to upgrade for them.
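The version floor can be checked without importing the library, via `importlib.metadata`. A sketch assuming plain `X.Y.Z` release strings (the helper names are ours; a production check would use `packaging.version` for robust parsing of pre-release suffixes):

```python
from importlib.metadata import version, PackageNotFoundError

def meets_floor(installed: str, floor=(2, 38, 0)) -> bool:
    """Naive X.Y.Z comparison; not robust to pre-release suffixes."""
    parts = tuple(int(p) for p in installed.split(".")[:3])
    return parts >= floor

def bigframes_ok() -> bool:
    try:
        return meets_floor(version("bigframes"))
    except PackageNotFoundError:
        return False  # not installed at all: ask the user before upgrading

print(meets_floor("2.38.0"))  # True
print(meets_floor("2.37.9"))  # False
```

A `False` result is the cue to ask the user about upgrading, per the rule above, rather than upgrading silently.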
Example %%bqsql magic usage:
```python
# Initialize BigFrames and load %%bqsql magics
import bigframes
import bigframes.pandas as bpd
%load_ext bigframes
```
[!CAUTION] Always use `%load_ext bigframes` exactly as shown. Do not load submodules; for example, `%load_ext bigframes.magics` and `%load_ext bigframes.bigquery` are not valid and must not be used.
[!IMPORTANT] The `bigframes` library must be installed. Determine if bigframes needs to be installed by following @skill:managing-python-dependencies.
```
%%bqsql df_sample
SELECT * FROM `project.dataset.table` LIMIT 10
```
[!CAUTION]
1. NO Python SDK for Queries: Do not switch to `client.query(sql).to_dataframe()` if SQL fails. Fix the SQL syntax instead.
2. NO Mixing Logic: Do not put Python code in the same cell as `%%bqsql` magics.
Magic cells with `%%bqsql <df_name>` produce a BigQuery DataFrame. In subsequent cells, you can use `<df_name>` directly.
[!IMPORTANT] You MUST use BigFrames for data exploration, manipulation, splitting etc. You MUST use BQML SQL or bigframes.ml for machine learning tasks. You MUST NOT use pandas or Scikit-learn.
- `.to_pandas()`: You MUST NOT use `.to_pandas()` to download the entire dataset into memory. There are some exceptions.
- `read_gbq()` for SQL: Do not write SQL queries and execute them with `read_gbq()`. Use BigFrames DataFrame/Series methods instead.
- Prefer built-in accessors (`df.col.str.*`, `df.col.dt.*`) over remote UDFs, `Series.map()`, or `DataFrame.apply()`.
- Check `.dtypes` after loading, and use `display()` with `.head()` or `.peek()` to inspect data.
- Persist models with `model.to_gbq()`. To load a persisted model, use `bpd.read_gbq_model()`.

Integration with machine learning workflows and best practices.
- Guide: Use @skill:ml-best-practices.
- MUST READ WHEN: The task involves machine learning, training a model, clustering, classification, regression, or time-series forecasting.
If any "MUST READ WHEN" condition is met, you MUST read the corresponding guide before proceeding.