Plugin

data-agent-kit-starter-pack

Name: data-agent-kit-starter-pack
Author: gemini-cli-extensions

Build, orchestrate, and manage end-to-end GCP data pipelines using natural language prompts: discover assets, generate dbt/Dataform/BigQuery/Spark/Beam code, profile/transform data, automate provisioning, troubleshoot Composer DAGs, and query databases like BigQuery, Spanner, AlloyDB, Cloud SQL PostgreSQL via integrated MCPs.

npx claudepluginhub gemini-cli-extensions/data-agent-kit-starter-pack --plugin data-agent-kit-starter-pack

Component Overview

Skills

MCP Servers

Component Details

Skills (18)

building-data-apps

/building-data-apps

Build modern data apps, dashboards, and interactive reports using either React + Vite or Streamlit. Includes optional Gemini Data Analytics chat integration for an AI powered "chat with your data" experience. Relevant when any of the following conditions are true: 1. User explicitly requests to build a data dashboard, data application, or visualization UI, and the UI pulls data from a GCP database (defaulting to BigQuery unless otherwise specified). 2. You need to generate a frontend web application to interact with, query, and visualize data from GCP data sources. 3. User wants to build a "chat with your data" experience or integrate the Gemini Data Analytics chat API into a web interface. Do NOT use when any of the following conditions are true: 1. The request is for building backend-only services. 2. The request is for simple CLI scripts or command-line applications. 3. The web application is not data-centric or does not involve visualizing/querying data from GCP sources.

data-autocleaning

/data-autocleaning

Automated data quality and transformation capabilities for Dataform/dbt/BigQuery pipelines. Processes data sourced from BigQuery or Cloud Storage (GCS), applying best practices for data ingestion, movement, schema mapping, and comprehensive data cleaning.

dataform-bigquery

/dataform-bigquery

Expertise in generating clean, correct, and efficient Dataform pipeline code for BigQuery ELT. Use this when creating or modifying Dataform pipelines, actions, or source declarations, when Dataform, SQLX, or BigQuery are mentioned in a transformation, when data needs to be ingested from GCS into BigQuery via Dataform, or when setting up a new Dataform project or configuring workflow_settings.yaml.

dbt-bigquery

/dbt-bigquery

Expert guidance for creating, modifying, and optimizing dbt pipelines for BigQuery. Use this skill whenever user asks for generating or modifying a dbt model or project. Activate this skill when the user - Creates, modifies, or troubleshoots **dbt models or pipelines** - Needs to **optimize SQL** within a dbt project - Is **setting up a new dbt project** or configuring existing one

developing-with-bigquery

/developing-with-bigquery

A repository of BigQuery-specific logic, knowledge, and specialized standards. Use this skill whenever you are doing anything with BigQuery, including: 1. BigQuery query optimization 2. BigFrames Python code 3. BigQuery ML/AI functions.

discovering-gcp-data-assets

/discovering-gcp-data-assets

Finds and inspects data assets within Google Cloud. Relevant when any of the following conditions are true: 1. The user request involves finding, exploring, or inspecting data assets in Google Cloud, such as: - BigQuery datasets, tables, or views - BigLake catalog or tables - Spanner instances, databases or tables - etc. 2. You need to retrieve the schema, metadata, or governance policies for a GCP data asset. 3. You have a keyword or topic (e.g., "sales data") but lack the specific table or resource ID. 4. You are attempting to find data using `bq ls`, as this skill offers a superior approach. Don't use when: - Assets are outside Google Cloud

accidental-data-loss-prevention

/accidental-data-loss-prevention

**STOP AND VERIFY**: Before running any command or tool that results in irreversible data loss, you MUST obtain explicit user consent. When in doubt, ask. It is better to wait for confirmation than to accidentally delete production data or critical project assets. Use this for: - SQL: DROP TABLE/VIEW/SCHEMA/DATABASE, TRUNCATE, or broad DELETE (missing WHERE or using 1=1). - Cloud Storage: gsutil rm or gcloud storage rm targeting production data or critical buckets. - Infrastructure: gcloud projects delete, deleting Spanner/BigQuery/Dataproc resources, deleting secrets, or KMS key destruction.

bigquery-data-transfer-service

/bigquery-data-transfer-service

Discovers and inspects BigQuery Data Transfer Service (DTS) configurations. Use this to identify existing ingestion pipelines and extract datasource or transfer config metadata for data pipelines. Use when a user asks for ingestion scenarios while building or managing data pipelines or when a user asks to "ingest" or "add" data that may already be managed by a DTS transfer.

gcloud-auth-verification

/gcloud-auth-verification

Guidelines for identifying and resolving missing Google Cloud authentication and Application Default Credentials (ADC). Use this skill if `gcloud`, `bq`, `dataform`, or Python libraries return authentication errors.

gcp-composer-troubleshooting

/gcp-composer-troubleshooting

Provides expert guidance for troubleshooting Cloud Composer (Apache Airflow) and Orchestration pipelines. Use this skill when the user asks to generate Root Cause Analysis (RCA), troubleshoot or fix a failed pipeline, DAG in Composer environment and generate RCA report.

gcp-data-pipelines

/gcp-data-pipelines

Primary entry point for building, managing, and orchestrating data pipelines on Google Cloud. Guides users to the appropriate skill for dbt, Dataflow (Apache Beam), Dataform, Spark (Dataproc Serverless), BigQuery Data Transfer Service (DTS) or orchestration pipeline using Cloud Composer. Clarify requirements and resolve ambiguity for creating, updating and running data pipelines.

gcp-dataflow

/gcp-dataflow

Provides guidance for writing, packaging and executing Apache Beam pipelines on GCP using Cloud Dataflow. Use when: - Creating an Apache Beam Dataflow pipeline. - Creating a Google Flex Template.

gcp-pipeline-orchestration

/gcp-pipeline-orchestration

This skill helps the agent generate or update orchestration pipeline definitions for Google Cloud Composer to initialize orchestration pipeline or update the orchestration definition for orchestration of various data pipelines, like dbt pipelines, notebooks, Spark jobs, Dataform, Python scripts or inline BigQuery SQL queries. This skill also helps deploy and trigger orchestration pipelines.

gcp-pipeline-resource-provisioning

/gcp-pipeline-resource-provisioning

Automates declarative resource creation and provisioning for data pipelines, supporting BigQuery, Dataform, Dataproc, BigQuery Data Transfer Service (DTS), and other resources. It manages environment-specific configurations (dev, staging, prod) through a deployment.yaml file. Use when: - Modifying or creating deployment.yaml for deployment settings. - Resolving environment-specific variables (e.g., Project IDs, Regions) for deployment. - Provisioning supported infrastructure like BigQuery datasets/tables, Dataform resources, or DTS resources via deployment.yaml. Do not use when: - Resources already exist. - Managing resources not supported by `gcloud beta orchestration-pipelines resource-types list`. - Managing general cloud infrastructure (VMs, networks, Kubernetes, IAM policies), which are better suited for Terraform. - Infrastructure spans multiple cloud providers (AWS, Azure, etc.). - Already uses Terraform for the target resources.

gcp-spark

/gcp-spark

Develops and executes Spark code on Dataproc Clusters and Serverless. Reads and writes data using BigLake Iceberg catalogs, BigQuery and Spanner. Debugs execution failures. Use when: - Writing Spark ETL pipelines on GCP. - Training or running inference with ML models with spark on GCP. - Managing Spark clusters, jobs, batches, and interactive sessions. Don't use when: - Writing generic Python scripts that don't use Spark. - Performing simple SQL queries that can be done directly in BigQuery.

managing-python-dependencies

/managing-python-dependencies

Ensures proper Python dependency management, avoiding global `pip install` and adhering to project-specific tooling. Use this skill if any of the following are true: 1. Attempting to run `pip install {package_name}`. 2. Python packages or dependencies need to be added or modified. 3. Initiating a new Python project. 4. Creating a new notebook, even if just using BigQuery cells. 5. Generating Python code that includes `import` statements for third-party libraries. 6. Before executing Python scripts via the terminal to ensure the correct virtual environment is active.

ml-best-practices

/ml-best-practices

CRITICAL RULE: You MUST use this skill whenever the task involves any machine learning tasks or data analysis. Use this skill if the user's prompt or requirements mention any of the following: * Clustering * Classification * Regression * Time series forecasting * Statistical testing * Model comparison * ML * Data analysis SQL/BigQuery ML HANDOFF: If the user requires a SQL solution, use this skill to dictate the ANALYSIS STEPS (e.g., markdown analysis cells, visualization logic), but defer to `bigquery` for all SQL syntax.

notebook-guidance

/notebook-guidance

This skill guides the use of Jupyter notebooks for data analysis, exploration, and visualization, particularly with BigQuery. It outlines best practices for notebook execution and validation (supporting both cell-by-cell execution and full notebook generation depending on tool availability), library installation, and structuring notebooks for clarity. It also covers specific rules for data cleaning, plotting, and integrating with BigQuery SQL and machine learning workflows. Relevant when any of the following conditions are true: 1. The user request involves a data analysis, data exploration, data visualization, or data insights task that requires multiple steps, queries, or visualizations to answer. 2. The user explicitly requests a notebook (.ipynb). 3. You are creating, editing, or executing cells in a Jupyter notebook. 4. You need to query BigQuery from within a notebook. DO NOT use the Python BigQuery client library; instead, you MUST use the `%%bqsql` magics explained in this skill.

MCP Servers (10)

Connects to external services

notebook_and_visualization

External

datacloud_bigquery_toolbox

External

datacloud_spanner_toolbox

External

datacloud_alloydb-postgres-admin_toolbox

admin

External

datacloud_alloydb-postgres_toolbox

External

1 secret

datacloud_cloud-sql-postgresql-admin_toolbox

admin

External

datacloud_cloud-sql-postgresql_toolbox

External

1 secret

datacloud_knowledge_catalog_toolbox

External

datacloud_dataproc_toolbox

External

datacloud_serverless-spark_toolbox

External

README

Data Agent Kit Starter Pack

[!NOTE] This extension is currently in beta (pre-v1.0), and may see breaking changes until the first stable release (v1.0).

This plugin provides a specialized suite of skills and MCP tools for data engineers and database practitioners working on Google Cloud. It acts as an expert assistant, allowing you to use natural language prompts in your preferred coding agent to architect complex data pipelines, transform data with dbt, write Spark and BigQuery SQL notebooks, and orchestrate end-to-end workflows across the Google Cloud data ecosystem (BigQuery, Spanner, BigLake, Dataproc, etc.).

[!IMPORTANT] We Want Your Feedback! Please share your thoughts with us by opening an issue on GitHub. Your input is invaluable and helps us improve the project for everyone.

Why Use the Data Agent Kit Starter Pack?
Prerequisites
Getting Started
- Installation
- Configuration
Usage Examples
Troubleshooting
Security Reminder: Agent Environment Hardening

Why Use the Data Agent Kit Starter Pack?

Seamless Workflow: Bring Google Cloud data engineering expertise directly into your terminal or IDE via Gemini CLI, Claude Code, or Codex.
End-to-End Data Pipelines: Effortlessly generate code that reads raw data from Cloud Storage, processes it with Spark or BigQuery, transform it through medallion architectures (bronze, silver, gold) using dbt, and export it to serving layers like Spanner.
Ecosystem Integration: Work across boundaries—generate BigLake Iceberg catalog tables, train BigQuery ML models (XGBoost, KMEANS), and create interactive Streamlit dashboards or LookML models, all from natural language.
Workflow Orchestration: Automatically create and schedule orchestration pipelines that tie your notebooks and dbt models together into robust, scheduled jobs.

Prerequisites

Ensure you have the following installed:

Node.js and npm (Latest version recommended)
Google Cloud SDK (gcloud CLI): Install and initialize the gcloud CLI and ensure Application Default Credentials (ADC) are configured.
One of the following coding agents:
- Gemini CLI (v0.6.0+)
- Claude Code
- Codex CLI
(Optional) IDE Extension: Google Cloud Data Agent Kit.

Getting Started

Installation

Choose the installation method for your preferred coding agent. Run the commands in terminal

Gemini CLI and Gemini Code Assist

Install the extension directly from GitHub:

gemini extensions install https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack --ref 0.1.1

Claude Code

Run the claude command to start the agent, then follow these steps:

Add the marketplace:

/plugin marketplace add https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack#0.1.1

Install the plugin:

/plugin install data-agent-kit-starter-pack@data-agent-kit-starter-pack-marketplace

Codex

Run the installation script in your terminal:

macOS / Linux:

curl -sSL https://raw.githubusercontent.com/gemini-cli-extensions/data-agent-kit-starter-pack/0.1.1/codex-install.sh | bash

Windows:

irm https://raw.githubusercontent.com/gemini-cli-extensions/data-agent-kit-starter-pack/0.1.1/codex-install.ps1 | iex

Install the plugin in Codex:

Start the Codex agent (codex), then run:

/plugins

Use the interactive options to install the plugin with the name Data Agent Kit Starter Pack.

Configuration

This extension brings a suite of specialized Skills and MCP toolboxes. While skills are ready to use upon installation, you must configure the MCP toolboxes and authenticate with Google Cloud for them to start successfully.

[!NOTE] If you use Gemini CLI, Claude Code, or Codex in your IDE (e.g., via VS Code extensions), they share the same underlying configuration and MCP servers as the CLI agents.

1. Authenticate with Google Cloud

The MCP toolboxes require an active authenticated session to interact with your resources. Run the following commands in your terminal:

gcloud auth login
gcloud auth application-default login

View full README on GitHub

Similar Plugins

fullstack-dev-skills

8.6k

214

Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.

Stats

Version0.1.1

Stars30

Forks2

MaintenanceExcellent

Last CommitApr 28, 2026

AddedApr 21, 2026

Actions

View on GitHub View README Plugin Marketplace JSON

Available In

claude-plugins-official18,160 data-agent-kit-starter-pack-marketplace30 data-agent-kit1 ccode-personal-plugins

Safety Signals

Critical

Admin access level

Server config contains admin-level keywords

Caution

Data Agent Kit Starter Pack

[!NOTE] This extension is currently in beta (pre-v1.0), and may see breaking changes until the first stable release (v1.0).

[!IMPORTANT] We Want Your Feedback! Please share your thoughts with us by opening an issue on GitHub. Your input is invaluable and helps us improve the project for everyone.

Why Use the Data Agent Kit Starter Pack?
Prerequisites
Getting Started
- Installation
- Configuration
Usage Examples
Troubleshooting
Security Reminder: Agent Environment Hardening

Why Use the Data Agent Kit Starter Pack?

Seamless Workflow: Bring Google Cloud data engineering expertise directly into your terminal or IDE via Gemini CLI, Claude Code, or Codex.
End-to-End Data Pipelines: Effortlessly generate code that reads raw data from Cloud Storage, processes it with Spark or BigQuery, transform it through medallion architectures (bronze, silver, gold) using dbt, and export it to serving layers like Spanner.
Ecosystem Integration: Work across boundaries—generate BigLake Iceberg catalog tables, train BigQuery ML models (XGBoost, KMEANS), and create interactive Streamlit dashboards or LookML models, all from natural language.
Workflow Orchestration: Automatically create and schedule orchestration pipelines that tie your notebooks and dbt models together into robust, scheduled jobs.

Prerequisites

Ensure you have the following installed:

Node.js and npm (Latest version recommended)
Google Cloud SDK (gcloud CLI): Install and initialize the gcloud CLI and ensure Application Default Credentials (ADC) are configured.
One of the following coding agents:
- Gemini CLI (v0.6.0+)
- Claude Code
- Codex CLI
(Optional) IDE Extension: Google Cloud Data Agent Kit.

Getting Started

Installation

Choose the installation method for your preferred coding agent. Run the commands in terminal

Gemini CLI and Gemini Code Assist

Install the extension directly from GitHub:

gemini extensions install https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack --ref 0.1.1

Claude Code

Run the claude command to start the agent, then follow these steps:

Add the marketplace:

/plugin marketplace add https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack#0.1.1

Install the plugin:

/plugin install data-agent-kit-starter-pack@data-agent-kit-starter-pack-marketplace

Codex

Run the installation script in your terminal:

macOS / Linux:

curl -sSL https://raw.githubusercontent.com/gemini-cli-extensions/data-agent-kit-starter-pack/0.1.1/codex-install.sh | bash

Windows:

irm https://raw.githubusercontent.com/gemini-cli-extensions/data-agent-kit-starter-pack/0.1.1/codex-install.ps1 | iex

Install the plugin in Codex:

Start the Codex agent (codex), then run:

/plugins

Use the interactive options to install the plugin with the name Data Agent Kit Starter Pack.

Configuration

[!NOTE] If you use Gemini CLI, Claude Code, or Codex in your IDE (e.g., via VS Code extensions), they share the same underlying configuration and MCP servers as the CLI agents.

1. Authenticate with Google Cloud

The MCP toolboxes require an active authenticated session to interact with your resources. Run the following commands in your terminal:

gcloud auth login
gcloud auth application-default login

data-agent-kit-starter-pack

Component Overview

Component Details

Skills (18)

MCP Servers (10)

README

Data Agent Kit Starter Pack

Contents

Why Use the Data Agent Kit Starter Pack?

Prerequisites

Getting Started

Installation

Configuration

1. Authenticate with Google Cloud

Similar Plugins

fullstack-dev-skills

data-agent-kit-starter-pack

Component Overview

Component Details

Skills (18)

MCP Servers (10)

README

Data Agent Kit Starter Pack

Contents

Why Use the Data Agent Kit Starter Pack?

Prerequisites

Getting Started

Installation

Configuration

1. Authenticate with Google Cloud

Similar Plugins

fullstack-dev-skills

chrome-devtools-mcp

godot-claude-skills

everything-claude-code

prompts.chat

context7-plugin