Creates and troubleshoots AWS Glue connections to JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS), Redshift, Snowflake, and BigQuery. Registers credentials, configures VPC, discovers existing connections, and tests.
Install: npx claudepluginhub aws/agent-toolkit-for-aws --plugin aws-data-analytics
Arguments: [source-type|connection-name|hostname]
This skill uses the workspace's default tool permissions.
Register an external data source with AWS Glue so downstream skills (ingesting-into-data-lake) can move data from it. A Glue connection stores the network config, driver, and credential reference for one source. Create once per source, reuse across jobs.
A connection is a named pipe, not a pipeline. This skill produces a tested, reusable Glue connection. It does not move data.
You MUST execute commands using AWS MCP server tools when connected -- they provide validation, sandboxed execution, and audit logging. Fall back to AWS CLI only if MCP is unavailable. You MUST explain each step before executing.
aws sts get-caller-identity
Ask the user which source type they want to connect to, or infer from hints:
| User says... | Source type | Connection type | Reference |
|---|---|---|---|
| "Oracle", "SQL Server", "Postgres", "MySQL", "RDS <engine>" | JDBC database | JDBC | jdbc-setup.md |
| "Redshift", "my cluster", "my data warehouse on AWS" | Redshift | JDBC | jdbc-setup.md (Redshift section) |
| "Snowflake" | Snowflake | SNOWFLAKE | snowflake-setup.md |
| "BigQuery", "Google analytics warehouse" | BigQuery | BIGQUERY | bigquery-setup.md |
If the user names DynamoDB or a local file, stop and tell them: DynamoDB is read directly by Glue without a connection, and local files belong in the ingesting-into-data-lake skill's local-upload workflow.
You MUST ask for hints the user can provide -- do not guess.
For all sources: the desired connection name (e.g., oracle-prod-sales, snowflake-analytics).
JDBC: hostname/endpoint, port, database, whether RDS/Aurora/self-managed, IAM DB auth enabled (Aurora/RDS MySQL/Postgres), SSL required.
Snowflake: account identifier, warehouse, role, default database, auth (password, key-pair, OAuth).
BigQuery: GCP project ID, location, whether service account JSON is provisioned.
Check what exists before creating.
Existing Glue connections:
aws glue get-connections --filter ConnectionType=<TYPE> --region <REGION>
If a suitable one exists, confirm and skip to Step 7.
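If a candidate looks right, one way to review its stored properties before reusing it is a quick get-connection call; the connection name and region below are placeholders:

```bash
# Inspect an existing connection before reusing it (name and region are placeholders).
aws glue get-connection --name oracle-prod-sales --region us-east-1 \
  --query 'Connection.{Type:ConnectionType,Props:ConnectionProperties,Network:PhysicalConnectionRequirements}'
```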
Candidate sources in account (JDBC/Redshift only):
aws rds describe-db-instances
aws rds describe-db-clusters
aws redshift describe-clusters
Present candidates to user; let them pick. See discovery.md.
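A sketch of pulling the candidate list together with --query projections over the standard RDS and Redshift output shapes; the region is a placeholder:

```bash
# List RDS instances with engine and endpoint (region is a placeholder).
aws rds describe-db-instances --region us-east-1 \
  --query 'DBInstances[].[DBInstanceIdentifier,Engine,Endpoint.Address,Endpoint.Port]' --output table

# List Redshift clusters the same way.
aws redshift describe-clusters --region us-east-1 \
  --query 'Clusters[].[ClusterIdentifier,Endpoint.Address,Endpoint.Port,DBName]' --output table
```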
You MUST encourage AWS Secrets Manager over plaintext passwords. You SHOULD prefer IAM database authentication where supported (Aurora/RDS MySQL and PostgreSQL, Redshift). See credential-security.md.
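A minimal sketch of registering credentials in Secrets Manager; the secret name and all values are placeholders, and the JSON keys follow the username/password format Glue reads for JDBC connections:

```bash
# Store the database credentials as a JSON secret (placeholder name and values).
aws secretsmanager create-secret \
  --name glue/postgres-prod-sales \
  --secret-string '{"username":"glue_reader","password":"REPLACE_ME"}' \
  --region us-east-1
# Reference the resulting secret from the connection via the SECRET_ID property below.
```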
Follow the source-specific reference for connection properties:
aws glue create-connection --connection-input '<JSON>' --region <REGION>
Private sources require PhysicalConnectionRequirements (SubnetId, SecurityGroupIdList, AvailabilityZone). See network-setup.md.
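A sketch of a JDBC connection input for a PostgreSQL RDS instance reached over a private subnet; every identifier, hostname, and the region are placeholders, and SECRET_ID points at the secret created above:

```bash
# Confirm the subnet's AZ first; PhysicalConnectionRequirements.AvailabilityZone must match it.
aws ec2 describe-subnets --subnet-ids subnet-0abc12345 \
  --query 'Subnets[0].AvailabilityZone' --output text

aws glue create-connection --region us-east-1 --connection-input '{
  "Name": "postgres-prod-sales",
  "ConnectionType": "JDBC",
  "ConnectionProperties": {
    "JDBC_CONNECTION_URL": "jdbc:postgresql://prod-sales.abc123.us-east-1.rds.amazonaws.com:5432/sales",
    "SECRET_ID": "glue/postgres-prod-sales",
    "JDBC_ENFORCE_SSL": "true"
  },
  "PhysicalConnectionRequirements": {
    "SubnetId": "subnet-0abc12345",
    "SecurityGroupIdList": ["sg-0def67890"],
    "AvailabilityZone": "us-east-1a"
  }
}'
```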
You MUST test before handing off. Testing is two-phase: a quick API check, then an engine-level verification.
aws glue test-connection --connection-name <NAME> --region <REGION>
This validates that Glue can reach the source and authenticate. It does NOT prove the connection works end-to-end with the query engine the user plans to use.
After TestConnection passes, verify the connection works with the user's intended engine by running a minimal query through it:
For Athena federated queries, run SELECT 1 through the Athena connection to confirm the Lambda-based connector can reach the source.
Phase B catches issues that TestConnection misses: driver compatibility at job runtime, catalog configuration, Spark-level serialization, and engine-specific auth flows (e.g., the Snowflake SNOWFLAKE type works in ETL but not via JDBC crawlers).
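If the intended engine is Athena federated query, one way to submit that SELECT 1 is with the federated catalog set in the execution context; the catalog, database, and results bucket below are assumptions:

```bash
# Submit SELECT 1 against the federated catalog (placeholder names throughout).
QID=$(aws athena start-query-execution \
  --query-string 'SELECT 1' \
  --query-execution-context Catalog=postgres_federated_catalog,Database=public \
  --result-configuration OutputLocation=s3://my-athena-results/ \
  --query 'QueryExecutionId' --output text)

# Check whether the query reached SUCCEEDED.
aws athena get-query-execution --query-execution-id "$QID" \
  --query 'QueryExecution.Status.State' --output text
```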
On success in both phases, tell the user the connection name is ready for ingesting-into-data-lake. On failure in either phase, go to Step 8.
Diagnose in order: network, credentials, driver. See troubleshooting.md.
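For the network leg, one quick first check: Glue requires the connection's security group to allow inbound traffic from itself (a self-referencing rule covering all TCP), so dumping the inbound rules is often enough to spot the gap; the group ID below is a placeholder:

```bash
# Inspect the inbound rules of the connection's security group and look for a
# self-referencing rule (source = the same group) covering all TCP.
aws ec2 describe-security-groups --group-ids sg-0def67890 \
  --query 'SecurityGroups[0].IpPermissions' --output json
```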
Constraints:
- If a source type was passed as an argument (e.g., snowflake, oracle): skip to Step 2 with the type prefilled.
- The SNOWFLAKE connection type is distinct from JDBC configured for Snowflake. You MUST use SNOWFLAKE for Spark ETL jobs; do not use JDBC.
- PhysicalConnectionRequirements.AvailabilityZone MUST match the subnet's AZ, or the connection fails at job runtime, not creation time.

| Error | Likely cause | Fix |
|---|---|---|
| Connect timed out | VPC routing, SG rule, or NAT gateway missing | See troubleshooting.md |
| Access denied for user / ORA-01017 | Credentials wrong, Secrets Manager access missing, or IAM DB auth misconfigured | See troubleshooting.md |
| No suitable driver found | Custom driver JAR not set or wrong class name | See troubleshooting.md |
| SSL handshake failed | JDBC_ENFORCE_SSL mismatch between Glue and source | See troubleshooting.md |
| UnableToFindVpcEndpoint | S3 VPC endpoint missing | Create S3 gateway endpoint in the connection's VPC |
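For the UnableToFindVpcEndpoint row, a sketch of creating the missing S3 gateway endpoint; the VPC ID, route table, and region are placeholders:

```bash
# Create an S3 gateway endpoint in the connection's VPC and attach it to the
# route table used by the connection's subnet (all IDs are placeholders).
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0aa11bb22 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0cc33dd44 \
  --region us-east-1
```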
snowflake-setup.md: SNOWFLAKE type, auth modes
bigquery-setup.md: BIGQUERY type, GCP service accounts