Imports data into AWS data lake (S3 Tables or Iceberg) from S3 files, local uploads, JDBC (Oracle, PostgreSQL, MySQL, SQL Server, RDS), Redshift, Snowflake, BigQuery, DynamoDB, or Glue tables. For one-time loads, pipelines, migrations.
npx claudepluginhub aws/agent-toolkit-for-aws --plugin aws-data-analytics

Usage: [source-path|connection-name|table-name] [--target s3-tables|iceberg|parquet]

This skill uses the workspace's default tool permissions.
Move data from a source into a queryable table in the data lake. This skill assumes the source connection (if one is needed) already exists. For Glue connection setup or troubleshooting, delegate to `connecting-to-data-source`.
Reference files:

- references/athena-loading.md
- references/bigquery-ingest.md
- references/catalog-migration.md
- references/ctas-patterns.md
- references/data-quality-validation.md
- references/dynamodb-ingest.md
- references/error-handling.md
- references/format-specific-loading.md
- references/glue-etl-migration.md
- references/glue-job-config.md
- references/glue-job-scripts.md
- references/iceberg-catalog-config-and-usage.md
- references/incremental-loading.md
- references/jdbc-ingest.md
- references/jdbc-performance.md
- references/jdbc-schema-discovery.md
- references/local-upload.md
- references/migration-troubleshooting.md
- references/migration-validation.md
- references/s3-files.md
Default to S3 Tables unless the environment says otherwise. S3 Tables is the recommended target for new data lake work. If the user's catalog inventory shows they haven't adopted S3 Tables, recommend standard Iceberg on their existing general-purpose bucket instead of forcing them to change posture.
You MUST execute commands using AWS MCP server tools when connected -- they provide validation, sandboxed execution, and audit logging. Fall back to AWS CLI only if MCP is unavailable. You MUST explain each step before executing.
Verify the caller identity and region first:

aws sts get-caller-identity

Querying data that is already in the lake is covered by querying-data-lake.

| User says... | Source type | Reference |
|---|---|---|
| "upload my file", "local CSV", "move to S3" | Local file | local-upload.md |
| "load from S3", "import CSV/JSON/Parquet from s3://" | S3 files | s3-files.md |
| "import from Oracle/Postgres/MySQL/SQL Server/Redshift/RDS/Aurora" | JDBC | jdbc-ingest.md |
| "pull from Snowflake", "Snowflake table to S3" | Snowflake | snowflake-ingest.md |
| "import from BigQuery", "GCP analytics to S3" | BigQuery | bigquery-ingest.md |
| "export DynamoDB", "DynamoDB to data lake" | DynamoDB | dynamodb-ingest.md |
| "migrate Glue table", "convert Hive to Iceberg" | Catalog migration | catalog-migration.md |
If the user names Salesforce, ServiceNow, SAP, MongoDB, Kafka, or another SaaS/streaming source, decline -- these are not supported in this release.
If the source table is referenced by a fuzzy or business name ("migrate our orders table", "pull from the sales warehouse"), delegate to finding-data-lake-assets to resolve before proceeding.
For JDBC, Snowflake, and BigQuery sources, a Glue connection is required. Check:
aws glue get-connection --name <CONNECTION_NAME> --region <REGION>
If the connection does not exist, stop and delegate to connecting-to-data-source to create and test it. Do not proceed with ingest until the connection is verified.
Local files, S3 files, DynamoDB, and catalog migration do not need a Glue connection.
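If the check needs to run programmatically rather than through the CLI, a minimal boto3 sketch might look like the following; the region and connection name are placeholders, and the delegation rule above still applies when the connection is missing.

```python
import boto3
from botocore.exceptions import ClientError

glue = boto3.client("glue", region_name="us-east-1")  # placeholder region

def glue_connection_exists(name: str) -> bool:
    """Return True if the named Glue connection exists."""
    try:
        glue.get_connection(Name=name)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "EntityNotFoundException":
            return False
        raise  # surface permission or throttling errors rather than masking them

if not glue_connection_exists("oracle-hr-connection"):  # placeholder connection name
    print("Connection missing -- delegate to connecting-to-data-source before ingesting.")
```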
You MUST ask the user (or suggest based on catalog inventory) before creating or writing to any table:
- Target format: S3 Tables, Iceberg, or Parquet?
- Load into an existing table, or create a new one (if new, delegate to creating-data-lake-table)?

Inventory-aware defaults:
If you have already run exploring-data-catalog or can quickly check, use what exists:
- If they have the s3tablescatalog federated catalog and active table buckets: recommend S3 Tables.
- Otherwise: recommend standard Iceberg on their existing general-purpose bucket.

Do not force S3 Tables on customers who haven't adopted it. See iceberg-catalog-config-and-usage.md.
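One quick, hedged way to ground that recommendation is to check whether any S3 table buckets exist in the account; the sketch below assumes the s3tables API is available in the target region (the region is a placeholder) and only checks table buckets, not the federated catalog itself.

```python
import boto3

# If the account has any S3 table buckets, S3 Tables has been adopted;
# otherwise default to standard Iceberg on the existing general-purpose bucket.
s3tables = boto3.client("s3tables", region_name="us-east-1")  # placeholder region
table_buckets = s3tables.list_table_buckets().get("tableBuckets", [])

if table_buckets:
    print("S3 Tables adopted -- recommend S3 Tables as the target.")
else:
    print("No table buckets found -- recommend standard Iceberg on the existing bucket.")
```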
Delegations from this step:
- creating-data-lake-table
- finding-data-lake-assets
- exploring-data-catalog

Read the source-specific reference from the routing table above and follow its phases. Each is self-contained with job templates, gotchas, and troubleshooting.
Common Glue 5.1 or higher job configuration and PySpark templates are shared in glue-job-config.md and glue-job-scripts.md.
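As a rough illustration of how those shared arguments fit together for an Iceberg write to S3 Tables, the sketch below shows one plausible shape of a job's DefaultArguments; the catalog name and table bucket ARN are placeholders, and iceberg-catalog-config-and-usage.md remains the authoritative source for the exact config.

```python
# Sketch only: placeholder catalog name and table bucket ARN.
table_bucket_arn = "arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket"

default_arguments = {
    # Enables Glue's native Iceberg support.
    "--datalake-formats": "iceberg",
    # Static Spark configs must be passed as job arguments, never via spark.conf.set()
    # inside the script (see the gotchas below).
    "--conf": (
        "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        " --conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog"
        " --conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog"
        f" --conf spark.sql.catalog.s3tablesbucket.warehouse={table_bucket_arn}"
    ),
    # Depending on the Glue version, the S3 Tables catalog JAR may also need to be
    # supplied via --extra-jars; defer to glue-job-config.md for the exact setup.
}
```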
Run all three validation checks; do not skip any.
See data-quality-validation.md.
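As a minimal illustration of one such check -- comparing source and target row counts through Athena -- the sketch below uses placeholder database, table, and workgroup names and assumes the workgroup has a query result location configured; the full checklist lives in data-quality-validation.md.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # placeholder region

def athena_count(database: str, table: str, workgroup: str = "primary") -> int:
    """Run SELECT COUNT(*) against a table and return the count."""
    qid = athena.start_query_execution(
        QueryString=f'SELECT COUNT(*) FROM "{database}"."{table}"',
        WorkGroup=workgroup,
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Validation query ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return int(rows[1]["Data"][0]["VarCharValue"])  # row 0 is the header

# Placeholder databases/tables: compare the staged source extract with the loaded target.
assert athena_count("staging_db", "orders_raw") == athena_count("lakehouse_db", "orders")
```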
For recurring pipelines, create a Glue Trigger with a cron schedule. See testing-and-scheduling.md. Simple single-step pipelines use Glue Triggers; multi-step pipelines with branching use MWAA.
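For example, a nightly scheduled trigger might be created along these lines; the trigger name, job name, and cron expression are placeholders, and testing-and-scheduling.md covers the recommended setup.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # placeholder region

# Nightly run at 02:00 UTC; Glue uses the six-field cron(...) syntax.
glue.create_trigger(
    Name="orders-nightly-load",                        # placeholder trigger name
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    Actions=[{"JobName": "orders-jdbc-to-iceberg"}],   # placeholder job name
    StartOnCreation=True,
)
```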
--target flag: pre-fill the target format in Step 4.

- Iceberg targets require the --datalake-formats iceberg job argument.
- spark.sql.catalog.* config MUST go in --conf job arguments, never in spark.conf.set(). Glue 5.x throws AnalysisException: Cannot modify the value of a static config otherwise. See iceberg-catalog-config-and-usage.md for correct catalog configs.
- The warehouse parameter is required in the S3 Tables catalog config. Without it Spark fails with "Cannot derive default warehouse location".
- overwritePartitions() only replaces partitions present in the DataFrame -- for full refresh with deletes, use createOrReplace(). A write-mode sketch follows the error table below.
- For connection-level failures, delegate to connecting-to-data-source; do not debug network or credential issues in this skill.

| Error | Likely cause | Action |
|---|---|---|
| Access Denied on S3 | Missing IAM permissions | Check Glue role has s3:GetObject, s3:PutObject |
| Access Denied on S3 Tables | Missing s3tables:* permissions | Add S3 Tables inline policy to Glue role |
| CTAS timeout | Dataset too large for Athena | Switch to Glue ETL or batch with WHERE filters |
| JDBC connection timeout/auth failure | Connection-level issue | Delegate to connecting-to-data-source |
| Throughput exceeded (DynamoDB) | Read percent too high | Lower read.percent or use native export |
See error-handling.md for the full catalog.
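Finally, the write-mode sketch referenced in the gotchas above: it contrasts overwritePartitions() with createOrReplace() using placeholder catalog, namespace, and table names, and assumes the Iceberg catalog was configured through --conf job arguments.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

spark = GlueContext(SparkContext.getOrCreate()).spark_session
df = spark.table("staging.orders_raw")  # placeholder: the freshly ingested batch

# Incremental refresh: only partitions present in df are replaced;
# partitions absent from df are left untouched.
df.writeTo("s3tablesbucket.analytics.orders").overwritePartitions()

# Full refresh with deletes: the table definition and data are replaced,
# so rows that no longer exist in the source are dropped as well.
df.writeTo("s3tablesbucket.analytics.orders").createOrReplace()
```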