Architects, configures, troubleshoots, and optimizes SAP Datasphere data integration pipelines using Replication Flows, Data Flows, Transformation Flows, and Task Chains for CDC/delta processing, ETL, and orchestration.
```
npx claudepluginhub mariodefelipe/sap-datasphere-plugin-for-claude-cowork
```

This skill uses the workspace's default tool permissions.
Expert skill for SAP Datasphere Data Integration layer covering all flow types and orchestration patterns.
| Requirement | Recommended Flow | Reason |
|---|---|---|
| Mass 1:1 data movement | Replication Flow | Optimized for bulk transfer, supports CDC |
| Real-time delta capture | Replication Flow | Only flow type supporting continuous CDC |
| Complex ETL (joins, unions) | Data Flow | Visual modeling + Python scripting |
| Delta propagation through layers | Transformation Flow | Reads/writes delta tables, SQL-based |
| Schedule & orchestrate | Task Chain | Dependency management, parallel execution |
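For quick reference, the decision logic above can be encoded as a simple lookup. This is an illustrative sketch only; the requirement keys are invented for the example and are not a Datasphere API:

```python
# Illustrative only: encodes the decision table above as a lookup.
# The requirement keys are hypothetical labels, not a Datasphere API.
FLOW_FOR_REQUIREMENT = {
    "mass_1_to_1_movement": "Replication Flow",
    "realtime_delta_capture": "Replication Flow",
    "complex_etl": "Data Flow",
    "delta_propagation": "Transformation Flow",
    "orchestration": "Task Chain",
}

def recommend_flow(requirement: str) -> str:
    """Return the recommended flow type for a known requirement key."""
    return FLOW_FOR_REQUIREMENT.get(requirement, "unknown - review the table above")

print(recommend_flow("delta_propagation"))  # Transformation Flow
```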
From Data Builder, click one of the creation tiles at the top:
| Flow Type | Navigation Path | URL Fragment |
|---|---|---|
| Data Builder | Left Menu → Data Builder | #/databuilder |
| Flows List | Data Builder → Flows tab | #/databuilder&/db/{SPACE} (filtered) |
| Data Flow Editor | Data Builder → New Data Flow | #/databuilder&/db/{SPACE}/-newDataFlow |
| Replication Flow | Data Builder → New Replication Flow | #/replicationflow |
| Transformation Flow | Data Builder → New Transformation Flow | #/transformationflow |
| Task Chain | Data Builder → New Task Chain | #/taskchain |
| Monitor All | Data Integration Monitor | #/dim |
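The URL fragments can be combined with your tenant URL to build deep links. A minimal sketch, assuming fragments append directly to the tenant root (the tenant URL and helper function are placeholders):

```python
# Hypothetical deep-link helper based on the fragment table above.
# TENANT is a placeholder; replace with your Datasphere tenant URL.
TENANT = "https://mytenant.eu10.hcs.cloud.sap"

FRAGMENTS = {
    "data_builder": "#/databuilder",
    "replication_flow": "#/replicationflow",
    "transformation_flow": "#/transformationflow",
    "task_chain": "#/taskchain",
    "monitor": "#/dim",
}

def deep_link(target: str) -> str:
    """Build a navigation URL for the given editor or monitor."""
    return f"{TENANT}/{FRAGMENTS[target]}"

print(deep_link("monitor"))  # https://mytenant.eu10.hcs.cloud.sap/#/dim
```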
The Data Flow editor has three working areas:

Left Panel - Repository/Sources: browse the space's tables and views and drag them onto the canvas.

Center - Canvas: wire sources, operators, and the target into a pipeline.

Operators Toolbar:
| Icon | Operator | Purpose |
|---|---|---|
| Table | Source/Target | Add data source or target table |
| Chain | Join | Combine two inputs (INNER, LEFT, RIGHT, FULL) |
| Transform | Projection | Select/rename columns |
| Aggregate | Aggregation | GROUP BY with SUM, COUNT, AVG, etc. |
| Code | Script (Python) | Custom Python transformations |
| Filter | Filter | Row-level filtering |
Right Panel - Properties: configure whichever operator is currently selected on the canvas.
1:1 mass data replication from supported sources to supported targets with minimal transformation (projection/filtering only). This is the successor to SLT for cloud-to-cloud scenarios.
CRITICAL: The source object must have CDC annotations enabled.
For S/4HANA CDS Views:

```
@Analytics.dataExtraction.enabled: true
@Analytics.dataExtraction.delta.changeDataCapture: true
```
If the source lacks CDC annotations, only "Initial Load" is supported.
| SAP Target | Delta Support | Notes |
|---|---|---|
| Local Table | Yes | Can be delta-capture enabled |
| Local Table (File) | Yes | HANA Data Lake Files (Object Store) |
| SAP HANA Cloud | Yes | Direct HANA connection |
| Non-SAP Target | Format | Notes |
|---|---|---|
| Amazon S3 | Parquet/CSV | Premium Outbound required |
| Google Cloud Storage | Parquet/CSV | Premium Outbound required |
| Google BigQuery | Native | Premium Outbound required |
| Azure Data Lake Gen2 | Parquet/CSV | Premium Outbound required |
| Apache Kafka | Events | Premium Outbound required |
LICENSING ALERT: Replicating to non-SAP targets consumes Premium Outbound Integration (POI) capacity.
| Scenario | POI Required? |
|---|---|
| Replicate to Datasphere Local Table | No |
| Replicate to HANA Cloud | No |
| Replicate to HDLF (Object Store) | No |
| Replicate to AWS S3 | Yes |
| Replicate to Azure ADLS Gen2 | Yes |
| Replicate to Kafka | Yes |
POI Blocks: Measured in 20GB increments. Plan capacity accordingly.
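A quick sizing sketch, assuming blocks are sized at 20 GB as stated above (the monthly volume is an invented example; confirm the metering period in your contract):

```python
import math

# Hypothetical capacity estimate: POI is measured in 20 GB blocks.
# The outbound volume below is an example figure, not a real workload.
monthly_outbound_gb = 130
blocks_needed = math.ceil(monthly_outbound_gb / 20)
print(blocks_needed)  # 7 blocks to cover 130 GB of outbound volume
```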
Complex ETL operations requiring joins, unions, aggregations, or custom Python scripting.
```python
# Example: Rename columns and convert data types
def transform(data):
    df = data.copy()
    df.columns = [col.upper() for col in df.columns]
    df["AMOUNT"] = df["AMOUNT"].astype(float)
    return df
```
Available libraries: Standard Python data manipulation (Pandas-like dataframe operations)
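A further sketch of the script operator, assuming the Pandas-like API supports standard boolean indexing and column arithmetic (the column names are illustrative, not from a real model):

```python
# Illustrative script-operator body: filter rows and derive a column.
# STATUS, AMOUNT, and TAX_AMOUNT are hypothetical column names.
def transform(data):
    df = data.copy()
    # Keep only posted documents
    df = df[df["STATUS"] == "POSTED"]
    # Derive a net amount from two existing columns
    df["NET_AMOUNT"] = df["AMOUNT"] - df["TAX_AMOUNT"]
    return df
```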
IMPORTANT: Data Flows are BATCH ONLY
| Constraint | Impact |
|---|---|
| No CDC support | Cannot propagate delta changes continuously |
| Batch execution | Full reload each run (unless filtered) |
| No delta chaining | Cannot use Data Flow target as delta source for another flow |
Note: A banner in the editor reminds you: "Replication and Transformation Flows are now the recommended approach... while Data Flows will continue to be supported for existing workflows."
Step-by-Step:

1. Open Data Builder → Select Space → Click the "New Data Flow" tile
2. The canvas opens with an empty workspace
3. Add source tables
4. Add transformation operators
5. Configure the Join (if used)
6. Add a Python Script (optional)
7. Add the target table
8. Save & Deploy
9. Run
Delta propagation and multi-level staging within Datasphere. The strategic successor to Data Flows for delta logic.
```
Replication Flow → [Delta Table A] → Transformation Flow → [Delta Table B] → Transformation Flow → [Delta Table C]
```
Each layer receives only changed records (Insert, Update, Delete).
CRITICAL DECISION: If you need to:
- Load data via Replication Flow (Inbound)
- Process only the changes to a second layer
→ Use Transformation Flow, NOT Data Flow
| Operation | Propagated? |
|---|---|
| INSERT | ✅ Yes |
| UPDATE | ✅ Yes |
| DELETE | ✅ Yes |
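To make these semantics concrete, here is a minimal pandas sketch of how a downstream layer applies a batch of change records. This only simulates the behavior; the CHANGE_TYPE flag and column names are invented, and this is not a Datasphere API:

```python
import pandas as pd

# Simulated delta apply: INSERT/UPDATE behave as an upsert keyed by ID,
# DELETE removes the row. Illustrative only.
target = pd.DataFrame({"ID": [1, 2], "VALUE": ["a", "b"]}).set_index("ID")

delta = pd.DataFrame({
    "ID": [2, 3, 1],
    "VALUE": ["b2", "c", None],
    "CHANGE_TYPE": ["U", "I", "D"],  # hypothetical change-type flag
}).set_index("ID")

for change_id, row in delta.iterrows():
    if row["CHANGE_TYPE"] == "D":
        target = target.drop(change_id, errors="ignore")
    else:  # INSERT and UPDATE both upsert
        target.loc[change_id, "VALUE"] = row["VALUE"]

print(target)  # ID 2 -> b2, ID 3 -> c; ID 1 removed
```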
| Aspect | Data Flow | Transformation Flow |
|---|---|---|
| Delta/CDC | ❌ No | ✅ Yes |
| Complex Joins | ✅ Yes | ⚠️ Limited |
| Python Scripts | ✅ Yes | ❌ No |
| Multi-level Staging | ❌ Not recommended | ✅ Designed for this |
| Performance | Batch reload | Incremental processing |
Scheduling and dependency management for orchestrating multiple flows.
Serial:

```
[Flow A] → [Flow B] → [Flow C]
```

Each step waits for the previous to complete.

Parallel with AND condition:

```
[Flow A] ─┐
          ├─ AND ─→ [Flow D]
[Flow B] ─┤
[Flow C] ─┘
```

Flow D runs only after A, B, AND C all complete.

Parallel with OR condition:

```
[Flow A] ─┐
          ├─ OR ─→ [Flow D]
[Flow B] ─┘
```

Flow D runs after A OR B completes (whichever finishes first).
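These three dependency semantics can be modeled with plain asyncio to build intuition; Datasphere evaluates the conditions server-side, so this is an analogy, not its API:

```python
import asyncio

# Illustrative model of serial, AND, and OR task-chain patterns.
async def flow(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for a flow run
    print(f"{name} done")
    return name

async def main():
    # Serial: each step waits for the previous one
    await flow("A", 0.1)
    await flow("B", 0.1)

    # AND: D starts only after A, B, and C all complete
    await asyncio.gather(flow("A", 0.1), flow("B", 0.3), flow("C", 0.2))
    await flow("D", 0.1)

    # OR: D starts after the first of A or B completes
    tasks = [asyncio.create_task(flow("A", 0.3)),
             asyncio.create_task(flow("B", 0.1))]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    await flow("D", 0.1)
    for task in pending:  # the slower branch still finishes
        await task

asyncio.run(main())
```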
Task Chains can trigger remote process chains in SAP BW Bridge, though Bridge chains are typically scheduled internally within the Bridge Cockpit.
The preferred method depends on the source object type: CDS views for CDS sources; SLT for ABAP tables.
| Method | When to Use |
|---|---|
| CDS Views (via S/4HANA or ABAP connection) | Preferred for CDS view sources — semantic richness, CDC support |
| SLT (Trigger-based, via SAP LT Replication Server) | Recommended for ABAP table sources — requires DMIS 2018 SP06+ or DMIS 2020 SP03+ |
| ODP (BW context) | For BW objects (ADSO, InfoCube, Composite Provider, Query) — BW 7.55+ / S/4HANA 1909+ |
| ODP (SAPI context) | For SAP standard extractors/DataSources released for ODP (Note 2232584) |
Correction note: earlier versions of this document described SLT as "Legacy"; that was incorrect. SLT is not deprecated for Replication Flows; it is the recommended approach for table-based (ABAP table) replication. The legacy option to avoid is SLT 2.0 (DMIS 2011), which is no longer updated and not recommended. See `references/replication-flows.md` and `datasphere-flow-doctor/references/slt-replication-troubleshooting.md`.
Physical Storage: Replication Flows can write data to "Local Table (File)" storage in the embedded HANA Data Lake.
| Characteristic | Value |
|---|---|
| Performance | Slower than In-Memory HANA |
| Use Case | Warm/Cold data, staging |
| Data Products | Foundation for BDC Data Products |
| Direction | Method | Cost Impact |
|---|---|---|
| Inbound (Databricks → SAP) | JDBC connection or Data Import | Standard |
| Outbound (SAP → Databricks) Federation | Delta Sharing (Zero Copy) | No data movement |
| Outbound (SAP → Databricks) Mass | Replication Flow → ADLS Gen2 → Mount as Delta Table | POI Required |
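For the Delta Sharing (zero-copy) row, the consuming side can read a shared table with the open-source delta-sharing Python connector; the profile file and share coordinates below are placeholders you obtain from the provider configuration:

```python
import delta_sharing  # pip install delta-sharing

# Placeholder profile and share coordinates (profile#share.schema.table).
profile = "config.share"
table_url = profile + "#my_share.my_schema.sales_orders"

# Zero-copy federation: rows are read on demand, not replicated
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```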
Check these in order:

1. CDC Annotations: does the source CDS view have the following? (A quick local scan script follows this checklist.)

   ```
   @Analytics.dataExtraction.enabled: true
   @Analytics.dataExtraction.delta.changeDataCapture: true
   ```

2. Cloud Connector: is it running and connected?
3. POI Blocks: for non-SAP targets, have you run out of Premium Outbound blocks?
4. Execution Nodes: check thread limits for large tables (e.g., ACDOCA)
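For the first check, a small local convenience script can scan an extracted CDS view definition for the two annotations; the file path is a placeholder:

```python
import re

# Scan a locally saved CDS view source for the required CDC annotations.
# "Z_MY_CDS_VIEW.ddls" is a placeholder path.
REQUIRED = [
    r"@Analytics\.dataExtraction\.enabled\s*:\s*true",
    r"@Analytics\.dataExtraction\.delta\.changeDataCapture\s*:\s*true",
]

with open("Z_MY_CDS_VIEW.ddls", encoding="utf-8") as f:
    source = f.read()

for pattern in REQUIRED:
    status = "OK" if re.search(pattern, source, re.IGNORECASE) else "MISSING"
    print(f"{pattern}: {status}")
```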
Optimization checklist:

- Full loads every time? → Switch to a Replication Flow (for movement) or a Transformation Flow (for delta logic)
- Large joins? → Pre-aggregate or filter at the source
- Python operator? → Optimize DataFrame operations (see the sketch after this list)
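For the Python operator item, vectorized DataFrame operations are usually the biggest win over per-row loops; a minimal before/after sketch with invented column names:

```python
# Slow: per-row Python loop inside the script operator
def transform_slow(data):
    df = data.copy()
    df["NET"] = [amt - tax for amt, tax in zip(df["AMOUNT"], df["TAX"])]
    return df

# Faster: vectorized column arithmetic over the whole DataFrame
def transform_fast(data):
    df = data.copy()
    df["NET"] = df["AMOUNT"] - df["TAX"]
    return df
```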
When migrating BW scenarios, do NOT rebuild the logic manually!
| Scenario | Solution |
|---|---|
| Legacy BW logic (ABAP) | Use SAP BW Bridge |
| BW/4HANA 2021+ or BW 7.5 SP24+ | Use Data Product Generator to push InfoProviders to Object Store |
For real-time requirements, the only option is Replication Flows: Data Flows and Transformation Flows are batch only.
See reference files for detailed procedures:

- `references/replication-flows.md` - Detailed replication configuration
- `references/transformation-flows.md` - Delta staging patterns
- `references/task-chains.md` - Orchestration patterns