Databricks Job activity and 2025 Azure Data Factory connectors
Migrates Databricks orchestration to Job activity and implements 2025 connectors with managed identity.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
Examples:
❌ Wrong: D:/repos/project/file.tsx
✅ Correct: D:\repos\project\file.tsx
This applies to:
NEVER create new documentation files unless explicitly requested by the user.
🚨 CRITICAL UPDATE (2025): The Databricks Job activity is now the ONLY recommended method for orchestrating Databricks in ADF. Microsoft strongly recommends migrating from legacy Notebook, Python, and JAR activities.
Old Pattern (Notebook Activity - ❌ LEGACY):
{
"name": "RunNotebook",
"type": "DatabricksNotebook", // ❌ DEPRECATED - Migrate to DatabricksJob
"linkedServiceName": { "referenceName": "DatabricksLinkedService" },
"typeProperties": {
"notebookPath": "/Users/user@example.com/MyNotebook",
"baseParameters": { "param1": "value1" }
}
}
New Pattern (Databricks Job Activity - ✅ CURRENT 2025):
{
"name": "RunDatabricksWorkflow",
"type": "DatabricksJob", // ✅ CORRECT activity type (NOT DatabricksSparkJob)
"linkedServiceName": { "referenceName": "DatabricksLinkedService" },
"typeProperties": {
"jobId": "123456", // Reference existing Databricks Workflow Job
"jobParameters": { // Pass parameters to the Job
"param1": "value1",
"runDate": "@pipeline().parameters.ProcessingDate"
}
},
"policy": {
"timeout": "0.12:00:00",
"retry": 2,
"retryIntervalInSeconds": 30
}
}
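The mapping between the two patterns can be sketched in Python. This is a minimal, illustrative helper rather than an official migration tool: `to_job_activity` is a hypothetical name, and it assumes that a Databricks Job wrapping the notebook already exists and that the notebook's `baseParameters` map one-to-one onto the Job's parameters.

```python
import copy


def to_job_activity(notebook_activity: dict, job_id: str) -> dict:
    """Return a DatabricksJob activity built from a legacy DatabricksNotebook one.

    Hypothetical helper: assumes a Databricks Job wrapping the notebook already
    exists (job_id) and that baseParameters map directly onto jobParameters.
    """
    if notebook_activity.get("type") != "DatabricksNotebook":
        raise ValueError("expected a DatabricksNotebook activity")

    # Work on a copy so the caller's definition is left untouched.
    activity = copy.deepcopy(notebook_activity)
    legacy_props = activity.pop("typeProperties", {})

    activity["type"] = "DatabricksJob"
    activity["typeProperties"] = {
        "jobId": job_id,
        "jobParameters": dict(legacy_props.get("baseParameters", {})),
    }
    return activity


legacy = {
    "name": "RunNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {"referenceName": "DatabricksLinkedService"},
    "typeProperties": {
        "notebookPath": "/Users/user@example.com/MyNotebook",
        "baseParameters": {"param1": "value1"},
    },
}

migrated = to_job_activity(legacy, job_id="123456")
```

The notebook path is dropped on purpose: with the Job activity, the notebook is referenced by the Job definition in the Databricks workspace rather than by the pipeline.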
Serverless Execution by Default:
Advanced Workflow Features:
Centralized Job Management:
Better Orchestration:
Improved Reliability:
Cost Optimization:
# In Databricks workspace
# Create Job with tasks
{
"name": "Data Processing Job",
"tasks": [
{
"task_key": "ingest",
"notebook_task": {
"notebook_path": "/Notebooks/Ingest",
"base_parameters": {}
},
"job_cluster_key": "small_cluster"
},
{
"task_key": "transform",
"depends_on": [{ "task_key": "ingest" }],
"notebook_task": {
"notebook_path": "/Notebooks/Transform"
},
"job_cluster_key": "medium_cluster"
},
{
"task_key": "load",
"depends_on": [{ "task_key": "transform" }],
"notebook_task": {
"notebook_path": "/Notebooks/Load"
},
"job_cluster_key": "small_cluster"
}
],
"job_clusters": [
{
"job_cluster_key": "small_cluster",
"new_cluster": {
"spark_version": "13.3.x-scala2.12",
"node_type_id": "Standard_DS3_v2",
"num_workers": 2
}
},
{
"job_cluster_key": "medium_cluster",
"new_cluster": {
"spark_version": "13.3.x-scala2.12",
"node_type_id": "Standard_DS4_v2",
"num_workers": 8
}
}
]
}
# Get Job ID after creation
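Before submitting a Job definition like the one above, it can help to check it locally. The following Python sketch is an assumption-laden illustration, not part of any Databricks tooling: `task_order` and `missing_cluster_keys` are hypothetical helpers that derive the execution order from `depends_on` and confirm that every `job_cluster_key` used by a task is defined under `job_clusters`.

```python
from graphlib import TopologicalSorter


def task_order(job: dict) -> list[str]:
    """Return task keys in an order that respects depends_on.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in job["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


def missing_cluster_keys(job: dict) -> set[str]:
    """Return job_cluster_key values used by tasks but not defined in job_clusters."""
    defined = {c["job_cluster_key"] for c in job.get("job_clusters", [])}
    used = {t["job_cluster_key"] for t in job["tasks"] if "job_cluster_key" in t}
    return used - defined


# Trimmed copy of the Job definition above (notebook settings omitted).
job = {
    "name": "Data Processing Job",
    "tasks": [
        {"task_key": "ingest", "job_cluster_key": "small_cluster"},
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "medium_cluster",
        },
        {
            "task_key": "load",
            "depends_on": [{"task_key": "transform"}],
            "job_cluster_key": "small_cluster",
        },
    ],
    "job_clusters": [
        {"job_cluster_key": "small_cluster"},
        {"job_cluster_key": "medium_cluster"},
    ],
}

order = task_order(job)
missing = missing_cluster_keys(job)
```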
{
"name": "PL_Databricks_Serverless_Workflow",
"properties": {
"activities": [
{
"name": "ExecuteDatabricksWorkflow",
"type": "DatabricksJob", // ✅ Correct activity type
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 2,
"retryIntervalInSeconds": 30
},
"typeProperties": {
"jobId": "123456", // Databricks Job ID from workspace
"jobParameters": { // ⚠️ Use jobParameters (not parameters)
"input_path": "/mnt/data/input",
"output_path": "/mnt/data/output",
"run_date": "@pipeline().parameters.runDate",
"environment": "@pipeline().parameters.environment"
}
},
"linkedServiceName": {
"referenceName": "DatabricksLinkedService_Serverless",
"type": "LinkedServiceReference"
}
},
{
"name": "LogJobExecution",
"type": "WebActivity",
"dependsOn": [
{
"activity": "ExecuteDatabricksWorkflow",
"dependencyConditions": ["Succeeded"]
}
],
"typeProperties": {
"url": "@pipeline().parameters.LoggingEndpoint",
"method": "POST",
"body": {
"jobId": "123456",
"runId": "@activity('ExecuteDatabricksWorkflow').output.runId",
"status": "Succeeded",
"duration": "@activity('ExecuteDatabricksWorkflow').output.executionDuration"
}
}
}
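],
"parameters": {
"runDate": {
"type": "string" // No defaultValue: parameter defaults are literal strings, so "@utcnow()" would not be evaluated; pass the date from the trigger or caller
},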
"environment": {
"type": "string",
"defaultValue": "production"
},
"LoggingEndpoint": {
"type": "string"
}
}
}
}
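The `timeout` values in the `policy` blocks above follow a `d.hh:mm:ss` shape, so `0.12:00:00` means 12 hours. The Python sketch below shows one way to read such values when reviewing pipeline definitions. `parse_timespan` is a hypothetical helper, and the regular expression covers only the forms that appear in this document (an optional day prefix followed by `hh:mm:ss`).

```python
import re
from datetime import timedelta

# Optional "<days>." prefix followed by hh:mm:ss, e.g. "0.12:00:00" or "00:01:40".
_TIMESPAN = re.compile(
    r"^(?:(?P<days>\d+)\.)?(?P<hours>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})$"
)


def parse_timespan(value: str) -> timedelta:
    """Convert a d.hh:mm:ss (or hh:mm:ss) string into a timedelta."""
    match = _TIMESPAN.match(value)
    if match is None:
        raise ValueError(f"unsupported timespan format: {value!r}")
    return timedelta(
        days=int(match["days"] or 0),
        hours=int(match["hours"]),
        minutes=int(match["minutes"]),
        seconds=int(match["seconds"]),
    )


activity_timeout = parse_timespan("0.12:00:00")
request_timeout = parse_timespan("00:01:40")
```

The second call uses the same format without the day component, as in the `httpRequestTimeout` value `00:01:40` (100 seconds) used by the ServiceNow example in this document.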
✅ RECOMMENDED: Serverless Linked Service (No Cluster Configuration)
{
"name": "DatabricksLinkedService_Serverless",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "AzureDatabricks",
"typeProperties": {
"domain": "https://adb-123456789.azuredatabricks.net",
"authentication": "MSI" // ✅ Managed Identity (recommended 2025)
// ⚠️ NO existingClusterId or newClusterNodeType needed for serverless!
// The Databricks Job activity automatically uses serverless compute
}
}
}
Alternative: Access Token Authentication
{
"name": "DatabricksLinkedService_Token",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "AzureDatabricks",
"typeProperties": {
"domain": "https://adb-123456789.azuredatabricks.net",
"accessToken": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault",
"type": "LinkedServiceReference"
},
"secretName": "databricks-access-token"
}
}
}
}
🚨 CRITICAL: For Databricks Job activity, DO NOT specify cluster properties in the linked service. The job configuration in Databricks workspace controls compute resources.
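The rule above is easy to check mechanically. The following Python sketch is a hypothetical lint step, not an ADF feature: `cluster_properties` lists any `typeProperties` keys that configure compute, assuming that such keys are `existingClusterId` or start with `newCluster` (for example `newClusterNodeType`).

```python
def cluster_properties(linked_service: dict) -> list[str]:
    """Return cluster-related keys found in a linked service's typeProperties.

    Assumption: compute settings are "existingClusterId" or keys that
    start with "newCluster".
    """
    props = linked_service.get("properties", {}).get("typeProperties", {})
    return sorted(
        key
        for key in props
        if key == "existingClusterId" or key.startswith("newCluster")
    )


serverless = {
    "name": "DatabricksLinkedService_Serverless",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-123456789.azuredatabricks.net",
            "authentication": "MSI",
        },
    },
}

# Same service with compute settings added; the cluster ID is a made-up placeholder.
with_cluster = {
    "name": "DatabricksLinkedService_Cluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-123456789.azuredatabricks.net",
            "authentication": "MSI",
            "existingClusterId": "placeholder-cluster-id",
            "newClusterNodeType": "Standard_DS3_v2",
        },
    },
}

clean = cluster_properties(serverless)
flagged = cluster_properties(with_cluster)
```

An empty result means the linked service leaves compute entirely to the Job configuration in the Databricks workspace.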
🚨 CRITICAL: ServiceNow V1 connector is at End of Support stage. Migrate to V2 immediately!
Key Features of V2:
Copy Activity Example:
{
"name": "CopyFromServiceNowV2",
"type": "Copy",
"inputs": [
{
"referenceName": "ServiceNowV2Source",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureSqlSink",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "ServiceNowV2Source",
"query": "sysparm_query=active=true^priority=1^sys_created_on>=javascript:gs.dateGenerate('2025-01-01')",
"httpRequestTimeout": "00:01:40" // 100 seconds
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "upsert",
"upsertSettings": {
"useTempDB": true,
"keys": ["sys_id"]
}
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage",
"type": "LinkedServiceReference"
}
}
}
}
Linked Service (OAuth2 - Recommended):
{
"name": "ServiceNowV2LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "ServiceNowV2",
"typeProperties": {
"endpoint": "https://dev12345.service-now.com",
"authenticationType": "OAuth2",
"clientId": "your-oauth-client-id",
"clientSecret": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault",
"type": "LinkedServiceReference"
},
"secretName": "servicenow-client-secret"
},
"username": "service-account@company.com",
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault",
"type": "LinkedServiceReference"
},
"secretName": "servicenow-password"
},
"grantType": "password"
}
}
}
Linked Service (Basic Authentication - Legacy):
{
"name": "ServiceNowV2LinkedService_Basic",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "ServiceNowV2",
"typeProperties": {
"endpoint": "https://dev12345.service-now.com",
"authenticationType": "Basic",
"username": "admin",
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault",
"type": "LinkedServiceReference"
},
"secretName": "servicenow-password"
}
}
}
}
Migration from V1 to V2:
Linked service type: ServiceNow to ServiceNowV2
Source type: ServiceNowSource to ServiceNowV2Source
Improved performance and features:
{
"name": "PostgreSQLLinkedService",
"type": "PostgreSql",
"typeProperties": {
"connectionString": "host=myserver.postgres.database.azure.com;port=5432;database=mydb;uid=myuser",
"password": {
"type": "AzureKeyVaultSecret",
"store": { "referenceName": "KeyVault", "type": "LinkedServiceReference" },
"secretName": "postgres-password"
},
// 2025 enhancement
"enableSsl": true,
"sslMode": "Require"
}
}
🆕 Native support for Microsoft Fabric Warehouse (Q3 2024+)
Supported Activities:
Linked Service Configuration:
{
"name": "FabricWarehouseLinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "Warehouse", // ✅ NEW dedicated Fabric Warehouse type
"typeProperties": {
"endpoint": "myworkspace.datawarehouse.fabric.microsoft.com",
"warehouse": "MyWarehouse",
"authenticationType": "ServicePrincipal", // Recommended
"servicePrincipalId": "<app-registration-id>",
"servicePrincipalKey": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault",
"type": "LinkedServiceReference"
},
"secretName": "fabric-warehouse-sp-key"
},
"tenant": "<tenant-id>"
}
}
}
Alternative: Managed Identity Authentication (Preferred)
{
"name": "FabricWarehouseLinkedService_ManagedIdentity",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "Warehouse",
"typeProperties": {
"endpoint": "myworkspace.datawarehouse.fabric.microsoft.com",
"warehouse": "MyWarehouse",
"authenticationType": "SystemAssignedManagedIdentity"
}
}
}
Copy Activity Example:
{
"name": "CopyToFabricWarehouse",
"type": "Copy",
"inputs": [
{
"referenceName": "AzureSqlSource",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "FabricWarehouseSink",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "AzureSqlSource"
},
"sink": {
"type": "WarehouseSink",
"writeBehavior": "insert", // or "upsert"
"writeBatchSize": 10000,
"tableOption": "autoCreate" // Auto-create table if not exists
},
"enableStaging": true, // Recommended for large data
"stagingSettings": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage",
"type": "LinkedServiceReference"
},
"path": "staging/fabric-warehouse"
},
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": { "name": "CustomerID" },
"sink": { "name": "customer_id" }
}
]
}
}
}
Best Practices for Fabric Warehouse:
Use tableOption: autoCreate for dynamic schema creation
Improved performance:
{
"name": "SnowflakeLinkedService",
"type": "Snowflake",
"typeProperties": {
"connectionString": "jdbc:snowflake://myaccount.snowflakecomputing.com",
"database": "mydb",
"warehouse": "mywarehouse",
"authenticationType": "KeyPair",
"username": "myuser",
"privateKey": {
"type": "AzureKeyVaultSecret",
"store": { "referenceName": "KeyVault", "type": "LinkedServiceReference" },
"secretName": "snowflake-private-key"
},
"privateKeyPassphrase": {
"type": "AzureKeyVaultSecret",
"store": { "referenceName": "KeyVault", "type": "LinkedServiceReference" },
"secretName": "snowflake-passphrase"
}
}
}
Now supports system-assigned and user-assigned managed identity:
{
"name": "AzureTableStorageLinkedService",
"type": "AzureTableStorage",
"typeProperties": {
"serviceEndpoint": "https://mystorageaccount.table.core.windows.net",
"authenticationType": "ManagedIdentity" // New in 2025
// Or user-assigned:
// "credential": {
// "referenceName": "UserAssignedManagedIdentity"
// }
}
}
Now supports managed identity authentication:
{
"name": "AzureFilesLinkedService",
"type": "AzureFileStorage",
"typeProperties": {
"fileShare": "myshare",
"accountName": "mystorageaccount",
"authenticationType": "ManagedIdentity" // New in 2025
}
}
Spark 3.3 now powers Mapping Data Flows:
Performance Improvements:
New Features:
{
"name": "DataFlow1",
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": { "referenceName": "SourceDataset" }
}
],
"transformations": [
{
"name": "Transform1"
}
],
"sinks": [
{
"dataset": { "referenceName": "SinkDataset" }
}
]
}
}
Git integration now supports on-premises Azure DevOps Server 2022:
{
"name": "DataFactory",
"properties": {
"repoConfiguration": {
"type": "AzureDevOpsGit",
"accountName": "on-prem-ado-server",
"projectName": "MyProject",
"repositoryName": "adf-repo",
"collaborationBranch": "main",
"rootFolder": "/",
"hostName": "https://ado-server.company.com" // On-premises server
}
}
}
System-Assigned Managed Identity:
{
"type": "AzureBlobStorage",
"typeProperties": {
"serviceEndpoint": "https://mystorageaccount.blob.core.windows.net",
"accountKind": "StorageV2"
// ✅ Uses Data Factory's system-assigned identity automatically
}
}
User-Assigned Managed Identity (NEW 2025):
{
"type": "AzureBlobStorage",
"typeProperties": {
"serviceEndpoint": "https://mystorageaccount.blob.core.windows.net",
"accountKind": "StorageV2",
"credential": {
"referenceName": "UserAssignedManagedIdentityCredential",
"type": "CredentialReference"
}
}
}
When to Use User-Assigned:
Credential Consolidation (NEW 2025):
ADF now supports a centralized Credentials feature:
{
"name": "ManagedIdentityCredential",
"type": "Microsoft.DataFactory/factories/credentials",
"properties": {
"type": "ManagedIdentity",
"typeProperties": {
"resourceId": "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identity-name}"
}
}
}
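To illustrate how one credential definition can be shared, the Python sketch below builds the credential payload shown above along with the reference that a linked service would embed. Both function names and all identifier values (subscription, resource group, identity name) are placeholders chosen for illustration; only the JSON shapes are taken from the examples in this document.

```python
def managed_identity_credential(
    name: str, subscription_id: str, resource_group: str, identity_name: str
) -> dict:
    """Build a credential definition for a user-assigned managed identity."""
    resource_id = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.ManagedIdentity"
        f"/userAssignedIdentities/{identity_name}"
    )
    return {
        "name": name,
        "type": "Microsoft.DataFactory/factories/credentials",
        "properties": {
            "type": "ManagedIdentity",
            "typeProperties": {"resourceId": resource_id},
        },
    }


def credential_reference(name: str) -> dict:
    """Build the reference a linked service uses to point at the credential."""
    return {"referenceName": name, "type": "CredentialReference"}


# Placeholder values, not real resources.
credential = managed_identity_credential(
    name="ManagedIdentityCredential",
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="rg-data",
    identity_name="id-adf",
)

# Two linked services can share the same credential by reference.
blob_props = {
    "serviceEndpoint": "https://mystorageaccount.blob.core.windows.net",
    "credential": credential_reference(credential["name"]),
}
table_props = {
    "serviceEndpoint": "https://mystorageaccount.table.core.windows.net",
    "credential": credential_reference(credential["name"]),
}
```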
Benefits:
🚨 IMPORTANT: Azure requires MFA for all users by October 2025
Impact on ADF:
Best Practice:
{
"type": "AzureSqlDatabase",
"typeProperties": {
"server": "myserver.database.windows.net",
"database": "mydb",
"authenticationType": "SystemAssignedManagedIdentity"
// ✅ No MFA needed, no secret rotation, passwordless
}
}
Storage Blob Data Roles:
Storage Blob Data Reader - Read-only access (source)
Storage Blob Data Contributor - Read/write access (sink)
❌ Avoid Storage Blob Data Owner unless needed
SQL Database Roles:
-- Create contained database user for managed identity
CREATE USER [datafactory-name] FROM EXTERNAL PROVIDER;
-- Grant minimal required permissions
ALTER ROLE db_datareader ADD MEMBER [datafactory-name];
ALTER ROLE db_datawriter ADD MEMBER [datafactory-name];
-- ❌ Avoid db_owner unless truly needed
Key Vault Access Policies:
{
"permissions": {
"secrets": ["Get"] // ✅ Only Get permission needed
// ❌ Don't grant List, Set, Delete unless required
}
}
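The least-privilege rule above can be verified with a few lines of Python. This is a sketch under stated assumptions: `excess_secret_permissions` is a hypothetical helper, the policy is assumed to have the same shape as the JSON above, and permission names are compared case-insensitively.

```python
# Only "Get" is needed for Data Factory to read secrets.
ALLOWED_SECRET_PERMISSIONS = {"get"}


def excess_secret_permissions(policy: dict) -> list[str]:
    """Return granted secret permissions beyond the allowed set."""
    granted = policy.get("permissions", {}).get("secrets", [])
    return sorted(
        permission
        for permission in granted
        if permission.lower() not in ALLOWED_SECRET_PERMISSIONS
    )


minimal_policy = {"permissions": {"secrets": ["Get"]}}
broad_policy = {"permissions": {"secrets": ["Get", "List", "Set", "Delete"]}}

minimal_excess = excess_secret_permissions(minimal_policy)
broad_excess = excess_secret_permissions(broad_policy)
```

Any non-empty result points at permissions that should be removed unless a specific requirement justifies them.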
Use Databricks Job Activity (MANDATORY):
Managed Identity Authentication (MANDATORY 2025):
Monitor Job Execution:
Optimize Spark 3.3 Usage (Data Flows):