Pipelines
Purpose
Pipelines orchestrate multi-stage data processing:
Data cleaning and normalization
Sequential transformations
Validation and quality checks
Custom processing logic
Configuration
{
  "type": "pipeline",
  "id": "customer_processing_pipeline",
  "description": "Clean and transform customer data",
  "dataset": "customer_dataset",
  "processors": [
    {
      "id": "cleaner",
      "processorType": "CLEANER",
      "properties": [
        {"key": "trim", "value": "true"},
        {"key": "uppercase", "value": "country_code"},
        {"key": "lowercase", "value": "email"}
      ]
    },
    {
      "id": "transformer",
      "processorType": "TRANSFORMER",
      "transformation": "normalize_customer"
    },
    {
      "id": "validator",
      "processorType": "SCRIPT",
      "script": "if (!record.email) { record.errors = ['Missing email'] }"
    }
  ]
}
Processor Types
CLEANER - Data cleaning operations:
Trim whitespace
Case conversion (uppercase/lowercase)
Format normalization
Validation rules
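The configuration example above shows both value forms: trim takes a boolean flag, while uppercase and lowercase name the field to convert. A minimal stand-alone CLEANER entry might look like this (field name hypothetical):

{
  "id": "clean_contacts",
  "processorType": "CLEANER",
  "properties": [
    {"key": "trim", "value": "true"},
    {"key": "lowercase", "value": "email"}
  ]
}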
TRANSFORMER - Applies a transformation resource:
Reference a transformation by ID
Sequential field mapping
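As in the configuration example, a TRANSFORMER entry needs only the ID of the transformation resource to apply (here reusing normalize_customer from the example above):

{
  "id": "transformer",
  "processorType": "TRANSFORMER",
  "transformation": "normalize_customer"
}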
SCRIPT - Custom Groovy scripts:
Complex business logic
Conditional processing
Custom validation
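The configuration example suggests scripts receive the current record as record and flag failures by attaching an errors list. A sketch following that convention (the age field and check are hypothetical):

{
  "id": "age_check",
  "processorType": "SCRIPT",
  "script": "if (record.age && record.age < 0) { record.errors = ['Invalid age'] }"
}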
Processor Execution
Processors execute in the order they are listed; each record flows through the pipeline sequentially:
Record → Processor 1 → Processor 2 → ... → Processor N → Output
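For illustration, a hypothetical input record (field values assumed) would pass through the three processors from the configuration above roughly as follows:

Input:        {"email": " ALICE@EXAMPLE.COM ", "country_code": "us"}
cleaner:      {"email": "alice@example.com", "country_code": "US"}
transformer:  fields remapped by the normalize_customer transformation
validator:    email is present, so no errors field is added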