Pipelines
Purpose
Pipelines orchestrate multi-stage data processing:
Data cleaning and normalization
Sequential transformations
Validation and quality checks
Custom processing logic
Configuration
{
  "type": "pipeline",
  "id": "customer_processing_pipeline",
  "description": "Clean and transform customer data",
  "dataset": "customer_dataset",
  "processors": [
    {
      "id": "cleaner",
      "processorType": "CLEANER",
      "properties": [
        {"key": "trim", "value": "true"},
        {"key": "uppercase", "value": "country_code"},
        {"key": "lowercase", "value": "email"}
      ]
    },
    {
      "id": "transformer",
      "processorType": "TRANSFORMER",
      "transformation": "normalize_customer"
    },
    {
      "id": "validator",
      "processorType": "SCRIPT",
      "script": "if (!record.email) { record.errors = ['Missing email'] }"
    }
  ]
}
Processor Types
CLEANER - Data cleaning operations:
Trim whitespace
Case conversion (uppercase/lowercase)
Format normalization
Validation rules
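The configuration example above shows both value forms: trim takes a boolean flag, while uppercase and lowercase name the field to convert. A minimal stand-alone CLEANER entry might look like this (field name hypothetical):

{
  "id": "clean_contacts",
  "processorType": "CLEANER",
  "properties": [
    {"key": "trim", "value": "true"},
    {"key": "lowercase", "value": "email"}
  ]
}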
TRANSFORMER - Applies a transformation resource:
Reference a transformation by ID
Sequential field mapping
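As in the configuration example, a TRANSFORMER entry needs only the ID of the transformation resource to apply (here reusing normalize_customer from the example above):

{
  "id": "transformer",
  "processorType": "TRANSFORMER",
  "transformation": "normalize_customer"
}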
SCRIPT - Custom Groovy scripts:
Complex business logic
Conditional processing
Custom validation
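The configuration example suggests scripts receive the current record as record and flag failures by attaching an errors list. A sketch following that convention (the age field and check are hypothetical):

{
  "id": "age_check",
  "processorType": "SCRIPT",
  "script": "if (record.age && record.age < 0) { record.errors = ['Invalid age'] }"
}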
Processor Execution
Processors execute in the order they are listed; each record flows through the pipeline sequentially:
Record → Processor 1 → Processor 2 → ... → Processor N → Output
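For illustration, a hypothetical input record (field values assumed) would pass through the three processors from the configuration above roughly as follows:

Input:        {"email": " ALICE@EXAMPLE.COM ", "country_code": "us"}
cleaner:      {"email": "alice@example.com", "country_code": "US"}
transformer:  fields remapped by the normalize_customer transformation
validator:    email is present, so no errors field is added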