Datasets

Purpose

A Dataset defines the structure (schema) of your data:

Column definitions (name, type, validation)
Data types and formats
Identity generation rules
Nested/hierarchical structures

Configuration

JSON

{
  "type": "dataset",
  "id": "customer_dataset",
  "description": "Customer data structure",
  "dataType": "RECORD",
  "columns": [
    {
      "key": "customer_id",
      "type": "TOKEN",
      "token": "TEXT",
      "mandatory": true,
      "lookup": true
    },
    {
      "key": "email",
      "type": "TOKEN",
      "token": "EMAIL",
      "mandatory": true,
      "lookup": true
    },
    {
      "key": "full_name",
      "type": "TOKEN",
      "token": "TEXT",
      "mandatory": true
    },
    {
      "key": "phone",
      "type": "TOKEN",
      "token": "PHONE"
    },
    {
      "key": "addresses",
      "type": "DATASET",
      "dataset": "address_dataset",
      "array": true
    }
  ],
  "identityType": "DEFAULT"
}
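To make the structure of this configuration concrete, the sketch below loads it into typed Python objects and checks that each column carries the property its type needs. It is a minimal sketch, not the platform's loader: `Column`, `Dataset`, and `load_dataset` are hypothetical names, and treating absent `mandatory`/`lookup`/`array` flags as false is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# The example configuration from above, embedded so the sketch is self-contained.
CONFIG = """
{
  "type": "dataset",
  "id": "customer_dataset",
  "description": "Customer data structure",
  "dataType": "RECORD",
  "columns": [
    {"key": "customer_id", "type": "TOKEN", "token": "TEXT", "mandatory": true, "lookup": true},
    {"key": "email", "type": "TOKEN", "token": "EMAIL", "mandatory": true, "lookup": true},
    {"key": "full_name", "type": "TOKEN", "token": "TEXT", "mandatory": true},
    {"key": "phone", "type": "TOKEN", "token": "PHONE"},
    {"key": "addresses", "type": "DATASET", "dataset": "address_dataset", "array": true}
  ],
  "identityType": "DEFAULT"
}
"""


@dataclass
class Column:
    key: str
    type: str                      # "TOKEN" or "DATASET"
    token: Optional[str] = None    # data type, TOKEN columns only
    dataset: Optional[str] = None  # referenced dataset id, DATASET columns only
    mandatory: bool = False        # assumption: absent flags default to False
    lookup: bool = False
    array: bool = False


@dataclass
class Dataset:
    id: str
    description: str
    data_type: str
    identity_type: str
    columns: List[Column]


def load_dataset(text: str) -> Dataset:
    raw = json.loads(text)
    if raw.get("type") != "dataset":
        raise ValueError("not a dataset definition")
    # Column(**c) raises TypeError on unknown properties, which catches typos early.
    columns = [Column(**c) for c in raw["columns"]]
    for c in columns:
        if c.type == "TOKEN" and c.token is None:
            raise ValueError(f"{c.key}: TOKEN column needs a token")
        if c.type == "DATASET" and c.dataset is None:
            raise ValueError(f"{c.key}: DATASET column needs a dataset reference")
    return Dataset(
        id=raw["id"],
        description=raw.get("description", ""),
        data_type=raw["dataType"],
        identity_type=raw["identityType"],
        columns=columns,
    )


ds = load_dataset(CONFIG)
print([c.key for c in ds.columns if c.mandatory])  # ['customer_id', 'email', 'full_name']
print([c.key for c in ds.columns if c.lookup])     # ['customer_id', 'email']
```

Using a dataclass per column keeps the config's property names intact, so a misspelled property fails loudly instead of being silently ignored.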

Column Types

TOKEN - A simple value whose data type is set by the column's token property:

TEXT - Plain text strings
EMAIL - Email addresses (validated)
PHONE - Phone numbers (validated)
NUMBER - Numeric values
DATE, DATETIME, TIME - Temporal values
BOOLEAN - True/false values
GEO_COORDINATES - Geographic coordinates

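DATASET - A nested structure whose columns are defined by another dataset, named in the column's dataset property (for example, addresses in the configuration above)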
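The token types above imply per-type value checks. The sketch below shows one way such checks could look; the rules are illustrative assumptions, not the platform's actual validation. In particular, `TOKEN_VALIDATORS` and `is_valid_token` are hypothetical names, temporal values are assumed to be ISO 8601 strings, phone numbers are assumed to have at most 15 digits (the E.164 maximum), and GEO_COORDINATES is assumed to be a (latitude, longitude) pair in degrees.

```python
import re
from datetime import date, datetime, time
from typing import Any, Callable, Dict

# Illustrative rules only; the platform's real validation may be stricter or looser.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d{7,15}$")  # optional +, then 7 to 15 digits


def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _parses(parser: Callable[[str], Any], v: Any) -> bool:
    # Assumption: temporal values arrive as ISO 8601 strings.
    if not isinstance(v, str):
        return False
    try:
        parser(v)
        return True
    except ValueError:
        return False


def _is_phone(v: Any) -> bool:
    if not isinstance(v, str):
        return False
    # Drop common separators (spaces, parentheses, dots, dashes) before checking digits.
    return bool(PHONE_RE.match(re.sub(r"[\s().-]", "", v)))


def _is_geo(v: Any) -> bool:
    # Assumption: a (latitude, longitude) pair in degrees.
    if not isinstance(v, (list, tuple)) or len(v) != 2:
        return False
    lat, lon = v
    return _is_number(lat) and _is_number(lon) and -90 <= lat <= 90 and -180 <= lon <= 180


TOKEN_VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "TEXT": lambda v: isinstance(v, str),
    "EMAIL": lambda v: isinstance(v, str) and bool(EMAIL_RE.match(v)),
    "PHONE": _is_phone,
    "NUMBER": _is_number,
    "DATE": lambda v: _parses(date.fromisoformat, v),
    "DATETIME": lambda v: _parses(datetime.fromisoformat, v),
    "TIME": lambda v: _parses(time.fromisoformat, v),
    "BOOLEAN": lambda v: isinstance(v, bool),
    "GEO_COORDINATES": _is_geo,
}


def is_valid_token(token: str, value: Any) -> bool:
    validator = TOKEN_VALIDATORS.get(token)
    if validator is None:
        raise ValueError(f"unknown token type: {token}")
    return validator(value)
```

For example, `is_valid_token("PHONE", "+1 (555) 010-0199")` accepts the number after stripping separators, while `is_valid_token("DATE", "2024-13-01")` rejects the impossible month.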

Column Properties

key - Column name/identifier
type - TOKEN or DATASET
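token - Data type of the value (TOKEN columns only)
mandatory - If true, a record without a value for this column fails validation
lookup - If true, the column is indexed for fast lookups
array - If true, the column holds a list of values instead of a single value
dataset - ID of the referenced dataset (DATASET columns only)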
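The mandatory, array, and dataset properties interact when a record is validated. The sketch below shows one plausible way they combine, recursing into nested datasets; it is a minimal illustration, not the platform's validator. `DATASETS`, `validate`, and the `token_ok` hook are hypothetical, and the columns of `address_dataset` are invented because the configuration above only references that dataset by ID. The `lookup` flag is left out because it affects indexing rather than validation.

```python
from typing import Any, Callable, Dict, List

Columns = List[Dict[str, Any]]

# Hypothetical registry mapping dataset ids to column definitions shaped like the JSON config.
DATASETS: Dict[str, Columns] = {
    "customer_dataset": [
        {"key": "customer_id", "type": "TOKEN", "token": "TEXT", "mandatory": True, "lookup": True},
        {"key": "email", "type": "TOKEN", "token": "EMAIL", "mandatory": True, "lookup": True},
        {"key": "full_name", "type": "TOKEN", "token": "TEXT", "mandatory": True},
        {"key": "phone", "type": "TOKEN", "token": "PHONE"},
        {"key": "addresses", "type": "DATASET", "dataset": "address_dataset", "array": True},
    ],
    # Hypothetical columns: the original only references address_dataset by id.
    "address_dataset": [
        {"key": "street", "type": "TOKEN", "token": "TEXT", "mandatory": True},
        {"key": "city", "type": "TOKEN", "token": "TEXT", "mandatory": True},
    ],
}


def validate(
    record: Dict[str, Any],
    dataset_id: str,
    token_ok: Callable[[str, Any], bool] = lambda token, value: True,
    path: str = "",
) -> List[str]:
    """Return validation errors for record; an empty list means it passed."""
    errors: List[str] = []
    for col in DATASETS[dataset_id]:
        name = path + col["key"]
        value = record.get(col["key"])
        if value is None:
            # mandatory: a missing (or null) value fails validation
            if col.get("mandatory", False):
                errors.append(f"{name}: missing mandatory value")
            continue
        is_array = col.get("array", False)
        if is_array and not isinstance(value, list):
            errors.append(f"{name}: expected a list")
            continue
        items = value if is_array else [value]
        for i, item in enumerate(items):
            item_name = f"{name}[{i}]" if is_array else name
            if col["type"] == "DATASET":
                # dataset: validate each nested record against the referenced dataset
                if isinstance(item, dict):
                    errors.extend(validate(item, col["dataset"], token_ok, item_name + "."))
                else:
                    errors.append(f"{item_name}: expected a nested record")
            elif not token_ok(col["token"], item):
                errors.append(f"{item_name}: invalid {col['token']} value")
    return errors


record = {
    "customer_id": "C-1",
    "email": "ana@example.com",
    "full_name": "Ana",
    "addresses": [{"street": "1 Main St", "city": "Springfield"}, {"street": "2 Oak Ave"}],
}
print(validate(record, "customer_dataset"))  # ['addresses[1].city: missing mandatory value']
```

Per-token checks are passed in through `token_ok` (accepting everything by default) so the structural rules stay separate from value-level rules.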

Identity Generation

The identityType property controls how record IDs are generated:

DEFAULT - A system-generated UUID
SCRIPT - A custom Groovy script generates the ID
COLUMN - The value of a specific column is used as the ID
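The three identity strategies can be sketched as a single dispatch function. This is a hedged illustration only: `generate_id`, `id_column`, and `script` are hypothetical names, a Python callable stands in for the Groovy script, and the use of a random (version 4) UUID for DEFAULT is an assumption of this sketch.

```python
import uuid
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]


def generate_id(
    record: Record,
    identity_type: str,
    id_column: Optional[str] = None,
    script: Optional[Callable[[Record], Any]] = None,
) -> str:
    if identity_type == "DEFAULT":
        # System-generated UUID (random, version 4, in this sketch)
        return str(uuid.uuid4())
    if identity_type == "COLUMN":
        # Reuse an existing column value as the ID; it must be present
        if id_column is None:
            raise ValueError("COLUMN identity needs id_column")
        value = record.get(id_column)
        if value is None:
            raise ValueError(f"record has no value for {id_column!r}")
        return str(value)
    if identity_type == "SCRIPT":
        # Stand-in for the Groovy script: any callable mapping a record to an ID
        if script is None:
            raise ValueError("SCRIPT identity needs a script")
        return str(script(record))
    raise ValueError(f"unknown identityType: {identity_type}")


print(generate_id({"customer_id": "C-42"}, "COLUMN", id_column="customer_id"))  # C-42
```

COLUMN and SCRIPT give deterministic IDs, so re-importing the same record yields the same ID, whereas DEFAULT produces a new ID on every call.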