
Tasks

Introduction

Tasks are units of work that execute asynchronously in dedicated thread pools, allowing the API to return immediately while work continues in the background.

Key characteristics:

  • Execute outside HTTP request/response cycle

  • Tracked through complete lifecycle

  • Report progress in real-time

  • Support cancellation

  • Send notifications on completion

  • Can be scheduled (one-off or recurring)

Tasks enable non-blocking execution of heavy operations like data loading, ETL processing, and entity synchronization.


Task Types

Entity Operations

| Task Type | Purpose | Triggered By |
|---|---|---|
| EntityLoaderTask | Load data from sources into entity | Entity synchronization |
| EntitySynchronizationTask | Orchestrate complete entity sync | Manual sync, automatic mode |
| EntityIndexClassificationTask | Index and classify records | Entity synchronization |
| EntitySinkTask | Export entity data to sinks | Entity synchronization |
| EntityStewardTask | Automated duplicate resolution | Scheduled steward (AUTO_DUPLICATES) |

Table ETL Operations

| Task Type | Purpose | Triggered By |
|---|---|---|
| LOAD | Load data from source to table | POST /tables/load |
| EXPORT | Export table data to file | POST /tables/export |
| TRANSFORM | Transform table data in-place | POST /tables/transform |

System Tasks

| Task Type | Purpose | Triggered By |
|---|---|---|
| MetricsTask | Collect system metrics | Scheduled (CRON) |
| TokenExpirationTask | Clean expired tokens | Scheduled (CRON) |


Task Lifecycle

Status Progression

| Status | Description | Next Action |
|---|---|---|
| CREATED | Queued for execution | Wait for executor to pick up |
| RUNNING | Currently executing | Monitor progress |
| COMPLETED | Finished successfully | Review results |
| FAILED | Encountered error | Check logs, retry |
| CANCEL | Cancellation in progress | Wait for cleanup |
| CANCELLED | Cancelled by user/system | Review reason |
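
Client code often needs to know whether a task can still change state. The following is a minimal Python sketch of the six statuses above; the `TaskStatus` enum, the `TERMINAL` set, and `is_terminal` are illustrative names, and treating COMPLETED, FAILED, and CANCELLED as the terminal statuses is an assumption drawn from the descriptions above, not a documented API.

```python
from enum import Enum


class TaskStatus(Enum):
    """Statuses listed in the Status Progression table."""
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCEL = "CANCEL"
    CANCELLED = "CANCELLED"


# Assumption: these three statuses mean the task does no further work.
TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}


def is_terminal(status: str) -> bool:
    """Return True when a status string names a finished task."""
    return TaskStatus(status) in TERMINAL
```

Note that CANCEL is not terminal in this sketch: the task is still cleaning up and will move to CANCELLED.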

Task Execution Phases

  • Initialization - Validate parameters, prepare resources

  • Execution - Perform main work, report progress

  • Shutdown - Cleanup resources (always runs)
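
The "always runs" guarantee of the shutdown phase can be pictured with a `try`/`finally` block. This Python sketch is only an illustration of the three phases; `run_task` and its callback parameters are hypothetical and say nothing about how the product implements tasks internally.

```python
from typing import Callable, List


def run_task(initialize: Callable[[], None],
             execute: Callable[[], None],
             shutdown: Callable[[], None]) -> str:
    """Run the three phases in order and return the resulting status."""
    try:
        initialize()   # validate parameters, prepare resources
        execute()      # perform main work, report progress
        return "COMPLETED"
    except Exception:
        return "FAILED"
    finally:
        shutdown()     # cleanup runs on success and on failure


log: List[str] = []


def failing_execute() -> None:
    log.append("execute")
    raise RuntimeError("Connection timeout to source database")


status = run_task(lambda: log.append("init"),
                  failing_execute,
                  lambda: log.append("shutdown"))
print(status, log)  # FAILED ['init', 'execute', 'shutdown']
```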


Monitoring Tasks

List Task Instances

Endpoint: GET /tasks/instances
Permission: task.list

Query Parameters:

  • status - Filter by status (CREATED, RUNNING, COMPLETED, FAILED, CANCELLED)

  • sort - Sort order (ASC, DESC)

  • pageNumber - Page number (default: 0)

  • pageSize - Page size (default: 10)

Response:

JSON
{
  "instances": [
    {
      "id": "task_12345",
      "status": "RUNNING",
      "message": "Loading records: completed 5000/10000 items (50%)",
      "completion": 0.50,
      "created": "2025-01-30T10:00:00Z",
      "started": "2025-01-30T10:00:05Z",
      "ping": "2025-01-30T10:05:00Z",
      "taskUser": "[email protected]",
      "scheduling": {
        "description": "Customer data load"
      }
    }
  ],
  "page": {
    "number": 0,
    "size": 20,
    "totalElements": 5,
    "totalPages": 1
  }
}
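
As an illustration of how the query parameters combine, the sketch below builds the request URL with Python's standard library. The base URL in the usage note is a placeholder and `build_list_url` is a hypothetical helper; only the path and parameter names come from this page. Authentication is omitted.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_url(base_url: str,
                   status: Optional[str] = None,
                   sort: str = "DESC",
                   page_number: int = 0,
                   page_size: int = 10) -> str:
    """Build the GET /tasks/instances URL for the given filters."""
    params = {"sort": sort, "pageNumber": page_number, "pageSize": page_size}
    if status is not None:
        params["status"] = status  # e.g. RUNNING or FAILED
    return f"{base_url.rstrip('/')}/tasks/instances?{urlencode(params)}"
```

For example, `build_list_url("https://mdm.example.com/api", status="RUNNING")` yields a URL ending in `/tasks/instances?sort=DESC&pageNumber=0&pageSize=10&status=RUNNING`.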

Get Task Details

Endpoint: GET /tasks/instance/{id}
Permission: task.view

BASH
GET /tasks/instance/task_12345

Response includes:

  • Full task information

  • Progress details

  • Start/end timestamps

  • Error messages (if failed)

  • Parent/child task relationships

Understanding Progress

Completion (0.0 to 1.0):

  • 0.0 = 0% complete

  • 0.50 = 50% complete

  • 1.0 = 100% complete

Message format:

CODE
"Loading records: completed 5000/10000 items (50%)"

Ping timestamp:

  • Last "heartbeat" from task

  • Used to detect stalled tasks

  • Updated every few seconds during execution
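
If a client needs the raw counts rather than the `completion` fraction, the message can be parsed. The following Python sketch assumes that messages keep the exact shape shown above; the regular expression and `parse_progress` are illustrative, and the message text is not a documented contract.

```python
import re
from typing import Optional, Tuple

# Matches "... completed 5000/10000 items (50%)"; the unit word may vary.
PROGRESS = re.compile(r"completed (\d+)/(\d+) \w+ \(\d+%\)")


def parse_progress(message: str) -> Optional[Tuple[int, int]]:
    """Return (processed, total), or None when the message has no total."""
    match = PROGRESS.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Prefer the `completion` field when it is enough; parsing free text is more fragile than reading a numeric field.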


Task Scheduling

Scheduling Types

ONE_OFF:

  • Execute once

  • Created by API operations (load, export, synchronize)

  • Deleted after completion or after retention period

CRON:

  • Execute on schedule

  • Defined by cron expression

  • Remain in system for repeated execution

  • Can be enabled/disabled

List Scheduled Tasks

Endpoint: GET /tasks/schedulings
Permission: task.list

JSON
{
  "schedulings": [
    {
      "id": "metrics",
      "description": "Daily system metrics",
      "type": "CRON",
      "cron": "0 2 * * *",
      "enabled": true,
      "nextExecution": "2025-01-31T02:00:00Z"
    }
  ]
}
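
To see how `nextExecution` relates to the `cron` value in the example, the sketch below computes the next run for the simple daily case. It assumes a five-field expression of the form `minute hour * * *` evaluated in UTC; the cron dialect and time zone actually used by the scheduler are not stated on this page, and `next_daily_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Next run of a 'minute hour * * *' expression strictly after `now`."""
    minute, hour, day, month, weekday = cron.split()
    if (day, month, weekday) != ("*", "*", "*"):
        raise ValueError("only 'minute hour * * *' expressions are handled")
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed
    return candidate


now = datetime(2025, 1, 30, 10, 0, tzinfo=timezone.utc)
print(next_daily_run("0 2 * * *", now).isoformat())  # 2025-01-31T02:00:00+00:00
```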

Get Scheduling Details

Endpoint: GET /tasks/scheduling/{id}
Permission: task.view


Task Management Operations

Cancel Task

Endpoint: PUT /tasks/cancel/{id}
Permission: task.cancel

BASH
PUT /tasks/cancel/task_12345

Behavior:

  • Tasks in CREATED or RUNNING status can be cancelled

  • Cancellation is graceful (task checks periodically)

  • Task status changes to CANCEL, then CANCELLED

  • Cleanup operations always execute

Cancellation may take some time if the task is in the middle of processing.
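
Because cancellation is graceful, a client typically polls the task until it leaves the CANCEL status. The sketch below shows one way to do that in Python; `wait_until_finished` is hypothetical, the status fetcher is injected so that no HTTP library is assumed, and the polling interval and limit are arbitrary.

```python
from typing import Callable, Iterator


def wait_until_finished(fetch_status: Callable[[], str],
                        sleep: Callable[[float], None],
                        interval: float = 5.0,
                        max_polls: int = 120) -> str:
    """Poll until the task reaches COMPLETED, FAILED, or CANCELLED."""
    terminal = {"COMPLETED", "FAILED", "CANCELLED"}
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        sleep(interval)  # CREATED, RUNNING, or CANCEL: keep waiting
    raise TimeoutError("task did not reach a terminal status")


# Simulated responses standing in for repeated GET /tasks/instance/{id} calls.
responses: Iterator[str] = iter(["RUNNING", "CANCEL", "CANCEL", "CANCELLED"])
result = wait_until_finished(lambda: next(responses), sleep=lambda _: None)
print(result)  # CANCELLED
```

In real use, `fetch_status` would call GET /tasks/instance/{id} and return the `status` field, and `sleep` would be `time.sleep`.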

Run Scheduled Task Immediately

Endpoint: PUT /tasks/scheduling/{id}/run
Permission: task.run

BASH
PUT /tasks/scheduling/metrics/run

Executes task immediately without waiting for next scheduled time.

Enable/Disable Scheduled Task

Endpoint: PUT /tasks/scheduling/{id}/enabled/{true|false}
Permission: task.enable

BASH
PUT /tasks/scheduling/metrics/enabled/false

Disabled tasks skip scheduled executions.

Update CRON Expression

Endpoint: POST /tasks/scheduling/{id}/cron
Permission: task.save

JSON
{
  "cron": "0 3 * * *"
}

Changes schedule and recalculates next execution time.


Task Notifications

Tasks can send email notifications on status changes.

Notification Configuration

When creating tasks via API:

JSON
{
  "source": "customers",
  "sinkTable": "target_customers",
  "operation": "UPSERT",
  "notification": {
    "recipients": [
      {
        "name": "Admin",
        "email": "[email protected]"
      }
    ],
    "status": ["COMPLETED", "FAILED", "CANCELLED"]
  }
}

Notification Triggers

Default notifications sent on:

  • COMPLETED - Task finished successfully

  • FAILED - Task encountered errors

  • CANCELLED - Task was cancelled

Optional notifications:

  • CREATED - Task queued

  • RUNNING - Task started execution
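
A small helper can assemble the `notification` object and fall back to the default triggers. This Python sketch is an assumption-laden illustration: `build_notification` is hypothetical, and the client-side check simply rejects any status that is not one of the five listed above; the server's own validation rules are not described here.

```python
from typing import Dict, List, Optional

ALLOWED = {"CREATED", "RUNNING", "COMPLETED", "FAILED", "CANCELLED"}
DEFAULT = ["COMPLETED", "FAILED", "CANCELLED"]


def build_notification(recipients: List[Dict[str, str]],
                       status: Optional[List[str]] = None) -> dict:
    """Return a notification object, using the default triggers if none given."""
    chosen = DEFAULT if status is None else status
    unknown = [s for s in chosen if s not in ALLOWED]
    if unknown:
        raise ValueError(f"unsupported notification status: {unknown}")
    return {"recipients": recipients, "status": list(chosen)}
```

The returned dictionary can be placed under the `notification` key of a task-creating request such as the one shown in Notification Configuration.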

Notification Content

Emails include:

  • Task description

  • Final status

  • Completion percentage

  • Start and end times

  • Error messages (if failed)

  • Link to view task details


Common Task Scenarios

Entity Data Load

Trigger:

BASH
PUT /entities/synchronize
{
  "entity": "customers",
  "loadMask": "FULL",
  "indexClassificationMask": "FULL",
  "sinkMask": "NONE"
}

Task Flow:

  • EntitySynchronizationTask created (CREATED)

  • Task starts execution (RUNNING)

  • Creates child tasks:
      ◦ EntityLoaderTask (loads data)
      ◦ EntityIndexClassificationTask (indexes and classifies)

  • All child tasks complete

  • Parent task completes (COMPLETED)

Monitoring:

BASH
GET /tasks/instance/{taskId}

Table Data Export

Trigger:

BASH
POST /tables/export
{
  "source": "customers",
  "maxRecords": 10000,
  "chunkRecords": 1000
}

Task Flow:

  • TableEtlTask created (CREATED)

  • Task starts (RUNNING)

  • Exports records in chunks

  • Creates export file

  • Task completes (COMPLETED)

Retrieve Export:

BASH
# Get task details to find file ID
GET /tasks/instance/{taskId}

# Download file
GET /files/{fileId}/download

Scheduled Steward Execution

Configuration:

JSON
{
  "entity": "customers",
  "type": "AUTO_DUPLICATES",
  "steward": "customer_steward",
  "stewardCron": "0 2 * * *",
  "automatic": true
}

Task Flow:

  • Scheduled task triggers at 2:00 AM daily

  • EntityStewardTask created and executed

  • Finds MATCH-classified buckets

  • Merges duplicate records

  • Updates entity statistics

  • Task completes, repeats next day


Task Progress Tracking

Scoped Tasks

Tasks with a known total number of items report a percentage:

JSON
{
  "message": "Loading records: completed 7500/10000 items (75%)",
  "completion": 0.75
}

Calculation:

CODE
Completion = ItemsProcessed / TotalItems

Non-Scoped Tasks

Tasks with an unknown total report only a count, and completion stays at 0.0 until the task finishes:

JSON
{
  "message": "Processing: completed 5000 items",
  "completion": 0.0
}

When complete:

JSON
{
  "message": "Processing: completed 12543 items",
  "completion": 1.0
}
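
The scoped and non-scoped cases can be reproduced with a few lines of Python. This is a sketch that mirrors the examples above, not the product's actual reporting code; `progress` and its parameters are hypothetical, and the unit word is fixed to "items" for simplicity.

```python
from typing import Optional, Tuple


def progress(label: str, done: int, total: Optional[int] = None,
             finished: bool = False) -> Tuple[str, float]:
    """Return (message, completion) for a scoped or non-scoped task."""
    if total:  # scoped: the total is known and non-zero
        completion = done / total
        percent = round(completion * 100)
        return f"{label}: completed {done}/{total} items ({percent}%)", completion
    # non-scoped: only a count is available until the task finishes
    return f"{label}: completed {done} items", 1.0 if finished else 0.0


print(progress("Loading records", 7500, 10000))
print(progress("Processing", 5000))
print(progress("Processing", 12543, finished=True))
```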

Task Messages

Progress messages are updated during execution:

Loading phase:

CODE
"Loading from source: completed 1000/5000 items (20%)"

Indexing phase:

CODE
"Indexing records: completed 3000/5000 items (60%)"

Classification phase:

CODE
"Classifying buckets: completed 450/500 buckets (90%)"

Completion:

CODE
"Task completed successfully"

Handling Task Failures

Understanding Failures

When a task fails:

  • Status changes to FAILED

  • extraMessage contains error details

  • Notification sent (if configured)

  • Task removed after retention period

Example failed task:

JSON
{
  "id": "task_12345",
  "status": "FAILED",
  "message": "Loading records: completed 3000/10000 items (30%)",
  "extraMessage": "Connection timeout to source database",
  "completion": 0.30
}

Common Failure Causes

  • Connection timeout - Check source system availability

  • Invalid credentials - Verify credentials are correct

  • Transformation error - Fix transformation configuration

  • Validation error - Fix dataset validation rules

  • Out of memory - Reduce batch size, increase memory

  • Resource locked - Unlock resource, retry
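
For triage scripts, the list above can be turned into a lookup from error text to suggested action. The keyword matching in this Python sketch is purely an assumption for illustration; the real wording of `extraMessage` values is not specified on this page, and `suggest_remedy` is a hypothetical helper.

```python
# Keywords are guesses at what an extraMessage might contain;
# the suggested actions are taken from the list of common causes.
REMEDIES = {
    "timeout": "Check source system availability",
    "credential": "Verify credentials are correct",
    "transformation": "Fix transformation configuration",
    "validation": "Fix dataset validation rules",
    "memory": "Reduce batch size, increase memory",
    "locked": "Unlock resource, retry",
}


def suggest_remedy(extra_message: str) -> str:
    """Return the first matching suggestion, or a generic fallback."""
    text = extra_message.lower()
    for keyword, remedy in REMEDIES.items():
        if keyword in text:
            return remedy
    return "Check logs"


print(suggest_remedy("Connection timeout to source database"))
```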

Retry Strategy

  • Review failure details:

BASH
GET /tasks/instance/{taskId}

  • Fix underlying issue (configuration, connectivity, etc.)

  • Retry operation:

BASH
# Re-trigger the operation that created the task
POST /tables/load
{
  /* same request as before */
}

Task Performance

Monitoring Task Duration

Track how long tasks take:

JSON
{
  "started": "2025-01-30T10:00:00Z",
  "finished": "2025-01-30T10:15:30Z"
}

Duration: 15 minutes 30 seconds
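
The duration can be computed directly from the two timestamps. This Python sketch uses only the standard library; `task_duration` is an illustrative name, and the field names `started` and `finished` are taken from the example above.

```python
from datetime import datetime, timedelta


def task_duration(started: str, finished: str) -> timedelta:
    """Return the elapsed time between two ISO 8601 UTC timestamps."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so it is normalised to an explicit offset first.
    start = datetime.fromisoformat(started.replace("Z", "+00:00"))
    end = datetime.fromisoformat(finished.replace("Z", "+00:00"))
    return end - start


print(task_duration("2025-01-30T10:00:00Z", "2025-01-30T10:15:30Z"))  # 0:15:30
```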

Detecting Stalled Tasks

Tasks are considered stalled if:

  • Status is RUNNING or CANCEL

  • No ping update for configured duration (default: 15 minutes)

System behavior:

  • TaskStalledDaemon monitors for stalled tasks

  • Automatically cancels tasks exceeding threshold

  • Sets status to CANCELLED with reason "Task stalled"

Prevention:

  • Tasks must update ping periodically

  • Long-running tasks should report progress regularly
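
The stall rule described above can also be checked client-side, for example in a monitoring script. The sketch below is an approximation of that rule, not the TaskStalledDaemon itself; `is_stalled` is hypothetical, and the 15-minute threshold is the documented default, which may be configured differently on a given system.

```python
from datetime import datetime, timedelta, timezone


def is_stalled(status: str, ping: str, now: datetime,
               threshold: timedelta = timedelta(minutes=15)) -> bool:
    """True when a RUNNING or CANCEL task has not pinged within the threshold."""
    if status not in {"RUNNING", "CANCEL"}:
        return False
    last_ping = datetime.fromisoformat(ping.replace("Z", "+00:00"))
    return now - last_ping > threshold


now = datetime(2025, 1, 30, 10, 30, tzinfo=timezone.utc)
print(is_stalled("RUNNING", "2025-01-30T10:05:00Z", now))  # True (25 minutes)
```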

Optimizing Task Execution

For Load Operations:

  • Use incremental loading (timestamp filtering)

  • Adjust batch/chunk sizes for memory

  • Schedule during off-peak hours

  • Use sampling for testing

For Export Operations:

  • Set reasonable maxRecords limit

  • Use appropriate chunkRecords size

  • Apply filters to reduce data volume

For Entity Synchronization:

  • Use CHANGES mask for incremental index updates

  • Schedule steward during low-usage times

  • Monitor bucket statistics to track progress


Best Practices

Task Management

  • Monitor active tasks - Regularly check RUNNING tasks

  • Review failures - Investigate and fix failed tasks promptly

  • Clean up old tasks - System auto-deletes after retention period

  • Use notifications - Configure email alerts for critical tasks

  • Schedule wisely - Run heavy tasks during off-peak hours

Scheduling

  • Appropriate CRON - Match schedule to data update frequency

  • Avoid overlaps - Space out heavy tasks

  • Test schedules - Use "run now" to test before enabling

  • Document schedules - Keep schedule documentation current

  • Monitor execution - Track actual vs expected execution times

Error Handling

  • Immediate investigation - Check failed tasks promptly

  • Log review - Examine detailed logs for errors

  • Incremental fixes - Fix and test with small data samples

  • Prevent recurrence - Address root causes, not symptoms

  • Graceful degradation - Design tasks to handle partial failures

Performance

  • Right-size batches - Balance memory usage and throughput

  • Incremental processing - Use timestamp filtering when possible

  • Parallel execution - Prefer multiple small tasks over one large task

  • Progress monitoring - Ensure regular progress updates

  • Resource planning - Allocate sufficient memory and threads


Troubleshooting

Task Stuck in CREATED

Check:

  • System has available executor threads

  • Task scheduler daemon is running

  • No system errors in logs

Solution:

  • Wait for executor availability

  • Check system health

  • Contact administrator if persistent

Task Not Reporting Progress

Check:

  • Task status is RUNNING

  • Ping timestamp is recent

  • No stalled task warnings

Solution:

  • Wait longer (may be in initialization)

  • Check if task has stalled

  • Cancel and retry if truly stuck

Cannot Cancel Task

Check:

  • Task status allows cancellation (CREATED or RUNNING)

  • User has task.cancel permission

Solution:

  • Wait if task is completing

  • Task may be in cleanup phase

  • Contact administrator if unresponsive

Scheduled Task Not Executing

Check:

  • Task is enabled

  • CRON expression is valid

  • Next execution time is in the future

  • Task scheduler daemon is running

Solution:

  • Enable task if disabled

  • Fix CRON expression

  • Use "run now" to test

  • Check system logs
