Tasks
Introduction
Tasks are units of work that execute asynchronously in dedicated thread pools, allowing the API to return immediately while work continues in the background.
Key characteristics:
Execute outside HTTP request/response cycle
Tracked through complete lifecycle
Report progress in real-time
Support cancellation
Send notifications on completion
Can be scheduled (one-off or recurring)
Tasks enable non-blocking execution of heavy operations like data loading, ETL processing, and entity synchronization.
Task Types
Entity Operations
| Task Type | Purpose | Triggered By |
|---|---|---|
| EntityLoaderTask | Load data from sources into entity | Entity synchronization |
| EntitySynchronizationTask | Orchestrate complete entity sync | Manual sync, automatic mode |
| EntityIndexClassificationTask | Index and classify records | Entity synchronization |
| EntitySinkTask | Export entity data to sinks | Entity synchronization |
| EntityStewardTask | Automated duplicate resolution | Scheduled steward (AUTO_DUPLICATES) |
Table ETL Operations
| Task Type | Purpose | Triggered By |
|---|---|---|
| LOAD | Load data from source to table | |
| EXPORT | Export table data to file | |
| TRANSFORM | Transform table data in-place | |
System Tasks
| Task Type | Purpose | Triggered By |
|---|---|---|
| MetricsTask | Collect system metrics | Scheduled (CRON) |
| TokenExpirationTask | Clean expired tokens | Scheduled (CRON) |
Task Lifecycle
Status Progression
| Status | Description | Next Action |
|---|---|---|
| CREATED | Queued for execution | Wait for executor to pick up |
| RUNNING | Currently executing | Monitor progress |
| COMPLETED | Finished successfully | Review results |
| FAILED | Encountered error | Check logs, retry |
| CANCEL | Cancellation in progress | Wait for cleanup |
| CANCELLED | Cancelled by user/system | Review reason |
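The status table implies which states are final and which still accept a cancellation request (CREATED or RUNNING, as described under Cancel Task). The following is a minimal Python sketch of that classification, assuming statuses arrive as the plain strings shown above; the names `TaskStatus`, `is_terminal`, and `can_cancel` are illustrative and not part of the API.

```python
from enum import Enum


class TaskStatus(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCEL = "CANCEL"
    CANCELLED = "CANCELLED"


# Statuses after which the task does no further work.
TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}

# Statuses from which a cancellation request is accepted.
CANCELLABLE = {TaskStatus.CREATED, TaskStatus.RUNNING}


def is_terminal(status: str) -> bool:
    """True if the task has finished, failed, or been cancelled."""
    return TaskStatus(status) in TERMINAL


def can_cancel(status: str) -> bool:
    """True if the task is still queued or running."""
    return TaskStatus(status) in CANCELLABLE
```

Note that CANCEL is neither terminal nor cancellable: the task is still cleaning up, so a client should keep waiting.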
Task Execution Phases
Initialization - Validate parameters, prepare resources
Execution - Perform main work, report progress
Shutdown - Cleanup resources (always runs)
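The "always runs" guarantee of the shutdown phase maps naturally onto a try/finally structure. The sketch below illustrates the idea only; it is not the system's actual executor, and `run_task` and its callback parameters are hypothetical names.

```python
from typing import Callable, List


def run_task(initialize: Callable[[], None],
             execute: Callable[[], None],
             shutdown: Callable[[], None]) -> str:
    """Run the three phases in order and return a final status string."""
    try:
        initialize()   # validate parameters, prepare resources
        execute()      # perform main work, report progress
        return "COMPLETED"
    except Exception:
        return "FAILED"
    finally:
        shutdown()     # cleanup runs on success and on failure


events: List[str] = []


def failing_execute() -> None:
    events.append("execute")
    raise RuntimeError("source unavailable")


status = run_task(lambda: events.append("init"),
                  failing_execute,
                  lambda: events.append("shutdown"))
```

Even though the execution phase raises, `events` still ends with the shutdown entry and the status is reported as FAILED.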
Monitoring Tasks
List Task Instances
Endpoint: GET /tasks/instances
Permission: task.list
Query Parameters:
status - Filter by status (CREATED, RUNNING, COMPLETED, FAILED, CANCELLED)
sort - Sort order (ASC, DESC)
pageNumber - Page number (default: 0)
pageSize - Page size (default: 10)
Response:
{
"instances": [
{
"id": "task_12345",
"status": "RUNNING",
"message": "Loading records: completed 5000/10000 items (50%)",
"completion": 0.50,
"created": "2025-01-30T10:00:00Z",
"started": "2025-01-30T10:00:05Z",
"ping": "2025-01-30T10:05:00Z",
"taskUser": "[email protected]",
"scheduling": {
"description": "Customer data load"
}
}
],
"page": {
"number": 0,
"size": 20,
"totalElements": 5,
"totalPages": 1
}
}
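A client might assemble the listing request as follows. This is a sketch using only the Python standard library; the base URL `https://api.example.com` and the helper name `build_instances_url` are assumptions, while the parameter names and defaults come from the list above.

```python
from typing import Optional
from urllib.parse import urlencode


def build_instances_url(base_url: str,
                        status: Optional[str] = None,
                        sort: Optional[str] = None,
                        page_number: int = 0,
                        page_size: int = 10) -> str:
    """Build the GET /tasks/instances URL with optional filters."""
    params: dict = {"pageNumber": page_number, "pageSize": page_size}
    if status is not None:
        params["status"] = status
    if sort is not None:
        params["sort"] = sort
    return f"{base_url.rstrip('/')}/tasks/instances?{urlencode(params)}"


# Hypothetical host; list running tasks in descending order.
url = build_instances_url("https://api.example.com",
                          status="RUNNING", sort="DESC")
```

Authentication is omitted because this section does not describe it.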
Get Task Details
Endpoint: GET /tasks/instance/{id}
Permission: task.view
GET /tasks/instance/task_12345
Response includes:
Full task information
Progress details
Start/end timestamps
Error messages (if failed)
Parent/child task relationships
Understanding Progress
Completion (0.0 to 1.0):
0.0 = 0% complete
0.50 = 50% complete
1.0 = 100% complete
Message format:
"Loading records: completed 5000/10000 items (50%)"
Ping timestamp:
Last "heartbeat" from task
Used to detect stalled tasks
Updated every few seconds during execution
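The `completion` field is the simplest way to read progress, but the item counts are only available inside the message text. If a client needs them, a small parser along these lines could work. It assumes the message keeps the format shown above, which may not hold for every task type; `parse_scoped_progress` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches the scoped form "...: completed 5000/10000 items (50%)".
_SCOPED = re.compile(r"completed (\d+)/(\d+) \w+ \((\d+)%\)")


def parse_scoped_progress(message: str) -> Optional[Tuple[int, int]]:
    """Return (processed, total), or None if the message carries no total."""
    match = _SCOPED.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For the sample message the function returns `(5000, 10000)`; for a message without a total it returns `None`.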
Task Scheduling
Scheduling Types
ONE_OFF:
Execute once
Created by API operations (load, export, synchronize)
Deleted after completion or after retention period
CRON:
Execute on schedule
Defined by cron expression
Remain in system for repeated execution
Can be enabled/disabled
List Scheduled Tasks
Endpoint: GET /tasks/schedulings
Permission: task.list
{
"schedulings": [
{
"id": "metrics",
"description": "Daily system metrics",
"type": "CRON",
"cron": "0 2 * * *",
"enabled": true,
"nextExecution": "2025-01-31T02:00:00Z"
}
]
}
Get Scheduling Details
Endpoint: GET /tasks/scheduling/{id}
Permission: task.view
Task Management Operations
Cancel Task
Endpoint: PUT /tasks/cancel/{id}
Permission: task.cancel
PUT /tasks/cancel/task_12345
Behavior:
Tasks in CREATED or RUNNING status can be cancelled
Cancellation is graceful (task checks periodically)
Task status changes to CANCEL, then CANCELLED
Cleanup operations always execute
Cancellation can take some time if the task is in the middle of processing.
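Because cancellation is graceful, a client usually has to poll the task until it leaves the CANCEL state. The sketch below shows one way to do that. The status lookup is injected as a callable (in practice it would wrap GET /tasks/instance/{id}), and the poll interval and poll limit are arbitrary assumptions.

```python
import time
from typing import Callable

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_terminal(fetch_status: Callable[[], str],
                      poll_seconds: float = 5.0,
                      max_polls: int = 120,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the task reaches a terminal status or polls run out."""
    status = fetch_status()
    for _ in range(max_polls):
        if status in TERMINAL:
            return status
        sleep(poll_seconds)
        status = fetch_status()
    return status  # may still be non-terminal if the limit was reached


# Simulated status sequence after PUT /tasks/cancel/{id}.
_sequence = iter(["RUNNING", "CANCEL", "CANCEL", "CANCELLED"])
final = wait_for_terminal(lambda: next(_sequence), sleep=lambda seconds: None)
```

A task that completes before it notices the cancellation request would end in COMPLETED instead, so callers should check the returned status rather than assume CANCELLED.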
Run Scheduled Task Immediately
Endpoint: PUT /tasks/scheduling/{id}/run
Permission: task.run
PUT /tasks/scheduling/metrics/run
Executes the task immediately, without waiting for the next scheduled time.
Enable/Disable Scheduled Task
Endpoint: PUT /tasks/scheduling/{id}/enabled/{true|false}
Permission: task.enable
PUT /tasks/scheduling/metrics/enabled/false
Disabled tasks skip scheduled executions.
Update CRON Expression
Endpoint: POST /tasks/scheduling/{id}/cron
Permission: task.save
{
"cron": "0 3 * * *"
}
Changes schedule and recalculates next execution time.
Task Notifications
Tasks can send email notifications on status changes.
Notification Configuration
When creating tasks via API:
{
"source": "customers",
"sinkTable": "target_customers",
"operation": "UPSERT",
"notification": {
"recipients": [
{
"name": "Admin",
"email": "[email protected]"
}
],
"status": ["COMPLETED", "FAILED", "CANCELLED"]
}
}
Notification Triggers
Default notifications sent on:
COMPLETED - Task finished successfully
FAILED - Task encountered errors
CANCELLED - Task was cancelled
Optional notifications:
CREATED - Task queued
RUNNING - Task started execution
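A client-side helper can assemble the `notification` object and catch misspelled statuses before the request is sent. This is a sketch: the helper name `build_notification` is hypothetical, and rejecting any status outside the five listed here is an assumption about what the server accepts.

```python
from typing import Dict, Iterable, List

DEFAULT_STATUSES = ("COMPLETED", "FAILED", "CANCELLED")
ALLOWED_STATUSES = {"CREATED", "RUNNING", *DEFAULT_STATUSES}


def build_notification(recipients: List[Dict[str, str]],
                       statuses: Iterable[str] = DEFAULT_STATUSES) -> dict:
    """Return the 'notification' part of a task-creating request body."""
    statuses = list(statuses)
    unknown = [s for s in statuses if s not in ALLOWED_STATUSES]
    if unknown:
        raise ValueError(f"unsupported notification status: {unknown}")
    return {"recipients": recipients, "status": statuses}


# Placeholder recipient for illustration only.
notification = build_notification(
    [{"name": "Admin", "email": "admin@example.com"}]
)
```

The result can be placed under the `notification` key of a load, export, or synchronize request.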
Notification Content
Emails include:
Task description
Final status
Completion percentage
Start and end times
Error messages (if failed)
Link to view task details
Common Task Scenarios
Entity Data Load
Trigger:
PUT /entities/synchronize
{
"entity": "customers",
"loadMask": "FULL",
"indexClassificationMask": "FULL",
"sinkMask": "NONE"
}
Task Flow:
EntitySynchronizationTask created (CREATED)
Task starts execution (RUNNING)
Creates child tasks:
  EntityLoaderTask (loads data)
  EntityIndexClassificationTask (indexes and classifies)
All child tasks complete
Parent task completes (COMPLETED)
Monitoring:
GET /tasks/instance/{taskId}
Table Data Export
Trigger:
POST /tables/export
{
"source": "customers",
"maxRecords": 10000,
"chunkRecords": 1000
}
Task Flow:
TableEtlTask created (CREATED)
Task starts (RUNNING)
Exports records in chunks
Creates export file
Task completes (COMPLETED)
Retrieve Export:
- Get task details to find file ID
GET /tasks/instance/{taskId}
- Download file
GET /files/{fileId}/download
Scheduled Steward Execution
Configuration:
{
"entity": "customers",
"type": "AUTO_DUPLICATES",
"steward": "customer_steward",
"stewardCron": "0 2 * * *",
"automatic": true
}
Task Flow:
Scheduled task triggers at 2:00 AM daily
EntityStewardTask created and executed
Finds MATCH-classified buckets
Merges duplicate records
Updates entity statistics
Task completes, repeats next day
Task Progress Tracking
Scoped Tasks
Tasks with known total items report percentage:
{
"message": "Loading records: completed 7500/10000 items (75%)",
"completion": 0.75
}
Calculation:
Completion = ItemsProcessed / TotalItems
Non-Scoped Tasks
Tasks with unknown total report count only:
{
"message": "Processing: completed 5000 items",
"completion": 0.0
}
When complete:
{
"message": "Processing: completed 12543 items",
"completion": 1.0
}
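The two reporting modes can be summarized in one function. This is an illustrative sketch that reproduces the message shapes shown above, not the system's own code; `progress` and its parameters are hypothetical names.

```python
from typing import Optional, Tuple


def progress(label: str, processed: int,
             total: Optional[int] = None,
             finished: bool = False) -> Tuple[str, float]:
    """Return (message, completion) for scoped and non-scoped tasks."""
    if total:
        # Scoped: completion is ItemsProcessed / TotalItems.
        completion = processed / total
        message = (f"{label}: completed {processed}/{total} items "
                   f"({completion:.0%})")
        return message, completion
    # Non-scoped: only a count is known; completion jumps from 0.0 to 1.0.
    return f"{label}: completed {processed} items", 1.0 if finished else 0.0
```

A practical consequence is that a progress bar driven by `completion` stays at 0% for non-scoped tasks until they finish, so a UI may want to show the item count from the message instead.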
Task Messages
Progress messages updated during execution:
Loading phase:
"Loading from source: completed 1000/5000 items (20%)"
Indexing phase:
"Indexing records: completed 3000/5000 items (60%)"
Classification phase:
"Classifying buckets: completed 450/500 buckets (90%)"
Completion:
"Task completed successfully"
Handling Task Failures
Understanding Failures
When a task fails:
Status changes to FAILED
extraMessage contains error details
Notification sent (if configured)
Task removed after retention period
Example failed task:
{
"id": "task_12345",
"status": "FAILED",
"message": "Loading records: completed 3000/10000 items (30%)",
"extraMessage": "Connection timeout to source database",
"completion": 0.30
}
Common Failure Causes
Connection timeout - Check source system availability
Invalid credentials - Verify credentials are correct
Transformation error - Fix transformation configuration
Validation error - Fix dataset validation rules
Out of memory - Reduce batch size, increase memory
Resource locked - Unlock resource, retry
Retry Strategy
Review failure details:
GET /tasks/instance/{taskId}
Fix underlying issue (configuration, connectivity, etc.)
Retry operation:
- Re-trigger the operation that created the task
POST /tables/load
{
/* same request as before */
}
Task Performance
Monitoring Task Duration
Track how long tasks take:
{
"started": "2025-01-30T10:00:00Z",
"finished": "2025-01-30T10:15:30Z"
}
Duration: 15 minutes 30 seconds
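The duration can be computed directly from the two timestamps. A minimal sketch, assuming both fields are ISO 8601 UTC strings as in the example; `parse_utc` and `task_duration` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def task_duration(task: dict) -> timedelta:
    """Return finished minus started for a task details object."""
    return parse_utc(task["finished"]) - parse_utc(task["started"])


duration = task_duration({"started": "2025-01-30T10:00:00Z",
                          "finished": "2025-01-30T10:15:30Z"})
```

For the example above, `duration` equals `timedelta(minutes=15, seconds=30)`, that is 930 seconds.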
Detecting Stalled Tasks
Tasks are considered stalled if:
Status is RUNNING or CANCEL
No ping update for configured duration (default: 15 minutes)
System behavior:
TaskStalledDaemon monitors for stalled tasks
Automatically cancels tasks exceeding threshold
Sets status to CANCELLED with reason "Task stalled"
Prevention:
Tasks must update ping periodically
Long-running tasks should report progress regularly
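The stall rule can be expressed as a simple predicate, which is useful for client-side dashboards that want to flag tasks before the daemon cancels them. This sketch mirrors the two conditions above; it is not the TaskStalledDaemon implementation, and whether the real comparison is strict is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Default threshold stated in this section.
STALL_THRESHOLD = timedelta(minutes=15)


def is_stalled(status: str, ping: str, now: datetime,
               threshold: timedelta = STALL_THRESHOLD) -> bool:
    """True if an active task has not pinged within the threshold."""
    if status not in ("RUNNING", "CANCEL"):
        return False
    last_ping = datetime.fromisoformat(ping.replace("Z", "+00:00"))
    return now - last_ping > threshold


now = datetime(2025, 1, 30, 10, 30, tzinfo=timezone.utc)
stalled = is_stalled("RUNNING", "2025-01-30T10:05:00Z", now)
```

Here the last ping is 25 minutes old, so the task would be flagged; a task with any other status is never considered stalled.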
Optimizing Task Execution
For Load Operations:
Use incremental loading (timestamp filtering)
Adjust batch/chunk sizes for memory
Schedule during off-peak hours
Use sampling for testing
For Export Operations:
Set reasonable maxRecords limit
Use appropriate chunkRecords size
Apply filters to reduce data volume
For Entity Synchronization:
Use CHANGES mask for incremental index updates
Schedule steward during low-usage times
Monitor bucket statistics to track progress
Best Practices
Task Management
Monitor active tasks - Regularly check RUNNING tasks
Review failures - Investigate and fix failed tasks promptly
Clean up old tasks - System auto-deletes after retention period
Use notifications - Configure email alerts for critical tasks
Schedule wisely - Run heavy tasks during off-peak hours
Scheduling
Appropriate CRON - Match schedule to data update frequency
Avoid overlaps - Space out heavy tasks
Test schedules - Use "run now" to test before enabling
Document schedules - Keep schedule documentation current
Monitor execution - Track actual vs expected execution times
Error Handling
Immediate investigation - Check failed tasks promptly
Log review - Examine detailed logs for errors
Incremental fixes - Fix and test with small data samples
Prevent recurrence - Address root causes, not symptoms
Graceful degradation - Design tasks to handle partial failures
Performance
Right-size batches - Balance memory usage and throughput
Incremental processing - Use timestamp filtering when possible
Parallel execution - Consider multiple small tasks instead of one large task
Progress monitoring - Ensure regular progress updates
Resource planning - Allocate sufficient memory and threads
Troubleshooting
Task Stuck in CREATED
Check:
System has available executor threads
Task scheduler daemon is running
No system errors in logs
Solution:
Wait for executor availability
Check system health
Contact administrator if persistent
Task Not Reporting Progress
Check:
Task status is RUNNING
Ping timestamp is recent
No stalled task warnings
Solution:
Wait longer (may be in initialization)
Check if task has stalled
Cancel and retry if truly stuck
Cannot Cancel Task
Check:
Task status allows cancellation (CREATED or RUNNING)
User has task.cancel permission
Solution:
Wait if task is completing
Task may be in cleanup phase
Contact administrator if unresponsive
Scheduled Task Not Executing
Check:
Task is enabled
CRON expression is valid
Next execution time is in the future
Task scheduler daemon is running
Solution:
Enable task if disabled
Fix CRON expression
Use "run now" to test
Check system logs