
Entities

Introduction

Entities are the primary abstraction in Golden Core for managing master data and performing entity resolution. An entity orchestrates all components necessary for data deduplication, search capabilities, and golden record management.

Think of an entity as a smart wrapper around a data table that adds powerful search, duplicate detection, and master data management capabilities.

An entity represents a business concept (customers, products, suppliers, etc.) and provides:

  • Data Storage - References a table containing actual records

  • Search Capabilities - Full-text and structured search via indexing

  • Duplicate Detection - Intelligent grouping and classification of similar records

  • Master Data Management - Automated or manual duplicate resolution

  • Data Integration - Sources (input) and sinks (output) for data flow

  • Synchronization - Scheduled or on-demand data processing


Entity Types

| Type            | Capabilities                               | Required Components                                  | Use Cases                                     |
|-----------------|--------------------------------------------|------------------------------------------------------|-----------------------------------------------|
| NONE            | Basic storage only                         | Table, Dataset                                       | Simple data storage without advanced features |
| SEARCH          | Storage + Search                           | Table, Dataset, Indexer                              | Searchable catalogs, directories              |
| DUPLICATES      | Storage + Search + Manual Deduplication    | Table, Dataset, Indexer, Classifier, Merger          | Customer MDM with human oversight             |
| AUTO_DUPLICATES | Storage + Search + Automatic Deduplication | Table, Dataset, Indexer, Classifier, Merger, Steward | Fully automated master data management        |

Start with the SEARCH type to test your configuration, then upgrade to DUPLICATES or AUTO_DUPLICATES once indexing works correctly.
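The type-to-component relationships in the table above can be sketched as a simple lookup. This helper is illustrative only and is not part of the Golden Core API; the names are taken directly from the table:

```python
# Illustrative only: maps each entity type (from the table above) to the
# components it requires. Not part of the Golden Core API itself.
REQUIRED_COMPONENTS = {
    "NONE": ["table", "dataset"],
    "SEARCH": ["table", "dataset", "indexer"],
    "DUPLICATES": ["table", "dataset", "indexer", "classifier", "merger"],
    "AUTO_DUPLICATES": ["table", "dataset", "indexer", "classifier",
                        "merger", "steward"],
}

def required_components(entity_type: str) -> list[str]:
    """Return the components an entity of the given type must reference."""
    try:
        return REQUIRED_COMPONENTS[entity_type]
    except KeyError:
        raise ValueError(f"unknown entity type: {entity_type}")
```

Note that each tier is a superset of the one before it, which is why upgrading from SEARCH to DUPLICATES only means adding a classifier and a merger.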


Entity Configuration

Core Properties

| Property    | Type       | Required | Description                                           |
|-------------|------------|----------|-------------------------------------------------------|
| id          | String     | Yes      | Unique identifier (alphanumeric, underscore, hyphen)  |
| description | String     | No       | Human-readable description                            |
| table       | String     | Yes      | Reference to the data table                           |
| type        | EntityType | Yes      | NONE, SEARCH, DUPLICATES, or AUTO_DUPLICATES          |
| enabled     | Boolean    | No       | Enable/disable data operations (default: true)        |
| locked      | Boolean    | No       | Lock configuration changes (default: false)           |
| automatic   | Boolean    | No       | Enable automatic synchronization (default: false)     |

Component References

| Component  | Required For                        | Purpose                                                 |
|------------|-------------------------------------|---------------------------------------------------------|
| indexer    | SEARCH, DUPLICATES, AUTO_DUPLICATES | Defines search indexes and duplicate detection keys     |
| classifier | DUPLICATES, AUTO_DUPLICATES         | Compares records to determine whether they are duplicates |
| merger     | DUPLICATES, AUTO_DUPLICATES         | Defines the strategy for creating golden records        |
| steward    | AUTO_DUPLICATES                     | Automated duplicate resolution logic                    |
| pipeline   | Optional                            | Data transformation and cleaning operations             |

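A client can pre-check a configuration against these rules before submitting it. The sketch below is a hypothetical pre-flight check (the server performs its own validation); the field names come from the Core Properties table, and the id pattern follows the "alphanumeric, underscore, hyphen" rule:

```python
import re

# Hypothetical client-side check; the server performs its own validation.
COMPONENTS_BY_TYPE = {
    "NONE": [],
    "SEARCH": ["indexer"],
    "DUPLICATES": ["indexer", "classifier", "merger"],
    "AUTO_DUPLICATES": ["indexer", "classifier", "merger", "steward"],
}

def validate_entity(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems = []
    if not re.fullmatch(r"[A-Za-z0-9_-]+", config.get("id", "")):
        problems.append("id must be alphanumeric, underscore, or hyphen")
    if not config.get("table"):
        problems.append("table reference is required")
    entity_type = config.get("type")
    if entity_type not in COMPONENTS_BY_TYPE:
        problems.append(f"unknown type: {entity_type}")
        return problems
    for component in COMPONENTS_BY_TYPE[entity_type]:
        if not config.get(component):
            problems.append(f"{entity_type} entities require a {component}")
    return problems
```

Running the check early surfaces missing component references before a create request fails server-side validation.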

Entity Lifecycle

Status States

| Status       | Description                 | Next Actions                     |
|--------------|-----------------------------|----------------------------------|
| EMPTY        | Just created, no data       | Load data via synchronization    |
| WORKING      | Currently processing        | Wait for completion              |
| READY        | Operational and ready       | Normal operations                |
| INCONSISTENT | Configuration changed       | Re-synchronize to update indexes |
| ERROR        | Validation or runtime error | Check logs, fix configuration    |

Typical Lifecycle Flow

CODE
CREATE entity → Status: EMPTY
↓
SYNCHRONIZE (load data) → Status: WORKING
↓
Processing completes → Status: READY
↓
UPDATE configuration → Status: INCONSISTENT
↓
RE-SYNCHRONIZE → Status: READY
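The flow above can be expressed as a transition table. The allowed transitions here are inferred from the status descriptions and are a sketch only; the exact rules Golden Core enforces may differ:

```python
# Sketch of the lifecycle transitions described above (inferred, not
# an authoritative state machine for Golden Core).
TRANSITIONS = {
    "EMPTY": {"WORKING"},                  # synchronize to load data
    "WORKING": {"READY", "ERROR"},         # processing completes or fails
    "READY": {"WORKING", "INCONSISTENT"},  # re-sync, or a config change
    "INCONSISTENT": {"WORKING"},           # re-synchronize to recover
    "ERROR": {"WORKING"},                  # fix configuration and retry
}

def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle sketch allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())
```

A monitoring script could use such a table to flag unexpected status jumps between polls.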

API Operations

Entity Management

Create or Update Entity

Endpoint: POST /entities
Permission: entity.save

JSON
{
  "create": true,
  "id": "customers",
  "description": "Customer master data",
  "table": "customer_table",
  "type": "AUTO_DUPLICATES",
  "indexer": "customer_indexer",
  "classifier": "customer_classifier",
  "merger": "customer_merger",
  "steward": "customer_steward",
  "stewardCron": "0 2 * * *",
  "enabled": true,
  "automatic": false
}

List All Entities

Endpoint: GET /entities
Permission: entity.list

JSON
{
  "entities": [
    {
      "id": "customers",
      "type": "AUTO_DUPLICATES",
      "status": "READY",
      "enabled": true,
      "locked": false,
      "stats": {
        "recordCount": 15000,
        "totalBuckets": 12500,
        "duplicateBuckets": 450
      }
    }
  ]
}

Get Entity Details

Endpoint: GET /entities/{id}
Permission: entity.view

Delete Entity

Endpoint: DELETE /entities/{id}
Permission: entity.delete

Deleting an entity removes all bucket data. The underlying table is preserved.

Control Operations

Lock/Unlock Entity

Endpoint: PUT /entities/{id}/locked/{true|false}
Permission: entity.lock

Locked entities prevent configuration changes but allow data operations.

Enable/Disable Entity

Endpoint: PUT /entities/{id}/enabled/{true|false}
Permission: entity.enable

Disabled entities reject data operations but allow configuration changes.

Set Automatic Mode

Endpoint: PUT /entities/{id}/automatic/{true|false}
Permission: entity.automatic

Automatic Mode (true):

  • Sources execute on their CRON schedules

  • Steward runs automatically (AUTO_DUPLICATES only)

  • Fully hands-off operation

Manual Mode (false):

  • No automatic processing

  • Requires explicit synchronization calls

  • Full control over timing

Data Operations

Synchronize Entity

Endpoint: PUT /entities/synchronize
Permission: entity.synchronize

JSON
{
  "entity": "customers",
  "loadMask": "FULL",
  "indexClassificationMask": "FULL",
  "sinkMask": "FULL"
}

Load Masks:

  • FULL - Load all data from all sources

  • INCREMENTAL - Load only changed records since last execution

  • CUSTOM - Load data within specific date range (use loadFrom/loadTo)

  • NONE - Skip loading

Index Classification Masks:

  • FULL - Re-index and classify all records

  • CHANGES - Index only new/modified records

  • NONE - Skip indexing/classification

Sink Masks:

  • FULL - Export all records to sinks

  • NONE - Skip sink operations

Returns a task ID for monitoring progress.
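A common pattern is to run one FULL synchronization for the initial load and lighter incremental runs afterwards. This hypothetical helper builds the request body for PUT /entities/synchronize; the mask combinations are a convention, not the only valid ones:

```python
def sync_request(entity: str, initial: bool = False) -> dict:
    """Build a body for PUT /entities/synchronize (sketch).

    FULL everything for the first load; INCREMENTAL/CHANGES for
    routine follow-up runs.
    """
    load, index, sink = (
        ("FULL", "FULL", "FULL") if initial
        else ("INCREMENTAL", "CHANGES", "FULL")
    )
    return {
        "entity": entity,
        "loadMask": load,
        "indexClassificationMask": index,
        "sinkMask": sink,
    }
```

For a date-bounded load you would instead set loadMask to CUSTOM and supply loadFrom/loadTo, as noted in the mask list above.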

Clear Entity

Endpoint: PUT /entities/{id}/clear
Permission: entity.clear

Removes all bucket data. Table records are preserved.


Entity Synchronization Workflow

When you synchronize an entity, the following process occurs:

CODE
1. LOAD PHASE
   - Execute source queries
   - Apply transformations
   - Insert/update records in table
              ↓
2. INDEX & CLASSIFICATION PHASE
   - Calculate index keys for records
   - Group records into buckets
   - Classify buckets (MATCH/REVIEW)
   - Update bucket statistics
              ↓
3. SINK PHASE
   - Apply output transformations
   - Export records to configured sinks
   - Generate audit trail (if enabled)
              ↓
4. STEWARD PHASE (AUTO_DUPLICATES only)
   - Find MATCH buckets
   - Merge duplicate records
   - Create/update golden records
   - Update statistics

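The four phases can be sketched as a simple pipeline in which the steward phase runs only for AUTO_DUPLICATES entities. The phase names are placeholders standing in for the real load/index/sink/steward logic:

```python
def synchronize(entity_type: str) -> list[str]:
    """Return the workflow phases executed, in order, for an entity type.

    A skeleton of the synchronization workflow above; each phase name
    stands in for the actual processing step.
    """
    phases = ["load", "index_and_classify", "sink"]
    if entity_type == "AUTO_DUPLICATES":
        phases.append("steward")  # automated merging happens only here
    return phases
```

Note that masks (previous section) can skip individual phases; this sketch shows only the maximal FULL-sync path.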
Working with Duplicates

Duplicate Detection Process

  1. Indexing - Records are grouped into buckets by similarity

  2. Classification - An algorithm determines whether the records in a bucket are duplicates

  3. Review - Human or automated review of potential matches

  4. Resolution - Merge duplicates into a golden record, or mark them as non-duplicates

Classification Results

| Classification | Meaning                            | Action                 |
|----------------|------------------------------------|------------------------|
| MATCH          | High similarity, likely duplicates | Merge (manual or auto) |
| NON_MATCH      | Low similarity, different records  | No action needed       |
| REVIEW         | Medium similarity, uncertain       | Human review required  |
| IGNORE         | Manually marked to skip            | No processing          |

Manual Duplicate Operations

See the Golden Records guide for bucket operations (merge, split, disconnect).


Entity Statistics

Entities track key metrics accessible via API:

| Metric            | Description                        |
|-------------------|------------------------------------|
| recordCount       | Total records in the table         |
| totalBuckets      | Total index buckets created        |
| duplicateBuckets  | Buckets with potential duplicates  |
| duplicatesByIndex | Duplicate counts per index mapping |

Use these to monitor data quality and deduplication progress.
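One simple derived metric is the share of buckets flagged as containing potential duplicates. Using the stats payload shown earlier (450 duplicate buckets out of 12,500 total):

```python
def duplicate_bucket_ratio(stats: dict) -> float:
    """Fraction of buckets flagged as containing potential duplicates."""
    total = stats.get("totalBuckets", 0)
    return stats.get("duplicateBuckets", 0) / total if total else 0.0

stats = {"recordCount": 15000, "totalBuckets": 12500, "duplicateBuckets": 450}
# 450 / 12500 = 0.036, i.e. 3.6% of buckets need attention
```

Tracking this ratio over successive synchronizations shows whether deduplication is keeping pace with incoming data.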


Import and Export

Export Entities

Endpoint: POST /entities/export

JSON
{
  "ids": ["customers", "products"]
}

Returns entity configurations as JSON for backup or migration.

Import Entities

Endpoint: POST /entities/import

CODE
{
  "entities": [
    { /* entity configuration */ }
  ]
}

Imports entity configurations. Validates before importing.


Best Practices

Configuration

  • Descriptive IDs - Use clear business names (customers, products)

  • Start Simple - Begin with SEARCH type, upgrade after testing

  • Lock Production - Lock entities in production to prevent accidental changes

  • Test Thoroughly - Validate configuration with sample data first

Synchronization Strategy

  • Manual for Testing - Use manual mode during development

  • Incremental Loading - Enable incremental loads for large datasets

  • Off-Peak Scheduling - Schedule steward during low-usage hours

  • Monitor Tasks - Always check task status after synchronization

Duplicate Management

  • Tune Classifier - Adjust match/review thresholds based on results

  • Review Buckets - Periodically review REVIEW-classified buckets

  • Test Merger Logic - Verify golden record quality with sample data

  • Track Statistics - Monitor duplicate counts over time

Performance

  • Appropriate Indexes - Use EXACT for high-cardinality fields

  • Limit Fuzzy Matching - Set reasonable maximumResults for fuzzy searches

  • Batch Operations - Use FULL sync for initial load, INCREMENTAL thereafter

  • Monitor Resources - Track database and memory usage during sync


Troubleshooting

Entity Status INCONSISTENT

Cause: Configuration changed after data was loaded

Solution: Run synchronization with FULL index/classification mask

Validation Errors on Create

Cause: Missing or incompatible resources

Solution:

  • Verify all referenced resources exist (indexer, classifier, etc.)

  • Check dataset compatibility with indexer/classifier

  • Ensure CRON expressions are valid

Synchronization Fails

Cause: Entity disabled, locked, or in automatic mode

Solution:

  • Enable entity if disabled

  • Unlock if locked (for configuration changes)

  • Switch to manual mode to run synchronization

No Duplicates Found

Cause: Indexer not configured for duplicates or classifier thresholds too high

Solution:

  • Check indexer mappings have duplicates: true

  • Lower classifier match threshold

  • Verify records have values in indexed fields
