
Golden Records

Introduction

Golden Records represent the "single source of truth" for each entity in your system. This guide explains how Golden Core identifies duplicates, manages buckets, and creates master data records by merging duplicates according to configurable rules.

A Golden Record is a consolidated, high-quality record created by merging duplicate records together using configurable rules.


Core Concepts

Record

A Record is a JSON document containing data about a single entity instance (customer, product, etc.).

Structure:

JSON
{
  "_id": "uuid-12345",
  "email": "[email protected]",
  "full_name": "John Doe",
  "phone": "+1234567890",
  "_metadata": {
    "_updated": "2025-01-30T10:00:00Z",
    "_quality": 0.95,
    "_merged": ["uuid-67890"],
    "_errors": []
  }
}

Metadata

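Every record includes a _metadata object. The fields shown in the structure above carry a leading underscore (for example, _updated and _quality). The metadata tracks: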

  • updated - Last modification timestamp

  • quality - Quality score (0.0 to 1.0)

  • merged - IDs of records merged into this one

  • unrelated - IDs manually marked as not duplicates

  • errors - Validation errors

  • quality_facts - Quality observations

Bucket

A Bucket groups potentially duplicate records based on shared characteristics (same email, similar name, etc.).

Purpose: Avoid comparing every record with every other record (O(n²) complexity).

JSON
{
  "_id": "[email protected]",
  "indexId": "email",
  "key": "[email protected]",
  "items": ["uuid-12345", "uuid-67890"],
  "size": 2,
  "classification": "MATCH",
  "averageScore": 0.92
}
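
To make the saving concrete, the following sketch counts pairwise comparisons with and without bucketing. The function names and the record distribution are illustrative assumptions, not part of Golden Core.

```python
def pairs(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def bucketed_pairs(bucket_sizes: list[int]) -> int:
    """Comparisons needed when records are only compared within their own bucket."""
    return sum(pairs(size) for size in bucket_sizes)


# 10,000 records compared all-against-all.
all_pairs = pairs(10_000)

# The same 10,000 records spread over 5,000 buckets of 2 (hypothetical distribution).
within_buckets = bucketed_pairs([2] * 5_000)

print(all_pairs, within_buckets)
```

Real buckets vary in size, and a record may appear in several buckets (one per index), so the actual saving depends on the indexer configuration.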

Golden Record

A Golden Record is the result of merging duplicate records:

  • Contains the best values from all duplicates

  • Tracks source records via merged metadata

  • Represents the authoritative version of the entity


Deduplication Process

Step-by-Step Workflow

CODE
 1. INDEXING                             
    Records grouped into buckets
    Based on indexer configuration       
                  ↓
 2. CLASSIFICATION                       
    Records within bucket compared       
    Similarity scores calculated         
    Buckets classified                   
                  ↓
 3. REVIEW (Optional)                    
    Human review of uncertain matches    
    Manual merge/split decisions         
                  ↓
 4. MERGING                              
    Duplicate records consolidated       
    Golden record created/updated        
    Source records moved to history      

Indexing Phase

Records are grouped into buckets based on index mappings configured in the indexer resource.

Example Index Mappings:

  • Email exact match

  • Phone number exact match

  • Name fuzzy match

  • Geographic proximity

Result: Records with similar attributes land in the same bucket.
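
As a rough illustration of an exact-match index, the sketch below groups record IDs by a normalized email key. The normalization step, the helper names, and the sample records are assumptions made for this example; the actual indexer behavior is defined by its configuration.

```python
from collections import defaultdict


def normalize_email(value: str) -> str:
    """Hypothetical key normalization: trim whitespace and lowercase."""
    return value.strip().lower()


def build_buckets(records, field, normalize):
    """Group record IDs by the normalized value of one indexed field."""
    buckets = defaultdict(list)
    for record in records:
        value = record.get(field)
        if not value:
            # Records without a value in the indexed field land in no bucket.
            continue
        buckets[normalize(value)].append(record["_id"])
    return dict(buckets)


records = [
    {"_id": "uuid-12345", "email": "John.Doe@Example.com"},
    {"_id": "uuid-67890", "email": "john.doe@example.com "},
    {"_id": "uuid-11111", "email": "jane@example.com"},
    {"_id": "uuid-22222"},  # no email, so it is not indexed
]

buckets = build_buckets(records, "email", normalize_email)
for key, items in buckets.items():
    print(key, items)
```

Only buckets with two or more items are candidates for comparison in the classification phase.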

Classification Phase

Records within each bucket are compared using the classifier resource.

Classification Results:

| Classification | Score Range | Meaning |
|---|---|---|
| MATCH | High (≥ match threshold) | Likely duplicates - should merge |
| REVIEW | Medium (between thresholds) | Uncertain - needs human review |
| NON_MATCH | Low (≤ non-match threshold) | Different records - ignore |
| IGNORE | N/A | Single record or manually ignored |
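
The threshold logic in the table can be sketched as follows. This is a minimal illustration that assumes the bucket's average score is the value being compared; the function name and the default threshold values are hypothetical, and the real thresholds come from the classifier resource.

```python
def classify_bucket(size, average_score,
                    match_threshold=0.85, non_match_threshold=0.50,
                    ignored=False):
    """Map a bucket to a classification using two thresholds.

    The default thresholds are placeholders, not Golden Core defaults.
    """
    if ignored or size < 2:
        # A single record has nothing to be compared with.
        return "IGNORE"
    if average_score >= match_threshold:
        return "MATCH"
    if average_score <= non_match_threshold:
        return "NON_MATCH"
    return "REVIEW"


print(classify_bucket(2, 0.92))  # score above the match threshold
print(classify_bucket(2, 0.70))  # score between the two thresholds
print(classify_bucket(1, 0.99))  # single-record bucket
```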

Merging Phase

For MATCH-classified buckets, records are merged using the merger resource.

Merge Strategies:

  • Weighted selection (trust scores)

  • Most recent value

  • Most complete value

  • Custom merge logic
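
Two of these strategies can be sketched per field as shown below. The helper names are hypothetical, "most complete" is approximated by the longest non-empty value, and the sample records are invented; the actual merger resource may define these strategies differently.

```python
def merge_most_recent(records, field):
    """Take the value from the most recently updated record that has one."""
    candidates = [r for r in records if r.get(field) not in (None, "")]
    if not candidates:
        return None
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    newest = max(candidates, key=lambda r: r["_metadata"]["_updated"])
    return newest[field]


def merge_most_complete(records, field):
    """Take the longest non-empty value as a simple stand-in for completeness."""
    values = [str(r[field]) for r in records if r.get(field) not in (None, "")]
    return max(values, key=len) if values else None


sources = [
    {"_id": "uuid-12345", "full_name": "John Doe", "phone": "",
     "_metadata": {"_updated": "2025-01-30T10:00:00Z"}},
    {"_id": "uuid-67890", "full_name": "Jon Doe", "phone": "+1234567890",
     "_metadata": {"_updated": "2025-01-31T08:00:00Z"}},
]

golden = {
    "full_name": merge_most_complete(sources, "full_name"),
    "phone": merge_most_recent(sources, "phone"),
}
print(golden)
```

Choosing the strategy per field lets a stale but complete name coexist with the freshest phone number in the same golden record.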


Working with Buckets

Buckets are the key to managing duplicates in Golden Core.

List Buckets

Endpoint: GET /golden/buckets
Permission: golden.listBucket

Query Parameters:

  • entity - Entity identifier (required)

  • classification - Filter by classification (MATCH, REVIEW, etc.)

  • indexId - Filter by specific index

  • pageNumber - Page number (default: 0)

  • pageSize - Page size (default: 10)
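
A request URL for this endpoint could be assembled as in the sketch below. The host name and the helper function are assumptions for illustration; only the path and the query parameter names come from this guide. Authentication is omitted.

```python
from urllib.parse import urlencode

BASE_URL = "https://golden.example.com"  # hypothetical host


def list_buckets_url(entity, classification=None, index_id=None,
                     page_number=0, page_size=10):
    """Build the GET /golden/buckets URL with the documented query parameters."""
    params = {"entity": entity, "pageNumber": page_number, "pageSize": page_size}
    # Optional filters are only sent when they are set.
    if classification:
        params["classification"] = classification
    if index_id:
        params["indexId"] = index_id
    return f"{BASE_URL}/golden/buckets?{urlencode(params)}"


url = list_buckets_url("customers", classification="REVIEW")
print(url)
```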

View Bucket Details

Endpoint: GET /golden/bucket/{bucketId}
Permission: golden.viewBucket

Returns bucket information and all records within it.

JSON
{
  "bucket": {
    "id": "[email protected]",
    "classification": "MATCH",
    "size": 2,
    "averageScore": 0.92
  },
  "records": [
    {
      "_id": "uuid-12345",
      "email": "[email protected]",
      "full_name": "John Doe"
    },
    {
      "_id": "uuid-67890",
      "email": "[email protected]",
      "full_name": "Jon Doe"
    }
  ]
}

Bucket Operations

Merge Bucket

Endpoint: PUT /golden/bucket/merge
Permission: golden.mergeBucket

Merge all records in a bucket into a single golden record.

JSON
{
  "entity": "customers",
  "bucketId": "[email protected]"
}

Process:

CODE
Apply merger algorithm to create golden record
                  ↓
Store original record IDs in _merged metadata
                  ↓
Move original records to history table
                  ↓
Keep golden record in main table
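
The bucket operations in this section share the same request shape, so a client could prepare them with one helper. The sketch below only builds the request object and does not send it; the host name, the helper name, and the bucket ID value are placeholders, and authentication is left out because this guide does not describe it.

```python
import json
from urllib.request import Request

BASE_URL = "https://golden.example.com"  # hypothetical host


def bucket_request(operation, payload, method="PUT"):
    """Prepare a JSON request for /golden/bucket/{operation} without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{BASE_URL}/golden/bucket/{operation}",
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


# "example-bucket-id" is a placeholder, not a real bucket ID format.
req = bucket_request("merge", {"entity": "customers", "bucketId": "example-bucket-id"})
print(req.get_method(), req.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; the same helper covers split, disconnect, and ignore by changing the operation name and payload.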

Split Bucket

Endpoint: PUT /golden/bucket/split
Permission: golden.splitBucket

Break up a bucket by clearing index values that caused records to group together.

JSON
{
  "entity": "customers",
  "bucketId": "[email protected]",
  "recordIds": ["uuid-12345"]
}

Use Case: False positives where records aren't actually duplicates.

Effect: Clears indexing fields for specified records so they won't group together again.

Disconnect Bucket

Endpoint: PUT /golden/bucket/disconnect
Permission: golden.disconnectBucket

Mark records as unrelated without modifying data.

JSON
{
  "entity": "customers",
  "bucketId": "[email protected]",
  "recordIds": ["uuid-12345", "uuid-67890"]
}

Effect: Adds record IDs to _unrelated metadata, preventing future merging.

Use Case: Keep data intact but prevent false-positive merges.
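
The effect of a disconnect can be pictured with the in-memory sketch below. It assumes that each listed record receives the IDs of the other listed records in its _unrelated metadata, and that a merge is blocked when either record lists the other; both are interpretations made for this example, not confirmed behavior.

```python
def disconnect(records_by_id, record_ids):
    """Mark the given records as mutually unrelated without touching their data."""
    for record_id in record_ids:
        metadata = records_by_id[record_id].setdefault("_metadata", {})
        unrelated = metadata.setdefault("_unrelated", [])
        for other_id in record_ids:
            if other_id != record_id and other_id not in unrelated:
                unrelated.append(other_id)


def may_merge(a, b):
    """Return False if either record lists the other as unrelated."""
    a_unrelated = a.get("_metadata", {}).get("_unrelated", [])
    b_unrelated = b.get("_metadata", {}).get("_unrelated", [])
    return b["_id"] not in a_unrelated and a["_id"] not in b_unrelated


records = {
    "uuid-12345": {"_id": "uuid-12345", "full_name": "John Doe"},
    "uuid-67890": {"_id": "uuid-67890", "full_name": "Jon Doe"},
}

print(may_merge(records["uuid-12345"], records["uuid-67890"]))
disconnect(records, ["uuid-12345", "uuid-67890"])
print(may_merge(records["uuid-12345"], records["uuid-67890"]))
```

Unlike a split, the indexed field values stay untouched, so the records remain searchable by the same email or phone number.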

Ignore Bucket

Endpoint: PUT /golden/bucket/ignore
Permission: golden.ignoreBucket

Mark a bucket so that it is skipped during processing.

JSON
{
  "entity": "customers",
  "bucketId": "[email protected]",
  "ignore": true
}

Effect: Sets the bucket classification to IGNORE, excluding it from steward processing.

Delete Bucket

Endpoint: DELETE /golden/bucket
Permission: golden.deleteBucket

Delete all records in a bucket.

JSON
{
  "entity": "customers",
  "bucketId": "[email protected]"
}

Warning: This permanently deletes the records. Use with caution.


Working with Records

Search Records

Endpoint: POST /golden/search
Permission: golden.searchRecord

Full-text search across entity records using Typesense.

JSON
{
  "entity": "customers",
  "query": "john",
  "pageNumber": 0,
  "pageSize": 20,
  "facetBy": ["status", "country"]
}

Search Features:

  • Full-text search across all fields

  • Faceted search (filtering)

  • Fuzzy matching

  • Geo-search (distance-based)

  • Result ranking
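
To page through search results, a client could generate one request body per page, as in the sketch below. The generator name and the max_pages cap are assumptions; the field names match the request body shown above, and page numbering is taken to start at 0 as in that example.

```python
import json
from typing import Iterator, Optional, Sequence


def search_pages(entity: str, query: str, page_size: int = 20,
                 max_pages: int = 3,
                 facet_by: Optional[Sequence[str]] = None) -> Iterator[dict]:
    """Yield POST /golden/search request bodies for consecutive pages."""
    for page_number in range(max_pages):
        payload = {
            "entity": entity,
            "query": query,
            "pageNumber": page_number,
            "pageSize": page_size,
        }
        if facet_by:
            payload["facetBy"] = list(facet_by)
        yield payload


payloads = list(search_pages("customers", "john", facet_by=["status", "country"]))
print(json.dumps(payloads[0], indent=2))
```

A real client would stop as soon as a page comes back with fewer results than pageSize instead of relying on a fixed page cap.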

Get Record

Endpoint: GET /golden/record/{recordId}
Permission: golden.viewRecord

Retrieve a single record by ID, optionally with expanded details.

HTTP
GET /golden/record/uuid-12345?entity=customers&expanded=true

Expanded view includes:

  • Full record data

  • Metadata and audit trail

  • Merged record references

  • Quality information

Upsert Record

Endpoint: POST /golden/record
Permission: golden.upsertRecord

Create or update a record.

JSON
{
  "entity": "customers",
  "record": {
    "_id": "uuid-12345",
    "email": "[email protected]",
    "full_name": "John Doe",
    "phone": "+1234567890"
  }
}

Tip: Omit _id to create a new record. Include _id to update an existing record.
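
The create-or-update rule can be modeled with a small in-memory store. This sketch assumes that a missing _id is replaced by a generated UUID and that an update overwrites the whole stored record; neither detail is confirmed by this guide, and the function is not part of any Golden Core client.

```python
import uuid


def upsert(store: dict, record: dict) -> str:
    """Create the record when _id is absent, otherwise overwrite the stored one."""
    stored = dict(record)  # copy so the caller's dict is not mutated
    if "_id" not in stored:
        # Assumption: the service generates an ID for new records.
        stored["_id"] = str(uuid.uuid4())
    store[stored["_id"]] = stored
    return stored["_id"]


store = {}
new_id = upsert(store, {"full_name": "John Doe"})
same_id = upsert(store, {"_id": new_id, "full_name": "John Doe", "phone": "+1234567890"})
print(new_id == same_id, len(store))
```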

Delete Record

Endpoint: DELETE /golden/record
Permission: golden.deleteRecord

Delete a record.

JSON
{
  "entity": "customers",
  "recordId": "uuid-12345"
}

Best Practices

Deduplication Strategy

  • Start manual - Use DUPLICATES type initially

  • Tune thresholds - Adjust based on manual review results

  • Test thoroughly - Validate merger logic with sample data

  • Go automatic - Switch to AUTO_DUPLICATES when confident
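
Threshold tuning from manual review results can be approached by measuring precision and recall at candidate thresholds. The sketch below uses invented review data, where each entry pairs a similarity score with the steward's verdict; the function name and the data are assumptions for illustration.

```python
def precision_recall(reviewed, threshold):
    """Compute precision and recall of 'score >= threshold means duplicate'.

    reviewed: list of (score, is_duplicate) pairs from manual review.
    """
    predicted = [is_dup for score, is_dup in reviewed if score >= threshold]
    true_positives = sum(predicted)
    false_negatives = sum(
        1 for score, is_dup in reviewed if is_dup and score < threshold
    )
    precision = true_positives / len(predicted) if predicted else 1.0
    actual = true_positives + false_negatives
    recall = true_positives / actual if actual else 1.0
    return precision, recall


# Hypothetical review outcomes: (similarity score, steward confirmed duplicate?)
reviewed = [
    (0.95, True), (0.91, True), (0.88, False),
    (0.86, True), (0.80, True), (0.70, False),
]

for threshold in (0.85, 0.90):
    print(threshold, precision_recall(reviewed, threshold))
```

A higher threshold trades recall for precision, so switching to AUTO_DUPLICATES is safest once precision at the chosen threshold is acceptable on a representative review sample.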


Troubleshooting

No Duplicates Detected

Check:

  • Indexer mappings have duplicates: true set

  • Records have values in indexed fields

  • Classifier thresholds aren't too high

  • Entity synchronization completed successfully

Too Many False Positives

Solutions:

  • Increase match threshold (e.g., 0.85 → 0.90)

  • Add more weighted comparison fields

  • Use stricter comparison algorithms

  • Use disconnect operation for known false positives

Too Many False Negatives

Solutions:

  • Decrease match threshold (e.g., 0.85 → 0.75)

  • Add fuzzy index mappings

  • Reduce comparison field weights

  • Check data quality issues

Poor Golden Record Quality

Check:

  • Merger weights configured correctly

  • Merge type appropriate for data

  • Source data quality (use _quality score)

  • Nested dataset merge logic
