Golden Records
Introduction
Golden Records represent the "single source of truth" for each entity in your system. This guide explains how Golden Core identifies duplicates, manages buckets, and creates master data records through intelligent merging.
A Golden Record is a consolidated, high-quality record created by merging duplicate records according to configurable rules.
Core Concepts
Record
A Record is a JSON document containing data about a single entity instance (customer, product, etc.).
Structure:
{
"_id": "uuid-12345",
"email": "[email protected]",
"full_name": "John Doe",
"phone": "+1234567890",
"_metadata": {
"_updated": "2025-01-30T10:00:00Z",
"_quality": 0.95,
"_merged": ["uuid-67890"],
"_errors": []
}
}
Metadata
Every record includes metadata tracking:
updated - Last modification timestamp
quality - Quality score (0.0 to 1.0)
merged - IDs of records merged into this one
unrelated - IDs manually marked as not duplicates
errors - Validation errors
quality_facts - Quality observations
Bucket
A Bucket groups potentially duplicate records based on shared characteristics (same email, similar name, etc.).
Purpose: Avoid comparing every record with every other record (O(n²) complexity).
{
"_id": "[email protected]",
"indexId": "email",
"key": "[email protected]",
"items": ["uuid-12345", "uuid-67890"],
"size": 2,
"classification": "MATCH",
"averageScore": 0.92
}
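To make the cost argument concrete, the sketch below counts pairwise comparisons with and without buckets. The function names and the dataset sizes are illustrative assumptions, not part of Golden Core.

```python
from math import comb
from typing import List


def pairs_without_buckets(record_count: int) -> int:
    """Comparisons needed when every record is compared with every other record."""
    return comb(record_count, 2)


def pairs_with_buckets(bucket_sizes: List[int]) -> int:
    """Comparisons needed when records are only compared inside their own bucket."""
    return sum(comb(size, 2) for size in bucket_sizes)


# Hypothetical dataset: 10,000 records that fall into 5,000 buckets of 2 records each.
all_pairs = pairs_without_buckets(10_000)
bucketed_pairs = pairs_with_buckets([2] * 5_000)
print(all_pairs, bucketed_pairs)
```

In this made-up case, bucketing reduces the work from 49,995,000 comparisons to 5,000.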
Golden Record
A Golden Record is the result of merging duplicate records:
Contains the best values from all duplicates
Tracks source records via _merged metadata
Represents the authoritative version of the entity
Deduplication Process
Step-by-Step Workflow
1. INDEXING
Records grouped into buckets
Based on indexer configuration
↓
2. CLASSIFICATION
Records within bucket compared
Similarity scores calculated
Buckets classified
↓
3. REVIEW (Optional)
Human review of uncertain matches
Manual merge/split decisions
↓
4. MERGING
Duplicate records consolidated
Golden record created/updated
Source records moved to history
Indexing Phase
Records are grouped into buckets based on index mappings configured in the indexer resource.
Example Index Mappings:
Email exact match
Phone number exact match
Name fuzzy match
Geographic proximity
Result: Records with similar attributes land in the same bucket.
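As a rough illustration of the indexing idea, the sketch below derives one key per index mapping and groups record IDs by that key. The key functions, the INDEXES table, and the sample records are hypothetical; the actual indexer configuration format is not shown in this guide.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


def email_key(record: dict) -> Optional[str]:
    """Exact-match key: the email, trimmed and lowercased."""
    email = record.get("email")
    return email.strip().lower() if email else None


def name_key(record: dict) -> Optional[str]:
    """Crude fuzzy key: lowercase words stripped of non-letters, in sorted order."""
    name = record.get("full_name")
    if not name:
        return None
    words = sorted(
        "".join(ch for ch in word.lower() if ch.isalpha()) for word in name.split()
    )
    return " ".join(word for word in words if word) or None


# Hypothetical index mappings: index ID -> function that derives the bucket key.
INDEXES: Dict[str, Callable[[dict], Optional[str]]] = {
    "email": email_key,
    "name": name_key,
}


def build_buckets(records: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group record IDs by (index ID, key); records without a key are skipped."""
    buckets: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for record in records:
        for index_id, key_fn in INDEXES.items():
            key = key_fn(record)
            if key is not None:
                buckets[(index_id, key)].append(record["_id"])
    return dict(buckets)


sample = [
    {"_id": "uuid-12345", "email": "Jane@Example.com ", "full_name": "Doe, Jane"},
    {"_id": "uuid-67890", "email": "jane@example.com", "full_name": "Jane Doe"},
    {"_id": "uuid-99999", "full_name": "Alex Smith"},
]
buckets = build_buckets(sample)
```

A record can land in several buckets at once, one per index mapping that produces a key for it.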
Classification Phase
Records within each bucket are compared using the classifier resource.
Classification Results:
| Classification | Score Range | Meaning |
|---|---|---|
| MATCH | High (≥ match threshold) | Likely duplicates - should merge |
| REVIEW | Medium (between thresholds) | Uncertain - needs human review |
| NON_MATCH | Low (≤ non-match threshold) | Different records - ignore |
| IGNORE | N/A | Single record or manually ignored |
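The table can be read as a simple two-threshold rule. The sketch below is one way to express it; the threshold values and the choice to classify a bucket by its average score are assumptions for illustration, not documented Golden Core behavior.

```python
from typing import List

# Hypothetical thresholds; real values come from the classifier configuration.
MATCH_THRESHOLD = 0.85
NON_MATCH_THRESHOLD = 0.50


def classify(score: float) -> str:
    """Map a similarity score to MATCH, REVIEW, or NON_MATCH."""
    if score >= MATCH_THRESHOLD:
        return "MATCH"
    if score <= NON_MATCH_THRESHOLD:
        return "NON_MATCH"
    return "REVIEW"


def classify_bucket(pair_scores: List[float]) -> str:
    """Classify a bucket from its pairwise scores (assumed: by average score)."""
    if not pair_scores:
        # A bucket with a single record has no pairs to compare.
        return "IGNORE"
    return classify(sum(pair_scores) / len(pair_scores))


print(classify(0.92), classify(0.70), classify(0.30), classify_bucket([]))
```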
Merging Phase
For MATCH-classified buckets, records are merged using the merger resource.
Merge Strategies:
Weighted selection (trust scores)
Most recent value
Most complete value
Custom merge logic
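The strategies above can be sketched per field as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, trust scores are passed as a plain mapping from record ID to score, and "most complete" is approximated as the longest non-empty string. The real merger resource may work differently.

```python
from datetime import datetime
from typing import Dict, List, Optional


def _updated_at(record: dict) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    stamp = record["_metadata"]["_updated"]
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def most_recent(records: List[dict], field: str) -> Optional[str]:
    """Value from the most recently updated record that has a non-empty value."""
    candidates = [r for r in records if r.get(field)]
    return max(candidates, key=_updated_at)[field] if candidates else None


def most_complete(records: List[dict], field: str) -> Optional[str]:
    """Longest non-empty string value across the records."""
    values = [r[field] for r in records if r.get(field)]
    return max(values, key=len) if values else None


def weighted(records: List[dict], field: str, trust: Dict[str, float]) -> Optional[str]:
    """Value from the record with the highest (hypothetical) trust score."""
    candidates = [r for r in records if r.get(field)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: trust.get(r["_id"], 0.0))[field]


a = {"_id": "uuid-12345", "full_name": "John Doe", "phone": "",
     "_metadata": {"_updated": "2025-01-30T10:00:00Z"}}
b = {"_id": "uuid-67890", "full_name": "Jon Doe", "phone": "+1234567890",
     "_metadata": {"_updated": "2025-02-02T08:30:00Z"}}
```

With these two sample records, the strategies disagree on full_name: most_recent picks "Jon Doe", while most_complete picks "John Doe". That difference is why the strategy should be chosen per field.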
Working with Buckets
Buckets are the key to managing duplicates in Golden Core.
List Buckets
Endpoint: GET /golden/buckets
Permission: golden.listBucket
Query Parameters:
entity - Entity identifier (required)
classification - Filter by classification (MATCH, REVIEW, etc.)
indexId - Filter by specific index
pageNumber - Page number (default: 0)
pageSize - Page size (default: 10)
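A small helper for building the list request URL might look like the sketch below. BASE_URL is a placeholder host and list_buckets_url is a hypothetical helper name; only the path and the query parameter names come from this guide.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://golden.example.com"  # placeholder host, not a real deployment


def list_buckets_url(
    entity: str,
    classification: Optional[str] = None,
    index_id: Optional[str] = None,
    page_number: int = 0,
    page_size: int = 10,
) -> str:
    """Build the GET /golden/buckets URL, leaving out filters that are not set."""
    params = {"entity": entity, "pageNumber": page_number, "pageSize": page_size}
    if classification:
        params["classification"] = classification
    if index_id:
        params["indexId"] = index_id
    return f"{BASE_URL}/golden/buckets?{urlencode(params)}"


url = list_buckets_url("customers", classification="REVIEW")
print(url)
```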
View Bucket Details
Endpoint: GET /golden/bucket/{bucketId}
Permission: golden.viewBucket
Returns bucket information and all records within it.
{
"bucket": {
"id": "[email protected]",
"classification": "MATCH",
"size": 2,
"averageScore": 0.92
},
"records": [
{
"_id": "uuid-12345",
"email": "[email protected]",
"full_name": "John Doe"
},
{
"_id": "uuid-67890",
"email": "[email protected]",
"full_name": "Jon Doe"
}
]
}
Bucket Operations
Merge Bucket
Endpoint: PUT /golden/bucket/merge
Permission: golden.mergeBucket
Merge all records in a bucket into a single golden record.
{
"entity": "customers",
"bucketId": "[email protected]"
}
Process:
Apply merger algorithm to create golden record
↓
Record the original record IDs in _merged metadata
↓
Move original records to history table
↓
Keep golden record in main table
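The merge call, like the split, disconnect, and ignore calls in this guide, is a PUT to /golden/bucket/{operation} with a JSON body. The sketch below builds such a request with the Python standard library without sending it. The host and bucket ID are placeholders, the helper name is hypothetical, and authentication is left out because this guide does not describe it.

```python
import json
from urllib.request import Request

# Operations documented in this guide that share the PUT /golden/bucket/<name> shape.
BUCKET_OPERATIONS = {"merge", "split", "disconnect", "ignore"}


def bucket_operation_request(base_url: str, operation: str, payload: dict) -> Request:
    """Build (but do not send) a PUT request for a bucket operation."""
    if operation not in BUCKET_OPERATIONS:
        raise ValueError(f"unsupported bucket operation: {operation}")
    return Request(
        url=f"{base_url}/golden/bucket/{operation}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = bucket_operation_request(
    "https://golden.example.com",  # placeholder host
    "merge",
    {"entity": "customers", "bucketId": "example-bucket-id"},  # placeholder ID
)
```

Passing the returned object to urllib.request.urlopen would send it; in practice an authentication header would likely need to be added first.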
Split Bucket
Endpoint: PUT /golden/bucket/split
Permission: golden.splitBucket
Break up a bucket by clearing index values that caused records to group together.
{
"entity": "customers",
"bucketId": "[email protected]",
"recordIds": ["uuid-12345"]
}
Use Case: False positives where records aren't actually duplicates.
Effect: Clears indexing fields for specified records so they won't group together again.
Disconnect Bucket
Endpoint: PUT /golden/bucket/disconnect
Permission: golden.disconnectBucket
Mark records as unrelated without modifying data.
{
"entity": "customers",
"bucketId": "[email protected]",
"recordIds": ["uuid-12345", "uuid-67890"]
}
Effect: Adds record IDs to _unrelated metadata, preventing future merging.
Use Case: Keep data intact but prevent false-positive merges.
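One way to picture the disconnect effect is sketched below: each listed record gains the other records' IDs in its _unrelated metadata, and a later merge step can consult that list. Both functions are hypothetical illustrations; how Golden Core stores and checks _unrelated internally is an assumption here.

```python
from typing import List


def disconnect(records: List[dict]) -> None:
    """Add every other record's ID to each record's _unrelated list (in place)."""
    ids = [record["_id"] for record in records]
    for record in records:
        unrelated = record.setdefault("_metadata", {}).setdefault("_unrelated", [])
        for other_id in ids:
            if other_id != record["_id"] and other_id not in unrelated:
                unrelated.append(other_id)


def marked_unrelated(a: dict, b: dict) -> bool:
    """True if either record lists the other as unrelated."""
    a_list = a.get("_metadata", {}).get("_unrelated", [])
    b_list = b.get("_metadata", {}).get("_unrelated", [])
    return b["_id"] in a_list or a["_id"] in b_list


first = {"_id": "uuid-12345", "full_name": "John Doe"}
second = {"_id": "uuid-67890", "full_name": "Jon Doe"}
third = {"_id": "uuid-99999", "full_name": "John Doe Jr."}
disconnect([first, second])
```

The business fields of the records are untouched; only the metadata changes, which matches the use case of keeping data intact.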
Ignore Bucket
Endpoint: PUT /golden/bucket/ignore
Permission: golden.ignoreBucket
Mark a bucket so that processing skips it.
{
"entity": "customers",
"bucketId": "[email protected]",
"ignore": true
}
Effect: Sets the bucket classification to IGNORE, excluding it from steward processing.
Delete Bucket
Endpoint: DELETE /golden/bucket
Permission: golden.deleteBucket
Delete all records in a bucket.
{
"entity": "customers",
"bucketId": "[email protected]"
}
This permanently deletes records. Use with caution.
Working with Records
Search Records
Endpoint: POST /golden/search
Permission: golden.searchRecord
Full-text search across entity records using Typesense.
{
"entity": "customers",
"query": "john",
"pageNumber": 0,
"pageSize": 20,
"facetBy": ["status", "country"]
}
Search Features:
Full-text search across all fields
Faceted search (filtering)
Fuzzy matching
Geo-search (distance-based)
Result ranking
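Because search is paginated through pageNumber and pageSize, a client usually needs a loop to collect all hits. The sketch below shows one possible loop. The response shape is not documented in this guide, so the HTTP call is injected as post_search and is assumed to return the list of records for one page; both helper names are hypothetical.

```python
from typing import Callable, Iterator, List, Optional


def search_payload(
    entity: str,
    query: str,
    page_number: int = 0,
    page_size: int = 20,
    facet_by: Optional[List[str]] = None,
) -> dict:
    """Build the JSON body for POST /golden/search."""
    payload = {
        "entity": entity,
        "query": query,
        "pageNumber": page_number,
        "pageSize": page_size,
    }
    if facet_by:
        payload["facetBy"] = list(facet_by)
    return payload


def iter_search_results(
    post_search: Callable[[dict], List[dict]],
    entity: str,
    query: str,
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield records page by page until a page comes back shorter than page_size."""
    page_number = 0
    while True:
        records = post_search(search_payload(entity, query, page_number, page_size))
        yield from records
        if len(records) < page_size:
            break
        page_number += 1
```

Injecting post_search keeps the loop testable without a running server; a real client would wrap its HTTP call in a function with the same signature.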
Get Record
Endpoint: GET /golden/record/{recordId}
Permission: golden.viewRecord
Retrieve a single record by ID, optionally with expanded details.
GET /golden/record/uuid-12345?entity=customers&expanded=true
Expanded view includes:
Full record data
Metadata and audit trail
Merged record references
Quality information
Upsert Record
Endpoint: POST /golden/record
Permission: golden.upsertRecord
Create or update a record.
{
"entity": "customers",
"record": {
"_id": "uuid-12345",
"email": "[email protected]",
"full_name": "John Doe",
"phone": "+1234567890"
}
}
{tip}
Omit _id to create a new record. Include _id to update an existing record.
{tip}
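The create-versus-update rule can be captured in a small payload builder. The helper name upsert_payload is hypothetical; only the body shape (entity, record, optional _id) comes from this guide.

```python
from typing import Optional


def upsert_payload(entity: str, fields: dict, record_id: Optional[str] = None) -> dict:
    """Build the POST /golden/record body; _id is included only for updates."""
    record = {key: value for key, value in fields.items() if key != "_id"}
    if record_id is not None:
        record = {"_id": record_id, **record}
    return {"entity": entity, "record": record}


# Create: no _id in the record.
create_body = upsert_payload("customers", {"full_name": "John Doe"})

# Update: _id identifies the existing record.
update_body = upsert_payload("customers", {"full_name": "John Doe"}, "uuid-12345")
```

Making the ID an explicit argument avoids accidentally updating a record because a stale _id was left in the field dictionary.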
Delete Record
Endpoint: DELETE /golden/record
Permission: golden.deleteRecord
Delete a record.
{
"entity": "customers",
"recordId": "uuid-12345"
}
Best Practices
Deduplication Strategy
Start manual - Use DUPLICATES type initially
Tune thresholds - Adjust based on manual review results
Test thoroughly - Validate merger logic with sample data
Go automatic - Switch to AUTO_DUPLICATES when confident
Troubleshooting
No Duplicates Detected
Check:
Indexer has duplicates: true on mappings
Records have values in indexed fields
Classifier thresholds aren't too high
Entity synchronization completed successfully
Too Many False Positives
Solution:
Increase match threshold (e.g., 0.85 → 0.90)
Add more weighted comparison fields
Use stricter comparison algorithms
Use disconnect operation for known false positives
Too Many False Negatives
Solution:
Decrease match threshold (e.g., 0.85 → 0.75)
Add fuzzy index mappings
Reduce comparison field weights
Check data quality issues
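The trade-off between false positives and false negatives can be checked against manually reviewed pairs before a threshold is changed. The sketch below counts both error types at a given match threshold. The reviewed scores are made-up sample data and match_errors is a hypothetical helper, not a Golden Core function.

```python
from typing import List, Tuple


def match_errors(reviewed: List[Tuple[float, bool]], match_threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) for a match threshold.

    Each reviewed item is (similarity score, reviewer says it is a duplicate).
    A false negative here is a true duplicate that does not reach MATCH.
    """
    false_positives = sum(
        1 for score, is_duplicate in reviewed
        if score >= match_threshold and not is_duplicate
    )
    false_negatives = sum(
        1 for score, is_duplicate in reviewed
        if score < match_threshold and is_duplicate
    )
    return false_positives, false_negatives


# Made-up review results for illustration only.
reviewed = [
    (0.95, True), (0.88, False), (0.86, True),
    (0.80, True), (0.78, False), (0.60, False),
]
for threshold in (0.75, 0.85, 0.90):
    print(threshold, match_errors(reviewed, threshold))
```

On this sample, raising the threshold from 0.85 to 0.90 removes the one false positive but adds a false negative, while lowering it to 0.75 does the opposite.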
Poor Golden Record Quality
Check:
Merger weights configured correctly
Merge type appropriate for data
Source data quality (use _quality score)
Nested dataset merge logic