User Guide
Welcome to the Trazadera Golden User Guide. This guide helps you understand how to use Golden for your day-to-day data quality and master data management tasks.
Who This Guide Is For
This guide is designed for:
Role | Primary Tasks |
|---|---|
Data Stewards | Review duplicates, merge records, maintain data quality |
Business Analysts | Query data, generate reports, monitor metrics |
Data Managers | Configure entities, set up workflows, manage users |
Operations Teams | Monitor tasks, troubleshoot issues, track performance |
Quick Navigation
I want to... | Go to... |
|---|---|
Understand the deduplication process | |
Review and merge duplicate records | |
Search for specific records | |
Monitor data quality | |
See a complete workflow example | |
Understanding Deduplication
Golden uses a multi-step process to identify and resolve duplicate records:
The Deduplication Pipeline
Source Data → Load → Index → Classify → Review → Merge → Golden Record
Step | What Happens | Your Role |
|---|---|---|
Load | Data is imported from source systems | Configure sources |
Index | Similar records are grouped into "buckets" | Define matching rules |
Classify | System determines if records are duplicates | Set classification thresholds |
Review | Uncertain matches are flagged for review | Review and decide |
Merge | Confirmed duplicates are merged | Approve or customize |
Golden Record | Best data is selected as the master | Define selection rules |
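The Index step can be pictured as grouping records by a shared key. The sketch below is illustrative only and is not Golden's indexer: the `blocking_key` function, the choice of a normalized email as the matching rule, and the record layout are all assumptions made for the example.

```python
from collections import defaultdict


def blocking_key(record: dict) -> str:
    """Hypothetical matching rule: the email, trimmed and lower-cased."""
    return record["email"].strip().lower()


def index_records(records: list[dict]) -> dict[str, list[dict]]:
    """Group records that share a blocking key into the same bucket."""
    buckets = defaultdict(list)
    for record in records:
        buckets[blocking_key(record)].append(record)
    return dict(buckets)


# Sample data invented for the example.
records = [
    {"id": 1, "name": "John Smith", "email": "john@example.com"},
    {"id": 2, "name": "J. Smith", "email": " John@Example.com "},
    {"id": 3, "name": "Ann Lee", "email": "ann@example.com"},
]

buckets = index_records(records)
for key, members in buckets.items():
    print(key, [r["id"] for r in members])
```

Records 1 and 2 land in the same bucket because their emails normalize to the same key. Whether they are true duplicates is decided later, in the Classify and Review steps.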
Key Concepts
Concept | Description |
|---|---|
Entity | A deduplication project (e.g., "Customers", "Products") |
Bucket | A group of potentially duplicate records |
Golden Record | The authoritative master record created from merged data |
Classification | System's confidence level (DUPLICATES, REVIEW, UNIQUE) |
Working with Buckets
Buckets are the heart of the deduplication workflow. Each bucket contains records that the system believes might be duplicates.
Bucket Classifications
Classification | Meaning | Action Required |
|---|---|---|
DUPLICATES | High confidence these are duplicates | Auto-merged (or review if preferred) |
REVIEW | Medium confidence - needs human review | Manual review required |
UNIQUE | System believes this is not a duplicate | No action needed |
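As a rough illustration of how classification thresholds separate the three outcomes, the sketch below maps a match score to a classification. The 0-to-1 score scale and the threshold values `0.9` and `0.6` are assumptions chosen for the example, not Golden defaults.

```python
def classify(score: float,
             duplicate_threshold: float = 0.9,
             review_threshold: float = 0.6) -> str:
    """Map a match score in [0, 1] to a bucket classification.

    Both thresholds are hypothetical; tune them to your own data.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= duplicate_threshold:
        return "DUPLICATES"  # high confidence
    if score >= review_threshold:
        return "REVIEW"      # medium confidence, needs a human decision
    return "UNIQUE"          # low confidence of being a duplicate


for score in (0.95, 0.70, 0.20):
    print(score, classify(score))
```

Raising `review_threshold` shrinks the review queue at the cost of classifying more borderline pairs as UNIQUE. Lowering it does the opposite.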
Reviewing Buckets
When reviewing a bucket, you'll see:
All records in the bucket side-by-side
Match scores showing why records were grouped
Field comparisons highlighting differences
Golden record preview showing merged result
Making Decisions
Decision | When to Use | Result |
|---|---|---|
Merge | Records are confirmed duplicates | Creates single golden record |
Disconnect | Records are NOT duplicates | Separates into individual buckets |
Skip | Need more information | Leaves for later review |
Best Practices for Review
Start with highest-confidence REVIEW buckets
Use consistent decision criteria
Document unusual decisions with comments
Take breaks during long review sessions to maintain accuracy
Finding Records
Golden provides multiple ways to search for records.
Search Methods
Method | Best For | Example |
|---|---|---|
Quick Search | Known identifier (email, ID) | |
Advanced Search | Multiple criteria | Name + City + Date range |
Bucket Search | Finding specific bucket | Bucket ID |
Filter by Status | Workflow management | All REVIEW buckets |
Search Tips
Exact match: Use quotes for exact phrases: "John Smith"
Partial match: Use wildcards: john* or *@example.com
Multiple fields: Combine criteria: email:john* AND city:Boston
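To make the wildcard and multi-field semantics concrete, here is a small sketch that applies the same kind of criteria to in-memory records. It is not Golden's search engine: the `matches` helper, the case-insensitive comparison, and the sample records are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def matches(record: dict, criteria: dict[str, str]) -> bool:
    """Return True when every field pattern matches (AND semantics).

    Patterns use * as a wildcard; comparison is case-insensitive here.
    """
    return all(
        fnmatchcase(str(record.get(field, "")).lower(), pattern.lower())
        for field, pattern in criteria.items()
    )


# Sample data invented for the example.
records = [
    {"id": 1, "email": "john@example.com", "city": "Boston"},
    {"id": 2, "email": "johnny@example.com", "city": "Chicago"},
    {"id": 3, "email": "ann@example.com", "city": "Boston"},
]

# Equivalent in spirit to: email:john* AND city:Boston
hits = [r["id"] for r in records if matches(r, {"email": "john*", "city": "Boston"})]
print(hits)

# Equivalent in spirit to: *@example.com
domain_hits = [r["id"] for r in records if matches(r, {"email": "*@example.com"})]
print(domain_hits)
```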
Working with Golden Records
Golden Records are the authoritative, deduplicated master records.
What Makes a Golden Record?
The Golden Record is created by:
Selecting the best value for each field from source records
Applying business rules (e.g., prefer most recent, prefer specific source)
Tracking lineage back to original source records
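The three steps above can be sketched as follows, using "prefer most recent non-empty value" as the business rule. This is a minimal illustration, not Golden's merger: the `build_golden_record` function and the `source_id` and `updated` keys are hypothetical names.

```python
from datetime import date


def build_golden_record(sources: list[dict], fields: list[str]) -> dict:
    """Pick the most recent non-empty value per field and record its origin."""
    newest_first = sorted(sources, key=lambda r: r["updated"], reverse=True)
    values, lineage = {}, {}
    for field in fields:
        for record in newest_first:
            value = record.get(field)
            if value:  # skip missing or empty values
                values[field] = value
                lineage[field] = record["source_id"]
                break
    return {"values": values, "lineage": lineage}


# Sample data invented for the example.
sources = [
    {"source_id": "crm-17", "updated": date(2023, 1, 5),
     "name": "John Smith", "phone": "555-0100", "email": ""},
    {"source_id": "shop-42", "updated": date(2024, 3, 9),
     "name": "John A. Smith", "phone": "", "email": "john@example.com"},
]

golden = build_golden_record(sources, ["name", "phone", "email"])
print(golden["values"])
print(golden["lineage"])
```

The name and email come from the newer record, while the phone falls back to the older one because the newer record has none. The lineage map records which source supplied each field.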
Golden Record Lifecycle
Status | Description |
|---|---|
Active | Current master record |
Updated | Modified by new source data or manual edit |
Merged | Combined with another golden record |
Archived | Soft-deleted, can be restored |
Viewing Golden Record Details
Each Golden Record shows:
Master data: The consolidated field values
Source records: All contributing records with lineage
History: Changes over time
Quality scores: Data completeness and confidence
Monitoring and Reports
Key Metrics Dashboard
Metric | What It Shows | Target |
|---|---|---|
Total Records | Records loaded from sources | Varies |
Unique Records | Confirmed non-duplicates | Higher is better |
Duplicate Rate | Percentage of duplicates found | Depends on data quality |
Review Queue | Buckets awaiting manual review | Lower is better |
Merge Rate | Daily/weekly merges completed | Track trends |
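For readers who want to see how such figures relate to buckets, the sketch below derives a few of them from a list of bucket summaries. The exact definitions Golden uses are not stated in this guide, so the formulas are assumptions: here the duplicate rate is taken as the percentage of records that sit in DUPLICATES buckets.

```python
def dashboard_metrics(buckets: list[dict]) -> dict:
    """Compute example metrics from bucket summaries.

    Each bucket is a dict with a 'classification' and a record count 'size'.
    """
    total = sum(b["size"] for b in buckets)
    unique = sum(b["size"] for b in buckets if b["classification"] == "UNIQUE")
    in_duplicates = sum(b["size"] for b in buckets if b["classification"] == "DUPLICATES")
    review_queue = sum(1 for b in buckets if b["classification"] == "REVIEW")
    duplicate_rate = 100 * in_duplicates / total if total else 0.0
    return {
        "total_records": total,
        "unique_records": unique,
        "duplicate_rate": duplicate_rate,  # percent of records in DUPLICATES buckets
        "review_queue": review_queue,      # number of buckets awaiting review
    }


# Sample data invented for the example.
buckets = [
    {"classification": "DUPLICATES", "size": 3},
    {"classification": "REVIEW", "size": 2},
    {"classification": "UNIQUE", "size": 1},
    {"classification": "UNIQUE", "size": 1},
    {"classification": "DUPLICATES", "size": 3},
]

metrics = dashboard_metrics(buckets)
print(metrics)
```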
Common Reports
Report | Purpose | Frequency |
|---|---|---|
Duplicate Summary | Overview of deduplication results | Daily/Weekly |
Source Quality | Data quality by source system | Weekly |
Steward Activity | Review decisions by user | Weekly |
Trend Analysis | Duplicate rates over time | Monthly |
Setting Up Alerts
Configure alerts for:
Review queue exceeding threshold
Unusual spike in duplicates
Task failures
Data quality drops below threshold
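The four alert conditions above amount to simple threshold checks. The sketch below shows one possible way to express them. All key names (`review_queue`, `spike_factor`, `min_quality_score`, and so on) and the sample values are hypothetical and do not reflect Golden's alert configuration.

```python
def check_alerts(snapshot: dict, limits: dict) -> list[str]:
    """Return the names of alert conditions triggered by a metrics snapshot."""
    alerts = []
    if snapshot["review_queue"] > limits["max_review_queue"]:
        alerts.append("review_queue")
    # A "spike" is assumed to mean the rate exceeds a multiple of its baseline.
    if snapshot["duplicate_rate"] > limits["spike_factor"] * snapshot["baseline_duplicate_rate"]:
        alerts.append("duplicate_spike")
    if snapshot["failed_tasks"] > 0:
        alerts.append("task_failure")
    if snapshot["quality_score"] < limits["min_quality_score"]:
        alerts.append("quality_drop")
    return alerts


# Sample values invented for the example.
snapshot = {
    "review_queue": 120,
    "duplicate_rate": 12.0,
    "baseline_duplicate_rate": 5.0,
    "failed_tasks": 0,
    "quality_score": 0.97,
}
limits = {"max_review_queue": 100, "spike_factor": 2.0, "min_quality_score": 0.9}

triggered = check_alerts(snapshot, limits)
print(triggered)
```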
Common Use Cases
Building a Customer MDM System
Goal: Create a single view of each customer across all systems
Steps:
Create customer dataset with required fields (name, email, phone, address)
Set up sources (CRM, e-commerce, support systems)
Configure indexer for email, phone, name matching
Define classifier with weighted comparisons
Create merger strategy (prefer most recent, most complete)
Configure entity with AUTO_DUPLICATES mode for steward review
Schedule automatic synchronization
Monitor duplicate resolution progress
Relevant Guides: Entities, Resources, Golden Records
Implementing a Data Integration Pipeline
Goal: Load and clean data from external sources
Steps:
Define source dataset matching source schema
Create source resource (JDBC, file, or API)
Define target dataset for Golden Core
Create a transformation that maps source fields to target fields
Create pipeline for data cleaning
Configure table for storage
Execute load operation
Monitor task completion
Relevant Guides: Resources, Tables, Tasks
Setting Up User Access Control
Goal: Ensure appropriate access to data and functions
Steps:
Configure authentication method (SSO or internal)
Define custom roles for your organization
Assign appropriate permissions to roles
Create user accounts with assigned roles
Generate access tokens for integrations
Configure entitlement filters for data privacy
Monitor and audit access regularly
Relevant Guides: Security
Manual Duplicate Review Process
Goal: Review and resolve uncertain duplicate matches
Steps:
Synchronize entity to create/update buckets
Query REVIEW-classified buckets
Examine records in each bucket
Make merge or disconnect decisions
Track progress via statistics
Generate reports on data quality improvements
See detailed walkthrough: Example: Manual Deduplication Workflow
Examples
Detailed step-by-step examples to help you get started:
Example | Description | Skill Level |
|---|---|---|
 | Complete ETL pipeline from CRM to Golden | Intermediate |
 | Step-by-step guide for reviewing duplicates | Beginner |