# Resources

## Introduction

Resources are the building blocks of data integration in Golden Core. They define how data flows through the system: reading from external sources, transforming data structures, and writing to target destinations.

Resources are reusable, configurable components that can be mixed and matched to create complex data workflows.
## Resource Types Overview

| Resource Type | Purpose | Used For |
|---|---|---|
| Dataset | Define data structure (schema) | Tables, sources, sinks, transformations |
| Source | Read data from external systems | Data ingestion, ETL loads |
| Sink | Write data to external systems | Data export, integration |
| Transformation | Map between data structures | Schema mapping, field renaming |
| Pipeline | Multi-stage data processing | Data cleaning, validation, enrichment |
| Indexer | Define search indexes | Entity search, duplicate detection |
| Classifier | Compare records for similarity | Duplicate detection |
| Merger | Create golden records | Master data management |

This section covers Datasets, Sources, Sinks, Transformations, and Pipelines. See the Entities Guide for indexers, classifiers, and mergers.
## Resource Management

### List Resources

Endpoint: `GET /resources`

Permission: `resource.list`

### Get Resource Details

Endpoint: `GET /resources/id/{id}`

Permission: `resource.view`
### Create or Update Resource

Endpoint: `POST /resources`

Permission: `resource.save`

```json
{
  "test": false,
  "resource": {
    /* resource configuration */
  }
}
```

Set `"test": true` to validate the configuration without saving it.
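For example, a validation-only request might look like the sketch below. Everything inside `resource` is illustrative: the `id`, `type`, and `columns` fields are assumptions made for this example, not the documented resource schema.

```json
{
  "test": true,
  "resource": {
    "id": "customer_dataset",
    "type": "dataset",
    "columns": [
      { "key": "customerId", "type": "STRING", "mandatory": true },
      { "key": "email", "type": "STRING" }
    ]
  }
}
```

Because `test` is `true`, the server reports any validation errors without persisting the resource; resend the same body with `"test": false` to save it.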
### Delete Resource

Endpoint: `DELETE /resources/id/{id}`

Permission: `resource.delete`

Resources in use by entities or other resources cannot be deleted.

### Duplicate Resource

Endpoint: `PUT /resources/duplicate/{id}/{newId}`

Permission: `resource.save`

### Rename Resource

Endpoint: `PUT /resources/rename/{id}/{newId}`

Permission: `resource.save`
## Import and Export

### Export Resources

Endpoint: `POST /resources/export`

```json
{
  "ids": ["customer_dataset", "crm_source", "normalize_customer"]
}
```

Returns a JSON package for backup or migration.

### Import Resources

Endpoint: `POST /resources/import`

```json
{
  "resources": [
    { /* resource configuration */ }
  ]
}
```

Validates and imports the resources. Dependencies are resolved automatically.
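A typical migration chains the two endpoints: export from the old environment, then submit the returned resources to `POST /resources/import` on the new one. The request below is a sketch; the `id` and `type` fields inside each entry are illustrative, not the documented schema.

```json
{
  "resources": [
    { "id": "customer_dataset", "type": "dataset" },
    { "id": "crm_source", "type": "source" }
  ]
}
```

Since dependencies are resolved automatically, the entries do not have to be listed in dependency order.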
## ETL Workflows

Resources power ETL operations through three primary patterns:

### LOAD Operation

Extract from a source and load into a table:

```
Source → (Transformation) → (Pipeline) → Target Table
```

Example (`POST /tables/load`):

```json
{
  "source": "crm_source",
  "transformation": "normalize_customer",
  "pipeline": "customer_cleaning",
  "sinkTable": "customer_table",
  "operation": "UPSERT"
}
```
### TRANSFORM Operation

Apply transformations to existing data:

```
Source Table → (Transformation/Pipeline) → Same Table
```

Example (`POST /tables/transform`):

```json
{
  "source": "customer_table",
  "transformation": "enrich_customer",
  "maxRecords": 10000
}
```
### EXPORT Operation

Export table data to external systems:

```
Source Table → (Transformation) → Sink
```

Example (`POST /tables/export`):

```json
{
  "source": "customer_table",
  "transformation": "format_export",
  "maxRecords": 5000
}
```
## Best Practices

### Dataset Design

- **Start with core fields** - Add optional fields later
- **Use meaningful keys** - Clear column names
- **Enable lookups** - Index frequently queried fields
- **Nested datasets** - Use for complex hierarchical data
- **Validate early** - Use mandatory and validation rules
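To make these guidelines concrete, here is a hypothetical dataset definition; the `columns`, `mandatory`, `lookup`, and nested `dataset` fields are invented for illustration and may not match Golden Core's actual schema:

```json
{
  "id": "customer_dataset",
  "type": "dataset",
  "columns": [
    { "key": "customerId", "type": "STRING", "mandatory": true, "lookup": true },
    { "key": "email", "type": "STRING", "lookup": true },
    { "key": "addresses", "type": "DATASET", "dataset": "address_dataset" }
  ]
}
```

The core fields are mandatory, the frequently queried fields are flagged for lookup, and the hierarchical address data lives in a nested dataset.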
### Source Configuration

- **Incremental loading** - Enable timestamp filtering for large datasets
- **Custom queries** - Optimize with custom SQL for complex joins
- **Connection pooling** - Configure JDBC properties for performance
- **Error handling** - Test connections before production use
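As a sketch only, a JDBC source combining a custom query with incremental loading might look like the following; every field name here (`jdbcUrl`, `credentials`, `query`, `timestampColumn`, `properties`) is an assumption for illustration, not the documented source schema:

```json
{
  "id": "crm_source",
  "type": "source",
  "jdbcUrl": "jdbc:postgresql://crm-db:5432/crm",
  "credentials": "crm_credentials",
  "query": "SELECT id, name, email, updated_at FROM customers",
  "timestampColumn": "updated_at",
  "properties": { "maximumPoolSize": "10" }
}
```

The timestamp column lets repeated loads pick up only rows changed since the last run, and the connection-pool property caps concurrent database connections.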
### Transformation Strategy

- **Modular mappings** - Create reusable transformations
- **Carry over carefully** - Consider carryOverData implications
- **Test thoroughly** - Validate mappings with sample data
- **Script sparingly** - Use COLUMN/CONCATENATE when possible
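A hypothetical transformation that sticks to declarative COLUMN and CONCATENATE mappings rather than scripts could look like this; the surrounding structure (`sourceDataset`, `targetDataset`, `mappings` and their fields) is illustrative, not the documented schema:

```json
{
  "id": "normalize_customer",
  "type": "transformation",
  "sourceDataset": "crm_raw_dataset",
  "targetDataset": "customer_dataset",
  "mappings": [
    { "type": "COLUMN", "source": "cust_id", "target": "customerId" },
    { "type": "CONCATENATE", "sources": ["first_name", "last_name"],
      "target": "fullName", "separator": " " }
  ]
}
```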
### Pipeline Design

- **Logical ordering** - Clean before transform
- **Single responsibility** - One purpose per processor
- **Reusable components** - Reference transformations instead of duplicating
- **Monitor performance** - Complex pipelines can slow processing
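Putting these rules together, a pipeline sketch might order a cleaning step before a referenced transformation, with one job per processor; the `processors`, `name`, `script`, and `transformation` fields are assumptions for illustration:

```json
{
  "id": "customer_cleaning",
  "type": "pipeline",
  "dataset": "customer_dataset",
  "processors": [
    { "name": "trim_fields", "script": "/* cleaning runs first */" },
    { "name": "normalize", "transformation": "normalize_customer" },
    { "name": "validate_email", "script": "/* validation after transform */" }
  ]
}
```

Note that the middle processor references the reusable `normalize_customer` transformation instead of duplicating its mappings inline.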
## Troubleshooting

### Resource Validation Fails

Check:

- Referenced resources exist (datasets, credentials)
- JDBC URLs are well-formed
- File patterns are valid paths
- Transformation source/target datasets match

### Source Connection Errors

Check:

- Network connectivity to the source system
- Credentials are correct and not expired
- Database/API permissions are granted
- JDBC drivers are available for the database type

### Transformation Mapping Errors

Check:

- Source columns exist in the source dataset
- Target columns exist in the target dataset
- Data types are compatible
- Scripts have no syntax errors

### Pipeline Processing Errors

Check:

- Processors execute in logical order
- Referenced transformations exist
- Scripts handle null values
- The dataset matches the pipeline configuration