# Resources

## Introduction

Resources are the building blocks of data integration in Golden Core. They define how data flows through the system: reading from external sources, transforming data structures, and writing to target destinations.

Resources are reusable, configurable components that can be mixed and matched to create complex data workflows.
## Resource Types Overview

| Resource Type | Purpose | Used For |
|---|---|---|
| Dataset | Define data structure (schema) | Tables, sources, sinks, transformations |
| Source | Read data from external systems | Data ingestion, ETL loads |
| Sink | Write data to external systems | Data export, integration |
| Transformation | Map between data structures | Schema mapping, field renaming |
| Pipeline | Multi-stage data processing | Data cleaning, validation, enrichment |
| Indexer | Define search indexes | Entity search, duplicate detection |
| Classifier | Compare records for similarity | Duplicate detection |
| Merger | Create golden records | Master data management |

This section covers Datasets, Sources, Sinks, Transformations, and Pipelines. See the Entities Guide for indexers, classifiers, and mergers.
## Resource Management

### List Resources

Endpoint: `GET /resources`

Permission: `resource.list`

### Get Resource Details

Endpoint: `GET /resources/id/{id}`

Permission: `resource.view`
### Create or Update Resource

Endpoint: `POST /resources`

Permission: `resource.save`

```json
{
  "test": false,
  "resource": {
    /* resource configuration */
  }
}
```

Set `"test": true` to validate the configuration without saving it.
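For example, a validation-only request might look like the sketch below. Everything inside `resource` is illustrative: the `id`, `type`, and `columns` fields are assumptions made for this example, not the documented resource schema.

```json
{
  "test": true,
  "resource": {
    "id": "customer_dataset",
    "type": "dataset",
    "columns": [
      { "key": "customerId", "type": "STRING", "mandatory": true },
      { "key": "email", "type": "STRING" }
    ]
  }
}
```

Because `test` is `true`, the server reports any validation errors without persisting the resource; resend the same body with `"test": false` to save it.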
### Delete Resource

Endpoint: `DELETE /resources/id/{id}`

Permission: `resource.delete`

Resources in use by entities or other resources cannot be deleted.

### Duplicate Resource

Endpoint: `PUT /resources/duplicate/{id}/{newId}`

Permission: `resource.save`

### Rename Resource

Endpoint: `PUT /resources/rename/{id}/{newId}`

Permission: `resource.save`
## Import and Export

### Export Resources

Endpoint: `POST /resources/export`

```json
{
  "ids": ["customer_dataset", "crm_source", "normalize_customer"]
}
```

Returns a JSON package for backup or migration.

### Import Resources

Endpoint: `POST /resources/import`

```json
{
  "resources": [
    { /* resource configuration */ }
  ]
}
```

Validates and imports the resources. Dependencies are resolved automatically.
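A typical migration chains the two endpoints: export from the old environment, then submit the returned resources to `POST /resources/import` on the new one. The request below is a sketch; the `id` and `type` fields inside each entry are illustrative, not the documented schema.

```json
{
  "resources": [
    { "id": "customer_dataset", "type": "dataset" },
    { "id": "crm_source", "type": "source" }
  ]
}
```

Since dependencies are resolved automatically, the entries do not have to be listed in dependency order.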
## ETL Workflows

Resources power ETL operations through three primary patterns:

### LOAD Operation

Extract from a source and load into a table:

```
Source → (Transformation) → (Pipeline) → Target Table
```

Example (`POST /tables/load`):

```json
{
  "source": "crm_source",
  "transformation": "normalize_customer",
  "pipeline": "customer_cleaning",
  "sinkTable": "customer_table",
  "operation": "UPSERT"
}
```
### TRANSFORM Operation

Apply transformations to existing data:

```
Source Table → (Transformation/Pipeline) → Same Table
```

Example (`POST /tables/transform`):

```json
{
  "source": "customer_table",
  "transformation": "enrich_customer",
  "maxRecords": 10000
}
```
### EXPORT Operation

Export table data to external systems:

```
Source Table → (Transformation) → Sink
```

Example (`POST /tables/export`):

```json
{
  "source": "customer_table",
  "transformation": "format_export",
  "maxRecords": 5000
}
```
## Best Practices

### Dataset Design

- **Start with core fields** - Add optional fields later
- **Use meaningful keys** - Clear column names
- **Enable lookups** - Index frequently queried fields
- **Nested datasets** - Use for complex hierarchical data
- **Validate early** - Use mandatory and validation rules
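To make these guidelines concrete, here is a hypothetical dataset definition; the `columns`, `mandatory`, `lookup`, and nested `dataset` fields are invented for illustration and may not match Golden Core's actual schema:

```json
{
  "id": "customer_dataset",
  "type": "dataset",
  "columns": [
    { "key": "customerId", "type": "STRING", "mandatory": true, "lookup": true },
    { "key": "email", "type": "STRING", "lookup": true },
    { "key": "addresses", "type": "DATASET", "dataset": "address_dataset" }
  ]
}
```

The core fields are mandatory, the frequently queried fields are flagged for lookup, and the hierarchical address data lives in a nested dataset.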
### Source Configuration

- **Incremental loading** - Enable timestamp filtering for large datasets
- **Custom queries** - Optimize with custom SQL for complex joins
- **Connection pooling** - Configure JDBC properties for performance
- **Error handling** - Test connections before production use
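As a sketch only, a JDBC source combining a custom query with incremental loading might look like the following; every field name here (`jdbcUrl`, `credentials`, `query`, `timestampColumn`, `properties`) is an assumption for illustration, not the documented source schema:

```json
{
  "id": "crm_source",
  "type": "source",
  "jdbcUrl": "jdbc:postgresql://crm-db:5432/crm",
  "credentials": "crm_credentials",
  "query": "SELECT id, name, email, updated_at FROM customers",
  "timestampColumn": "updated_at",
  "properties": { "maximumPoolSize": "10" }
}
```

The timestamp column lets repeated loads pick up only rows changed since the last run, and the connection-pool property caps concurrent database connections.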
### Transformation Strategy

- **Modular mappings** - Create reusable transformations
- **Carry over carefully** - Consider carryOverData implications
- **Test thoroughly** - Validate mappings with sample data
- **Script sparingly** - Use COLUMN/CONCATENATE when possible
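A hypothetical transformation that sticks to declarative COLUMN and CONCATENATE mappings rather than scripts could look like this; the surrounding structure (`sourceDataset`, `targetDataset`, `mappings` and their fields) is illustrative, not the documented schema:

```json
{
  "id": "normalize_customer",
  "type": "transformation",
  "sourceDataset": "crm_raw_dataset",
  "targetDataset": "customer_dataset",
  "mappings": [
    { "type": "COLUMN", "source": "cust_id", "target": "customerId" },
    { "type": "CONCATENATE", "sources": ["first_name", "last_name"],
      "target": "fullName", "separator": " " }
  ]
}
```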
### Pipeline Design

- **Logical ordering** - Clean before transform
- **Single responsibility** - One purpose per processor
- **Reusable components** - Reference transformations instead of duplicating
- **Monitor performance** - Complex pipelines can slow processing
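Putting these rules together, a pipeline sketch might order a cleaning step before a referenced transformation, with one job per processor; the `processors`, `name`, `script`, and `transformation` fields are assumptions for illustration:

```json
{
  "id": "customer_cleaning",
  "type": "pipeline",
  "dataset": "customer_dataset",
  "processors": [
    { "name": "trim_fields", "script": "/* cleaning runs first */" },
    { "name": "normalize", "transformation": "normalize_customer" },
    { "name": "validate_email", "script": "/* validation after transform */" }
  ]
}
```

Note that the middle processor references the reusable `normalize_customer` transformation instead of duplicating its mappings inline.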
## Troubleshooting

### Resource Validation Fails

Check:

- Referenced resources exist (datasets, credentials)
- JDBC URLs are well-formed
- File patterns are valid paths
- Transformation source/target datasets match

### Source Connection Errors

Check:

- Network connectivity to the source system
- Credentials are correct and not expired
- Database/API permissions are granted
- JDBC drivers are available for the database type

### Transformation Mapping Errors

Check:

- Source columns exist in the source dataset
- Target columns exist in the target dataset
- Data types are compatible
- Scripts have no syntax errors

### Pipeline Processing Errors

Check:

- Processors execute in logical order
- Referenced transformations exist
- Scripts handle null values
- The dataset matches the pipeline configuration