Oracle GoldenGate LOB Replication to Snowflake: Getting It Right
Recently I’ve been helping clients navigate one of the more challenging aspects of Oracle GoldenGate implementations – replicating Large Objects (LOBs) from Oracle to Snowflake. This week alone, I’ve worked through several LOB replication scenarios that highlight just how critical it is to understand the underlying storage mechanisms and configuration requirements.
Over my decades working with Oracle GoldenGate, LOB replication has consistently been one of those topics that separates the experienced DBAs from those still learning the ropes. There’s a lot more happening under the hood than most people realize, and getting it wrong can lead to data truncation, performance problems, or outright replication failures.
Let me walk you through what I’ve learned and how to configure Oracle GoldenGate for successful LOB replication to Snowflake.
Understanding Oracle LOB Storage
Before diving into GoldenGate configuration, you need to understand how Oracle actually stores LOBs. This knowledge directly impacts your replication strategy and performance outcomes.
Oracle uses two distinct approaches for LOB storage, and this distinction is crucial for replication:
Inline LOBs
LOB data that’s relatively small (typically under 4KB to 8KB, depending on your Oracle version) gets stored directly within the table row alongside other column data. When LOBs are stored inline, Oracle GoldenGate can capture the changes directly from the redo logs since the entire LOB value is present in the log record. This method is generally more efficient for replication.
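Whether a LOB even qualifies for inline storage is something you control at the column level. As a quick illustration (a hypothetical table, not one of the examples later in this post), the STORE AS clause lets you state the preference explicitly:
-- Hypothetical table: ENABLE STORAGE IN ROW (the default) keeps values under the
-- inline threshold in the table row; DISABLE STORAGE IN ROW pushes every value
-- out to the LOB segment regardless of size.
CREATE TABLE demo_docs (
id NUMBER PRIMARY KEY,
content CLOB
)
LOB (content) STORE AS SECUREFILE (ENABLE STORAGE IN ROW);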
Out-of-Line LOBs
LOB data exceeding the inline threshold gets stored in separate LOB segments or tablespaces, with only a LOB locator (pointer) stored in the table row. For out-of-line LOBs, Oracle GoldenGate often needs to fetch the LOB data from the source database directly rather than relying solely on redo logs. This fetching operation is less performant, especially for large LOBs.
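If you’re not sure which bucket your LOB columns fall into, the data dictionary will tell you. A quick check against DBA_LOBS (USER_LOBS works the same way for your own schema; replace the placeholder owner with your schema name) shows whether inline storage is enabled per column:
-- IN_ROW = 'YES' means values under the inline threshold are stored in the row;
-- IN_ROW = 'NO' means every value is stored out-of-line and may require fetching.
SELECT owner, table_name, column_name, securefile, in_row
FROM dba_lobs
WHERE owner = 'YOUR_SCHEMA';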
The Oracle-to-Snowflake Challenge
Here’s where things get interesting. In Oracle, we’re dealing with CLOB data types. In Snowflake, these become VARCHAR columns. This isn’t just a simple rename – it’s a fundamental data type conversion.
Oracle LOB types:
- CLOB (Character Large Object)
- BLOB (Binary Large Object)
- NCLOB (National Character Large Object)
Snowflake equivalent:
- VARCHAR (with appropriate sizing) for CLOB and NCLOB data; binary BLOB data maps to Snowflake’s BINARY type instead
A VARCHAR column in Snowflake can hold up to 16MB per value (and newer Snowflake releases raise that ceiling further), which gives us plenty of headroom for most Oracle LOB scenarios.
Standard Table Structure
Based on my experience, here’s how I typically structure the tables for Oracle-to-Snowflake LOB replication.
Oracle Source Table:
CREATE TABLE CTMS_PSO.document_store (
id NUMBER GENERATED BY DEFAULT AS IDENTITY,
content CLOB NOT NULL CHECK (LENGTH(content) <= 15728640),
created_date DATE DEFAULT SYSDATE,
CONSTRAINT pk_document_store PRIMARY KEY (id)
);
-- Performance optimization
CREATE INDEX CTMS_PSO.idx_document_store_created ON CTMS_PSO.document_store(created_date);
Snowflake Target Table:
CREATE TABLE ctms_pso.document_store (
id NUMBER AUTOINCREMENT,
content VARCHAR(16777216) NOT NULL,
created_date TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
CONSTRAINT pk_document_store PRIMARY KEY (id)
);
Notice I’m using a VARCHAR(16777216) in Snowflake – that’s 16MB capacity, slightly larger than the Oracle constraint to provide a safety buffer.
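Before settling on a size, it’s worth confirming what the source actually holds. A quick check like this, run against the example table above, shows the largest value the target column needs to accommodate:
-- Largest CLOB currently in the source table, in characters
SELECT MAX(DBMS_LOB.GETLENGTH(content)) AS max_content_length
FROM CTMS_PSO.document_store;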
Oracle GoldenGate Configuration Requirements
Here’s the critical part that many DBAs miss: Oracle GoldenGate handles LOB replication differently depending on your target platform:
- Oracle-to-Oracle environments: LOBs replicate in pieces
- Oracle-to-Non-Oracle environments: You need the complete LOB for each transaction
For Snowflake targets, you absolutely must configure GoldenGate to capture complete LOBs.
Extract Configuration
The key parameter in your Extract configuration is `TRANLOGOPTIONS FETCHPARTIALLOB`. When Extract receives partial LOB content from the logmining server, this forces it to fetch the complete LOB image instead of processing just the partial content.
Basic Extract Parameters:
EXTRACT ETSAMPLE
USERIDALIAS SOURCE DOMAIN OracleGoldenGate
EXTTRAIL WW
REPORTCOUNT EVERY 2 MINUTES, RATE
WARNLONGTRANS 30MIN CHECKINTERVAL 10MIN
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512, PARALLELISM 2)
TRANLOGOPTIONS FETCHPARTIALLOB
NOCOMPRESSUPDATES
TABLE SCHEMA.table_name;
The FETCHPARTIALLOB parameter is your best friend for non-Oracle targets. Without it, you’ll likely encounter LOB truncation issues.
Replicat Configuration
The Replicat configuration for LOB replication is typically straightforward:
REPLICAT REPSAMPLE
REPERROR(DEFAULT, ABEND)
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAXTRANSOPS 20000
MAP SOURCE.SCHEMA.table_name, TARGET TARGET.SCHEMA.table_name;
Advanced Scenario: LOB Size Transformation
Recently, a client needed to limit all replicated LOBs to exactly 8MB due to their existing Snowflake architecture and constraints from a previous replication tool. This required leveraging Oracle GoldenGate’s SQLEXEC functionality.
Understanding SQLEXEC
SQLEXEC allows Oracle GoldenGate to execute stored procedures, queries, or database commands within the replication process. Think of it as real-time data transformation during extraction.
For LOB truncation, you can use Oracle’s `DBMS_LOB.SUBSTR` function to capture only a specific portion of the LOB.
Advanced Extract with SQLEXEC:
EXTRACT ETSAMPLE
USERIDALIAS SOURCE DOMAIN OracleGoldenGate
EXTTRAIL WW
REPORTCOUNT EVERY 2 MINUTES, RATE
WARNLONGTRANS 30MIN CHECKINTERVAL 10MIN
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512, PARALLELISM 2)
TRANLOGOPTIONS FETCHPARTIALLOB
NOCOMPRESSUPDATES
TABLE SCHEMA.table_name, SQLEXEC(ID lob_id, QUERY "select dbms_lob.SUBSTR(content, 8388608, 1) from SCHEMA.table_name where ID = :LOB_ID", PARAMS(LOB_ID = ID), EXEC SOURCEROW);
Breaking down the SQLEXEC command:
- ID lob_id: Assigns a logical name to the query so its results can be referenced within GoldenGate
- QUERY “select dbms_lob.SUBSTR(content, 8388608, 1)…”: Returns the first 8MB (8,388,608 bytes) of the LOB; note that DBMS_LOB.SUBSTR takes the amount first, then the starting offset
- PARAMS(LOB_ID = ID): Maps the table’s primary key to the query parameter
- EXEC SOURCEROW: Runs this SQL for every captured row
This approach handles all DML operations (INSERT, UPDATE, DELETE) automatically, applying the 8MB limit during extraction.
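Before turning on a transformation like this, I’d also suggest checking how many rows the limit will actually touch. A simple count against the source table (using the column and table from the earlier example) tells you what to expect:
-- Rows whose CLOB content would be trimmed by the 8MB limit
SELECT COUNT(*) AS rows_over_8mb
FROM CTMS_PSO.document_store
WHERE DBMS_LOB.GETLENGTH(content) > 8388608;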
Performance Considerations
LOB replication can impact performance significantly. Here are the key areas to monitor:
Memory Requirements
- Each Integrated Extract requires a minimum of 1.5GB in the Streams Pool
- Monitor Streams Pool and SGA usage during LOB-intensive periods (a sample check follows this list)
- Consider multiple extracts only for business reasons, not performance
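A quick way to see what the Streams Pool is currently sized at, using the standard dynamic performance view:
-- Current Streams Pool allocation; compare against the per-Extract guidance above
SELECT component, current_size/1024/1024 AS current_size_mb
FROM v$sga_dynamic_components
WHERE component = 'streams pool';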
Network and Storage
- LOB replication requires substantial network bandwidth
- Trail files will be significantly larger with LOB data
- Implement appropriate trail file purging strategies (see the example after this list)
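For the purging piece, if you’re running the classic architecture, a Manager-level rule is usually enough. This is a sketch only; the ./dirdat/ww trail path and the three-day retention are assumptions, not values taken from the configuration above:
-- In the Manager parameter file (mgr.prm)
PURGEOLDEXTRACTS ./dirdat/ww*, USECHECKPOINTS, MINKEEPDAYS 3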
Database Waits
- Long-running transactions can block LOB replication (a query for spotting them follows this list)
- Ensure applications commit frequently
- Monitor for concurrency waits during commit processing
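To see which sessions currently have transactions open (the same condition WARNLONGTRANS flags in the Extract report file), a query against V$TRANSACTION works well:
-- Open transactions, oldest first; long-open ones delay LOB capture
SELECT s.sid, s.username, t.start_time
FROM v$transaction t
JOIN v$session s ON s.taddr = t.addr
ORDER BY t.start_scn;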
Common Issues and Solutions
Issue: LOB Truncation
Symptom: Incomplete LOB data in Snowflake
Solution: Ensure `TRANLOGOPTIONS FETCHPARTIALLOB` is configured
Issue: Performance Degradation
Symptom: High Extract lag with LOB tables
Solution: Tune `TRANLOGOPTIONS INTEGRATEDPARAMS` and monitor memory allocation
Issue: Size Violations
Symptom: Replicat errors due to oversized LOBs
Solution: Implement SQLEXEC transformation or increase target VARCHAR size
Recommendations
Based on my experience implementing LOB replication across numerous client environments:
- Always use `FETCHPARTIALLOB` for non-Oracle targets – this cannot be overstated
- Size your Snowflake VARCHAR columns appropriately – include buffer space beyond your Oracle constraints
- Test with realistic LOB volumes – small test data won’t reveal performance issues
- Monitor Extract lag closely during initial deployment
- Document your SQLEXEC transformations – they’re not obvious to other DBAs
Conclusion
Oracle LOB replication to Snowflake requires understanding both Oracle’s storage architecture and GoldenGate’s configuration nuances. The key is recognizing that non-Oracle targets need complete LOB capture, not the partial replication used in Oracle-to-Oracle environments. With proper configuration using `FETCHPARTIALLOB` and appropriate table sizing, LOB replication to Snowflake can be reliable and performant. For more complex scenarios requiring data transformation, SQLEXEC provides powerful capabilities for real-time LOB modification during extraction.