Data Replication & Synchronization

In a previous blog I highlight the use of replication and synchronization. Replication & Synchronization enables the creation and maintenance of one or more synchronized standby data stores that protect data from failures, disasters, errors, and corruptions. It can address both high availability (HA) and disaster recovery (DR) requirements. In addition, it can address off-loading reporting requirements from the production data stores.

Here is some high-level interactions between key components to support replication and synchronization.

Change Data Capture (CDC) Engine

It is possible to address replication and synchronization by creating a standby data store from a backup of the primary data store and then utilizing an incremental data ingestion engine such as a change data capture engine. This is best suited when the source and target sources are heterogeneous and information distribution is the primary requirement.

Replication Engine

When the database being sourced and targeted are homogeneous and requirements are focused on high availability and disaster recovery, then the use of a replication engine is more suited. A standby database is created from a backup of the primary database, and then the replication engine keeps the data synchronized.

Interaction 1 – As transactions are committed to the primary data store the transaction log records are transported (synchronously or asynchronously) to a remote replication/synchronization engine. The received transaction log is then written to a standby transaction log file. An acknowledgement is then sent to confirm that the transaction log was received and captured to disk.

There are two types of transport: asynchronous and synchronous.

Asynchronous – avoids impact to primary data store performance as the primary data store does not wait for acknowledgement from the standby data store that it has successful received the transaction log records. The performance benefit of an asynchronous approach, however, is accompanied by the potential for a small amount of data loss since there is no guarantee that all transaction log records have been received by the standby data store. For a zero data loss solution, the transaction logs should be transported synchronously.
Synchronous – Synchronous transport causes the primary data store to wait for confirmation from the standby data store that the transaction log file has been written to disk before it will acknowledge transmission success. Without the proper architecture the synchronous transport approach can have a noticeable impact on performance due to the time it takes for the network round trip and writing of the transaction log file. To reduce this impact the architecture must support the transportation of the transaction log file, and the primary data store writing to a local file, being performed in parallel.

As an architect you need to determine which is the best transport approach.

Requirements	Transport	Acknowledgement	Risk
Maximum Protection	Synchronous	Stall Primary Data Store Until Acknowledgement from the Standby Data Store is Received	Zero Data Loss
Maximum Availability	Synchronous	Stall Primary Data Store Until Acknowledgement from the Standby Data Store is Received or a Timeout Threshold Period Expires – then Processing Resumes	Zero Data Loss
Maximum Performance	Asynchronous	Primary Data Store Never Waits for Standby Acknowledgment.	Potential for Minimal Data Loss

Interaction 2 – The data are committed to the primary data store.

Interaction 3 – The “Apply” capability reads the standby transaction log file, validates it, and applies it to the standby data store using either a physical or logical apply technique.

Physical – A physical standby is identical to the primary data store on a block-for-block basis, and thus, the schemas, including indexes, are the same. The Physical Apply is the simplest, fastest, most reliable method of maintaining a synchronized replica(s) of a primary data store.
Logical – A logical standby data store contains the same logical information as the primary data store, although the physical organization and structure of the data can be different. Logical Apply keeps a logical standby synchronized by transforming the transaction log records received from the primary data store into SQL statements and then executing the SQL statements on the standby data store.

Interaction 4 – In cases where the primary and standby data stores become disconnected, (via network failures or standby server failures), the primary data store will continue to process transactions and accumulate a backlog of transaction log files that cannot be shipped to the standby until a new connection can be established. When a connection is re-established, the standby data store will then automatically re-synchronize with the primary

The transport and apply services are completely independent. The status or performance of the standby apply has no impact on the transaction log transport or primary data store performance. This isolation is very important. Transport in synchronous configurations is the chief determinate of impact to primary data store response time and throughput. Anything that negatively impacts transport in a synchronous configuration can reduce the primary data store throughput and increase response time.

Traditionally organizations have resorted to mirroring techniques to provide the data replication and synchronization capabilities. The issue with such an approach is that a mirroring approach continually replicates every write made to every file, and does so in write-order, in order to maintain real-time synchronization of a remote replica. But a more efficient approach is to only replicate the writes on the transaction log file as tests have shown that mirroring transmits up to 7 times the volume.

Replicate Engine or Change Data Capture Engine?

It is up to the architect to decide which replication technique meets their requirements. Each technique has its own appropriate usage and characteristics.

Characteristics	Change Data Capture Engine	Replication Engine
Primary Usage	Information Distribution	Data Protection, HA/DR
Replica	Logical	Physical / Logical
Communication	Asynchronous	Asynchronous/Synchronous
Direction	Bi-Directional	Uni-Directional
Performance	High	Extreme
Target Data Store	Read / Write	Read Only
Target Support	Heterogeneous	Homogeneous
Data Type Support	Restricted by Target Support	Full
Transformation	Yes	No
Corruption Support	Data Consistency Validation	Automatic Block Repair

An architect does not have to choose one of these mechanisms over the other. They are in effect complementary and not mutually exclusive approaches. For example you can see here a graphic that depicts the complementary use of a replication engine and a Change Data Capture engine.

A replication engine makes available a standby database which can be utilized for data protection and to offload read-only workload from the primary database. The CDC engine is used for heterogeneous replication of different subsets of the standby database to multiple target heterogeneous databases. The CDC engine could have been applied to the production database rather than the standby database but would have added additional overhead to process incremental data movement.

Good Luck, Now Go Architect…