Data Acquisition

This weekend I was hiking up Mount Diablo in California. I like hiking because it allows me to clear my mind, and I love the feeling of isolation. Unfortunately, I am not always successful in switching off. I was fixated on why enterprises keep repeating the same data acquisition pattern over and over again rather than selecting the pattern appropriate to the situation.

There is no one way to acquire data, and it is up to the architect to decide which data acquisition technique(s) meet their requirements.

Each technique has its own appropriate usage in an information management strategy. A comprehensive information management strategy will utilize all of the following acquisition techniques rather than rely on a single approach for all circumstances.

Ingestion

Ingestion is an approach whereby data is captured, copied, or moved to different target data stores/environments with minimal impact on source systems and query workloads. This requires handling a variety of data (e.g. structured, semi-structured, and unstructured) with a range of latencies, utilizing differing persistence models, whilst combining architecture capabilities from various technology strategies (Information Management, EDA, SOA).

There are two primary uses of ingestion: analytical ingestion and operational ingestion.

Analytical Ingestion – Focuses on ingesting data from various sources for the purposes of performing analytics. This requires bringing the data into a consolidated hub such as a Data Warehouse or Master Data Management system, or passing it on for downstream processing such as Big Data Processing.
Operational Ingestion – Focuses on the needs of operational applications and databases. This enables operational data to be captured, filtered, transformed, and potentially consolidated from various operational systems into a single data model.
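
To make the capture, filter, transform, and consolidate steps of operational ingestion a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the two source systems (a CRM and a billing system), their field names, the Customer model, and the rule that inactive CRM accounts are filtered out. A real implementation would read from live systems rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical single data model that both sources are consolidated into."""
    source: str
    source_id: str
    name: str
    email: str


# Hypothetical captured records; each operational system has its own shape.
CRM_ROWS = [
    {"id": 1, "full_name": "Ada Lovelace", "email": "ADA@example.com", "active": True},
    {"id": 2, "full_name": "Old Account", "email": "old@example.com", "active": False},
]
BILLING_ROWS = [
    {"cust_no": "B-7", "first": "Alan", "last": "Turing", "mail": "alan@example.com"},
]


def from_crm(row):
    # Transform: map CRM fields onto the consolidated model, normalizing the email.
    return Customer("crm", str(row["id"]), row["full_name"], row["email"].lower())


def from_billing(row):
    # Transform: billing stores first and last name separately, so join them.
    name = f'{row["first"]} {row["last"]}'
    return Customer("billing", row["cust_no"], name, row["mail"].lower())


def ingest(crm_rows, billing_rows):
    """Filter, transform, and consolidate records from both sources."""
    consolidated = []
    # Filter: only active CRM accounts are carried forward (an assumed rule).
    consolidated.extend(from_crm(r) for r in crm_rows if r["active"])
    consolidated.extend(from_billing(r) for r in billing_rows)
    return consolidated


customers = ingest(CRM_ROWS, BILLING_ROWS)
for customer in customers:
    print(customer)
```

Keeping the source name and source identifier on each consolidated record is a deliberate choice in this sketch: it leaves a trail back to the originating system, which matters once identifiers have to be mapped or data has to be remediated.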

Characteristics

Supports Both Operational & Analytical Ingestion (e.g. Migration, Consolidation)
Supports Various Topologies, Latencies, Communication Styles, and Data Formats
Supports Complex Transformation and Joins
Supports Large Result Sets
Supports the Provisioning of Quality Data
Supports High Query Performance Needs
Supports Central Authoring of Authoritative Data
Supports Analytics

Challenges

Timeliness of Data (Dependent on Time Window)
Business Case Restrictions to Copying Data
Data Synchronization via Incremental Updates
Data Remediation (Two-Way Synchronization)
Implementation Complexity
Generation/Mapping of Unique Identifiers
Definition of a Canonical Model
Propagation of Source System Security Requirements into Target Data Store
Source and Target Data Model Updates
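
The incremental update challenge in the list above is easier to reason about with a small example. One common way to approach it is a high-water mark: remember the latest modification time already copied, and on the next run copy only rows changed since then. The sketch below assumes each source row carries an `id` and an integer `updated_at` value; those field names, and the `sync_incremental` function itself, are hypothetical.

```python
def sync_incremental(source_rows, target, watermark):
    """Upsert rows modified after `watermark` into `target`.

    Returns the new watermark (the largest `updated_at` seen so far).
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            # Upsert by key; copy the row so the target does not alias the source.
            target[row["id"]] = dict(row)
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Hypothetical source table and an initially empty target store.
source = [
    {"id": "a", "value": 1, "updated_at": 10},
    {"id": "b", "value": 2, "updated_at": 20},
]
target = {}

# First run copies everything newer than the initial watermark of 0.
watermark = sync_incremental(source, target, 0)

# Only row "a" changes at the source; the next run picks up just that row.
source[0] = {"id": "a", "value": 5, "updated_at": 30}
watermark = sync_incremental(source, target, watermark)

print(watermark, target["a"]["value"], target["b"]["value"])
```

Note what this sketch does not handle: rows deleted at the source never reach the target, and it only pushes changes one way. Those gaps are part of why synchronization and two-way remediation appear in the list of challenges.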

Replication & Synchronization

Replication & Synchronization enables the creation and maintenance of one or more synchronized standby data stores that protect data from failures, disasters, errors, and corruption. It can address both high availability (HA) and disaster recovery (DR) requirements. In addition, it can be used to offload reporting workloads from the production data stores.
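
As a rough illustration of the idea, the following Python sketch models a primary store that records every write in a change log and a standby that catches up by replaying the entries it has not yet applied. The `Store` and `Primary` classes and the `catch_up` function are hypothetical names invented for this example; real replication products also deal with networking, ordering guarantees, and failover, none of which is shown here.

```python
class Store:
    """A data store that tracks how many change-log entries it has applied."""

    def __init__(self):
        self.data = {}
        self.applied = 0


class Primary(Store):
    """The production store; every write is also appended to a change log."""

    def __init__(self):
        super().__init__()
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value
        self.applied = len(self.log)


def catch_up(standby, primary):
    """Replay log entries the standby has not seen; return how many were applied."""
    pending = primary.log[standby.applied:]
    for key, value in pending:
        standby.data[key] = value
    standby.applied += len(pending)
    return len(pending)


primary = Primary()
standby = Store()

primary.write("order:1", "new")
primary.write("order:1", "shipped")

# Replication lag, measured here as the number of unapplied log entries.
lag_before = len(primary.log) - standby.applied
applied_now = catch_up(standby, primary)

# Reporting queries can now read from the standby instead of the primary.
print(lag_before, applied_now, standby.data["order:1"])
```

Once the standby has caught up, it holds the same data as the primary, so it can serve reporting reads or be promoted if the primary fails. How much lag is tolerable depends on the HA and DR requirements.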