This weekend I was hiking up Mount Diablo in California. I like hiking because it allows me to clear my mind, and I love the feeling of isolation. Unfortunately, I am not always successful in switching off. I was fixated on why enterprises keep repeating the same pattern over and over again rather than selecting the appropriate pattern for acquiring data.
There is no one way to acquire data, and it is up to the architect to decide which data acquisition technique(s) meet their requirements.
Each technique has its own appropriate usage in an information management strategy. A comprehensive information management strategy will utilize all of the following acquisition techniques and not solely rely on one approach for all circumstances.
Ingestion
Ingestion is an approach whereby data is captured, copied, or moved to different target sources/environments with minimal impact on source systems and query workloads. This requires handling a variety of data (e.g. structured, semi-structured, unstructured) with a range of latencies, utilizing differing persistence models whilst combining architecture capabilities from various technology strategies (Information Management, EDA, SOA).
There are two primary uses of ingestion: analytical ingestion and operational ingestion.
- Analytical Ingestion – Focuses on ingesting data from various sources for the purposes of performing analytics. This requires ingestion of data from various sources into a consolidated hub such as a Data Warehouse or Master Data Management, or for downstream processing such as Big Data Processing.
- Operational Ingestion – Focuses on the needs of operational applications and databases. This enables operational data to be captured, filtered, transformed, and potentially consolidated from various operational systems into a single data model.
- Supports Access to Both Operational & Analytical Ingestion (e.g. Migration, Consolidation)
- Supports Various Topologies, Latencies, Communication Styles, and Data Formats
- Supports Complex Transformation and Joins
- Supports Large Results Sets
- Supports the Provisioning of Quality Data
- Supports High Query Performance Needs
- Supports Central Authoring of Authoritative Data
- Supports Analytics
- Timeliness of Data (Dependent on Time Window)
- Business Case Restrictions to Copying Data
- Data Synchronization via Incremental Updates
- Data Remediation (Two-Way Synchronization)
- Implementation Complexity
- Generation/Mapping of Unique Identifiers
- Definition of a Canonical Model
- Propagation of Source System Security Requirements into Target Data Store
- Source and Target Data Model Updates
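To make a few of the ingestion points above concrete (incremental updates, mapping of unique identifiers, and a canonical model), here is a minimal Python sketch. It is an illustration under assumptions, not a reference implementation: `SourceRecord`, `IdRegistry`, `to_canonical`, `ingest_increment`, the field names, and the "crm" and "billing" systems are all hypothetical, and the target store is just an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SourceRecord:
    system: str       # hypothetical source system name, e.g. "crm"
    key: str          # primary key inside that source system
    payload: dict     # raw fields as the source system stores them
    updated_at: int   # change timestamp that only ever increases


class IdRegistry:
    """Maps each (system, source key) pair to one stable canonical identifier."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], int] = {}

    def canonical_id(self, system: str, key: str) -> int:
        # Reuse the existing identifier, or hand out the next one.
        return self._ids.setdefault((system, key), len(self._ids) + 1)


def to_canonical(record: SourceRecord) -> dict:
    """Transform source-specific field names into the shared canonical model."""
    raw = record.payload
    name = raw.get("name") or raw.get("full_name") or ""
    return {"name": name.strip().title(), "source": record.system}


def ingest_increment(
    records: Iterable[SourceRecord],
    watermark: int,
    registry: IdRegistry,
    target: Dict[int, dict],
) -> int:
    """Upsert only records changed after `watermark`; return the new watermark."""
    changed = [r for r in records if r.updated_at > watermark]
    # Apply in change order so the latest update to a record wins.
    for record in sorted(changed, key=lambda r: r.updated_at):
        cid = registry.canonical_id(record.system, record.key)
        target[cid] = to_canonical(record)
    return max((r.updated_at for r in changed), default=watermark)


registry = IdRegistry()
target: Dict[int, dict] = {}
batch = [
    SourceRecord("crm", "42", {"full_name": " ada lovelace "}, updated_at=10),
    SourceRecord("billing", "A7", {"name": "grace hopper"}, updated_at=12),
]
watermark = ingest_increment(batch, 0, registry, target)

# A later run sees the old rows again plus one change; only the change is applied.
batch.append(SourceRecord("crm", "42", {"full_name": "ada king"}, updated_at=15))
watermark = ingest_increment(batch, watermark, registry, target)
```

The watermark is what keeps each run incremental: rows at or before it are skipped, so the time window between runs determines how timely the target data is. The registry keeps a source record on the same canonical identifier across runs, which is what lets an update overwrite the earlier copy rather than create a duplicate.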
Replication & Synchronization
Replication & Synchronization enables the creation and maintenance of one or more synchronized standby data stores that protect data from failures, disasters, errors, and corruptions. It can address both high availability (HA) and disaster recovery (DR) requirements. In addition, it can address off-loading reporting requirements from the production data stores.
- Primarily for Read-Only Offload Reporting from Production Source System
- Supports Both Synchronous and Asynchronous Replication
- Supports High Performance Replication
- Supports Real-Time and Near-Real Time Requirements
- Supports HA/DR of Source Systems
- Data Sprawl and Associated Management and Control Issues
- Data Synchronization
- Incremental Updates
- Reliant on Data Quality of Source Systems
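The difference between synchronous and asynchronous replication to a read-only standby can be sketched in a few lines of Python. This is a toy model under assumptions: the `Primary` and `Replica` classes, the in-memory change log, and the `ship` method are hypothetical names for illustration, and real replication products handle networking, failure, and ordering in far more depth.

```python
from typing import Dict, List, Optional, Tuple

# (sequence number, key, value); a value of None means "delete the key".
Change = Tuple[int, str, Optional[str]]


class Replica:
    """Read-only standby that replays the primary's change log in order."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.applied_seq = 0  # sequence number of the last change applied

    def apply(self, changes: List[Change]) -> None:
        for seq, key, value in changes:
            if seq <= self.applied_seq:
                continue  # already applied, so replaying a change is harmless
            if value is None:
                self._data.pop(key, None)
            else:
                self._data[key] = value
            self.applied_seq = seq

    def read(self, key: str) -> Optional[str]:
        return self._data.get(key)


class Primary:
    """Production store that records every write in an append-only log."""

    def __init__(self, replica: Replica, synchronous: bool) -> None:
        self._data: Dict[str, str] = {}
        self._log: List[Change] = []
        self._replica = replica
        self._synchronous = synchronous

    def write(self, key: str, value: Optional[str]) -> None:
        seq = len(self._log) + 1
        self._log.append((seq, key, value))
        if value is None:
            self._data.pop(key, None)
        else:
            self._data[key] = value
        if self._synchronous:
            self.ship()  # the write does not return until the replica has it

    def ship(self) -> int:
        """Send every change the replica has not applied yet; return how many."""
        pending = self._log[self._replica.applied_seq:]
        self._replica.apply(pending)
        return len(pending)


standby = Replica()
primary = Primary(standby, synchronous=False)
primary.write("order:1", "shipped")
stale_read = standby.read("order:1")   # replica lags until changes are shipped
shipped = primary.ship()
fresh_read = standby.read("order:1")
```

With `synchronous=False` the standby lags until `ship` runs, which is the trade-off behind near-real-time reporting offload. With `synchronous=True` every write waits for the standby, which suits HA/DR at the cost of write latency.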
That is all I have time for. In my next blog post I will go into more detail on acquiring data.
Good luck. Now Go Architect…