Data Stream Engines

When used as an ingestion engine, a Data Stream Capture Engine and an Event Processing Engine offer similar capabilities. Both ingest high-volume, low-latency streams. The Data Stream Capture Engine tends to be a distributed engine that efficiently captures, aggregates, and ingests large amounts of streamed data, such as log data, while the Event Processing Engine captures, aggregates, and ingests large amounts of streamed events, such as those generated by sensors.

For example, rather than batch loading log data on a daily basis, a Data Stream Capture Engine provides a method to ingest data into a target store, such as a distributed file system, as it is generated.

The Data Stream Capture Engine ingests data from various sources, optionally performs some transformation, and then either distributes the data to a target store (e.g. a distributed file system) or forwards it to another Data Stream Capture Engine.
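To make the ingest, transform, and deliver-or-forward flow concrete, here is a minimal sketch in Python. It is not the API of any particular product: the `CaptureEngine` and `ListStore` classes, their `write` and `ingest` methods, and the sample log lines are all hypothetical names chosen for illustration. The only idea it shows is that an engine's sink can be either a target store or another engine.

```python
from typing import Callable, Iterable, List, Optional


class ListStore:
    """Hypothetical stand-in for a target store such as a distributed file system."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def write(self, record: str) -> None:
        self.records.append(record)


class CaptureEngine:
    """Hypothetical capture engine: ingest, optionally transform, then deliver.

    The sink only needs a write() method, so it can be a store
    or another CaptureEngine (forwarding).
    """

    def __init__(self, sink, transform: Optional[Callable[[str], str]] = None) -> None:
        self.sink = sink
        self.transform = transform

    def write(self, record: str) -> None:
        # Apply the optional transformation before passing the record on.
        if self.transform is not None:
            record = self.transform(record)
        self.sink.write(record)

    def ingest(self, source: Iterable[str]) -> None:
        # Handle records one at a time, as they arrive, rather than in a batch.
        for record in source:
            self.write(record)


store = ListStore()
collector = CaptureEngine(sink=store)                      # delivers to the target store
edge = CaptureEngine(sink=collector, transform=str.strip)  # forwards to another engine

edge.ingest(["GET /index.html 200\n", "GET /missing 404\n"])
```

Because the edge engine and the collector share the same `write` interface, chaining more hops is just a matter of pointing one engine's sink at the next.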

This ability to forward to additional Data Stream Capture Engines enables the definition of flexible ingestion topologies to address requirements such as scalability, routing, consolidation, reliability, etc.

Example Data Stream Topologies


A common use case is to define a data stream topology that ingests log files from various systems into a distributed file system (utilizing the consolidation pattern). 
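The consolidation pattern can be sketched as several agents, one per source system, feeding a single shared channel that is drained into one store. This is an illustrative sketch only: `run_agent`, `consolidate`, the host names, and the log lines are assumptions, and an in-memory queue and list stand in for the network channel and the distributed file system.

```python
import queue
import threading
from typing import Dict, List, Tuple


def run_agent(host: str, lines: List[str], channel: "queue.Queue") -> None:
    """Hypothetical per-host agent: tag each log line with its origin and forward it."""
    for line in lines:
        channel.put((host, line))


def consolidate(sources: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Run one agent per source, then drain the shared channel into one store."""
    channel: "queue.Queue" = queue.Queue()
    agents = [
        threading.Thread(target=run_agent, args=(host, lines, channel))
        for host, lines in sources.items()
    ]
    for agent in agents:
        agent.start()
    for agent in agents:
        agent.join()

    store: List[Tuple[str, str]] = []  # stand-in for the distributed file system
    while not channel.empty():
        store.append(channel.get())
    return store


logs = {
    "web-01": ["GET / 200", "GET /login 302"],
    "web-02": ["GET /missing 404"],
}
consolidated = consolidate(logs)
```

The arrival order across hosts is not guaranteed here, which mirrors what you should expect when many sources feed one collector.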

Broadcast / Routing

This topology could be extended to also send the data, in full or selectively, to another data store (known as a broadcast or fan-out pattern), where it will be processed and analyzed in a different fashion.
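A fan-out with one full copy and one selective copy might look like the sketch below. Everything in it is hypothetical: the `fan_out` function, the two in-memory stores, and the "only ERROR lines" selection rule are assumptions chosen to show the shape of the pattern, not a prescribed design.

```python
from typing import Callable, Iterable, List, Tuple

# A route pairs a selection rule with the store that receives matching records.
Route = Tuple[Callable[[str], bool], List[str]]


def fan_out(records: Iterable[str], routes: List[Route]) -> None:
    """Offer every record to every route; each route keeps what its rule accepts."""
    for record in records:
        for accepts, store in routes:
            if accepts(record):
                store.append(record)


file_system: List[str] = []      # receives the full stream
analytics_store: List[str] = []  # receives a selective copy

routes: List[Route] = [
    (lambda record: True, file_system),
    (lambda record: record.startswith("ERROR"), analytics_store),
]

fan_out(["INFO start", "ERROR disk full", "INFO stop"], routes)
```

Adding another destination means adding another route, without touching the existing ones.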

Another useful model for thinking about such Data Stream architectures is that of multiplexing. I borrowed the basic construct of an event-processing network (EPN) and applied it to Data Stream architectures. The multiplexing topology is a collection of interacting data stream producers, processing engines, and data stores. In this context, the primary responsibility of the multiplexing topology is to receive data streams from producers, pass them on to the correct combination of data stream engines for processing, and deliver the processed streams to the right data store.
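One way to picture the multiplexing topology is as a routing table that maps each producer to a chain of engines and a destination store. The sketch below assumes exactly that; the `multiplex` function, the `ROUTES` table, the producer names, and the store names are all hypothetical, and plain string functions stand in for real processing engines.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Engine = Callable[[str], str]

# Hypothetical routing table: producer -> (engines to apply in order, target store name).
ROUTES: Dict[str, Tuple[List[Engine], str]] = {
    "sensor": ([str.strip, str.upper], "time_series"),
    "weblog": ([str.strip], "file_system"),
}


def multiplex(
    stream: Iterable[Tuple[str, str]],
    routes: Dict[str, Tuple[List[Engine], str]],
    stores: Dict[str, List[str]],
) -> None:
    """Route each (producer, record) pair through its engines to its store."""
    for producer, record in stream:
        engines, store_name = routes[producer]
        for engine in engines:
            record = engine(record)
        stores[store_name].append(record)


stores: Dict[str, List[str]] = {"time_series": [], "file_system": []}
stream = [
    ("sensor", " temp=21 \n"),
    ("weblog", "GET / 200\n"),
    ("sensor", "temp=22"),
]
multiplex(stream, ROUTES, stores)
```

Keeping the routing in data rather than in code means the topology can change without rewriting the engines themselves.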


That is all for now.

Good Luck, Now Go Architect…