Chapter 11: Stream Processing

This chapter discusses the purpose of event streams, how to process them, and some key differences between them and batch processing.

Message brokers process event streams, and there are two types of them:

The first one is AMQP/JMS-style message brokers, which is suitable when there is no need to go back in time to reprocess messages, and also, the order of the facts is not important. The messages are assigned to consumers who must acknowledge them, and as soon as that happens, they are removed from the queue.

The second one is Log-based message brokers, which always deliver messages in the same order and have durability, which can enable to jump back and process old messages. It is widespread to achieve parallelism through partitioning.

One point the author mentions is to represent databases as streams of modifications on data. It opens powerful opportunities for integrating systems. Many systems can compose their state by consuming the events that happen to the main database.

Another important topic is how joins can appear in stream processes. There are basically three types of joins: stream-stream joins, stream-table joins, and table-table joins.

Stream-stream joins are usually based on a fixed time frame. In other words, it can join multiple events that happened in the last 30 minutes, for example.

Stream-table joins consist of having a changelog table maintained by the stream of events and joined with the input stream, which enriches the events.

Table-table joins are basically the streams composing two different tables which are joined to maintain materialized views.

Finally, it is important to mention that this continuous processing of events cannot recover from events in the same way as batch processing. It is simply impossible to discard output and reprocess it. Instead, it requires more robust recovery mechanisms, such as micro-batching, transactions, and idempotency.

Leave a comment

Your email address will not be published.