Learning Notes on Designing Data-Intensive Applications (xi)

Chapter 11. Stream Processing

Photo by shiyun on Unsplash

Transmitting event streams

Messaging systems

  1. What happens if the producers send messages faster than the consumers can process them? The system can drop messages, buffer the messages in a queue, or apply backpressure (flow control, blocking the producer from sending more messages).
  2. What happens if nodes crash or temporarily go offline, are any messages lost? Durability may require some combination of writing to disk and/or replication.
  • Most message brokers automatically delete a message when it has been successfully delivered to its consumers. This makes them not suitable for long-term storage.
  • Most message brokers assume that their working set is fairly small. If the broker needs to buffer a lot of messages, each individual message takes longer to process, and the overall throughput may degrade.
  • Message brokers often support some way of subscribing to a subset of topics matching some pattern.
  • Message brokers do not support arbitrary queries, but they do notify clients when data changes.
  • Load balancing: Each message is delivered to one of the consumers. The broker may assign messages to consumers arbitrarily.
  • Fan-out: Each message is delivered to all of the consumers.

Databases and streams

Event sourcing

Processing Streams

  • Stream-stream joins: Both input streams consist of events, and the join operator searches for related events that occur within some window of time.
  • Stream-table joins: One input stream consists of events, while the other is a database change log. The change log keeps a local copy of the database up to date. For each activity event, the join operator queries the database and outputs an enriched activity event.
  • Table-table joins: Both input streams are database change logs. In this case, every change on one side is joined with the latest state of the other side. The result is a stream of changes to the materialized view of the join between the two tables.

Fault tolerance

  • Microbatching: break the stream into small blocks, and treat each block like a minuature batch process (micro-batching).
  • Chekpoint: generate rolling checkpoints of state and write them to durable storage. If a stream operator crashes, it can restart from its most recent checkpoint.



Hi :)

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store