A Review of SNS, SQS, and EventBridge, and When to Use What

I’ve been using SNS and SQS at work, and recently tried EventBridge on a hobby project. I’ve found all three very useful in different scenarios and would like to share my notes on them.

SQS

— A message queue that decouples sender and receiver.

Important configs:

  • Visibility timeout: The length of time that a message received from a queue stays hidden from other consumers. Other consumers cannot process the message until this timeout expires, so always set it longer than the time you expect the current consumer to need.
  • Message retention period: The amount of time that messages are retained in the queue (max 14 days).
  • Delivery delay: The amount of time that SQS waits before delivering a message that is added to the queue.
  • Enable content-based deduplication: On FIFO queues, SQS can automatically create deduplication IDs from the body of the message. Otherwise you have to supply a deduplication ID yourself with each message.
  • Receive Message Wait Time Seconds: Config for short/long polling (see below).
  • Max Receive Count: The maximum number of times a message can be received before SQS moves it to a DLQ.
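
To make the list above concrete, here is a minimal sketch of how these settings map onto the string-valued attribute dict that boto3's `create_queue` accepts. The helper name `build_queue_attributes` and its default values are my own assumptions for illustration; the attribute keys and the range checks follow the SQS limits as I understand them.

```python
# Sketch only: builds the Attributes dict for
# sqs.create_queue(QueueName=..., Attributes=...).
# The function name and defaults are illustrative assumptions.

def build_queue_attributes(
    visibility_timeout_s: int = 60,
    retention_s: int = 4 * 24 * 3600,
    delay_s: int = 0,
    wait_time_s: int = 20,
    fifo: bool = False,
    content_based_dedup: bool = False,
) -> dict:
    if not 0 <= visibility_timeout_s <= 12 * 3600:
        raise ValueError("visibility timeout must be between 0 s and 12 h")
    if not 60 <= retention_s <= 14 * 24 * 3600:
        raise ValueError("retention must be between 60 s and 14 days")
    if not 0 <= delay_s <= 900:
        raise ValueError("delivery delay must be between 0 s and 15 min")
    if not 0 <= wait_time_s <= 20:
        raise ValueError("receive wait time must be between 0 s and 20 s")
    if content_based_dedup and not fifo:
        raise ValueError("content-based deduplication needs a FIFO queue")

    # SQS expects every attribute value as a string.
    attributes = {
        "VisibilityTimeout": str(visibility_timeout_s),
        "MessageRetentionPeriod": str(retention_s),
        "DelaySeconds": str(delay_s),
        "ReceiveMessageWaitTimeSeconds": str(wait_time_s),
    }
    if fifo:
        attributes["FifoQueue"] = "true"
        attributes["ContentBasedDeduplication"] = (
            "true" if content_based_dedup else "false"
        )
    return attributes
```

Max Receive Count is left out here on purpose because it lives inside the redrive policy rather than being a top-level attribute.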

Standard Queue vs. FIFO queue

Basically, you are weighing high throughput against ordering and delivery guarantees:

  • Message Order: Standard queues provide best-effort ordering, while FIFO queues offer first-in-first-out delivery and exactly-once processing. The price for that is relatively low throughput (300 TPS without batching and 3,000 messages per second with batching) vs. nearly unlimited TPS for standard queues.
  • Delivery Attempts: Standard queues guarantee that a message is delivered at least once, with possible duplicates. FIFO queues ensure a message is delivered exactly once, without duplicates.
  • Deduplication: FIFO queues use a message deduplication ID, either supplied by the sender or derived from the message body (content-based deduplication), to drop duplicates sent within the 5-minute deduplication interval. The message group ID is a separate concept: it defines the group within which ordering is preserved.
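
The deduplication behaviour can be sketched locally. This is a simplified in-memory model, not how SQS is implemented: the class name `FifoDeduplicator` is hypothetical, and I am assuming the documented behaviour that content-based deduplication hashes the message body with SHA-256 and that the deduplication interval is 5 minutes.

```python
import hashlib

DEDUP_WINDOW_S = 5 * 60  # deduplication interval, in seconds


class FifoDeduplicator:
    """Toy model of FIFO deduplication (illustrative, not the SQS internals)."""

    def __init__(self):
        self._first_seen = {}  # deduplication ID -> time it was first accepted

    def accept(self, body, now, dedup_id=None):
        # Without an explicit ID, fall back to a hash of the body,
        # which is what content-based deduplication does.
        key = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._first_seen.get(key)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_S:
            return False  # duplicate inside the window: not delivered again
        self._first_seen[key] = now
        return True
```

In this model the same body sent twice within five minutes is accepted once, while the same body sent after the window has passed is treated as a new message.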

Short vs. Long polling:

Every SQS request is billed, including receives that return no messages. Choosing between short polling and long polling lets you control that cost.

  • Short polling: The consumer request queries only a subset of the servers to find available messages, and SQS sends the response right away, even if the query found no messages. A single request might not return all your messages (but you will eventually get them if you keep polling).
  • Long polling: When the wait time on the ReceiveMessage call (WaitTimeSeconds) is greater than 0, long polling is in effect (20s max). Long polling helps reduce cost by cutting down the number of empty responses. This is done by querying all servers and letting SQS wait until a message is available in the queue before sending a response. Unless the connection times out, SQS sends at least 1 message back.
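
A long-polling receive could look roughly like the sketch below. It assumes `sqs` is a boto3 SQS client (or anything with the same `receive_message` and `delete_message` methods); the function name `drain_once` and the `handler` callback are hypothetical names of mine.

```python
from typing import Any, Callable


def drain_once(
    sqs: Any,
    queue_url: str,
    handler: Callable[[str], None],
    wait_time_s: int = 20,
) -> int:
    """Do one long-polling receive and process whatever comes back."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,       # largest batch a single receive returns
        WaitTimeSeconds=wait_time_s,  # > 0 turns on long polling
    )
    # An empty receive has no "Messages" key at all.
    messages = response.get("Messages", [])
    for message in messages:
        handler(message["Body"])
        # Delete only after the handler succeeded; otherwise the message
        # reappears once the visibility timeout expires.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
    return len(messages)
```

Setting `wait_time_s=0` in the same call would give short polling, which returns immediately and tends to produce many more empty (but still billed) responses on a quiet queue.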

Dead-letter queue:

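A DLQ is a queue for messages that were not consumed successfully. In messaging systems in general, that covers scenarios such as:

  • Message that is sent to a queue that does not exist.
  • Queue size limit exceeded.
  • Message size limit exceeded.
  • Message is rejected by the consumer.
  • Message reaches a threshold read counter number, e.g. with maxReceiveCount.
  • The message expires due to per-message TTL.

In SQS specifically, only the read-counter case applies: a message is moved to the DLQ once it has been received more than maxReceiveCount times without being deleted. Messages that outlive the retention period (up to 14 days) are deleted rather than moved to the DLQ.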

The redrive policy specifies the source queue, the dead-letter queue, and the conditions under which SQS moves messages from the former to the latter.
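
As a small illustration, the redrive policy is a JSON string set as the `RedrivePolicy` attribute on the source queue. The helper name `build_redrive_policy` and the default of 5 receives are assumptions of mine; the two JSON keys are the ones SQS expects.

```python
import json


def build_redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> str:
    """Return the JSON string for the source queue's RedrivePolicy attribute."""
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    return json.dumps(
        {
            "deadLetterTargetArn": dlq_arn,         # where failed messages go
            "maxReceiveCount": max_receive_count,   # receives before the move
        }
    )
```

With a boto3 client this would be applied with something like `sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes={"RedrivePolicy": build_redrive_policy(dlq_arn)})`, where `source_queue_url` and `dlq_arn` are placeholders for your own queues.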

Cost saving:

  • Batching where possible: For SQS, that’s a max of 10 messages per batch request.
  • Using the appropriate polling mode: Long polling reduces cost by decreasing the number of empty receives on an empty queue.
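
For the batching point, a minimal sketch of splitting messages into groups of 10 in the entry format that `send_message_batch` takes. The helper name `to_batches` is hypothetical, and the sketch ignores the total payload size limit of a batch.

```python
from typing import Dict, Iterator, List

SQS_MAX_BATCH = 10  # most entries a single batch request may carry


def to_batches(
    bodies: List[str], size: int = SQS_MAX_BATCH
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of batch entries, each with at most `size` messages."""
    for start in range(0, len(bodies), size):
        chunk = bodies[start:start + size]
        # Ids only need to be unique within one batch; the overall
        # index is used here so they are easy to trace back.
        yield [
            {"Id": str(start + offset), "MessageBody": body}
            for offset, body in enumerate(chunk)
        ]
```

Each yielded list can be passed as `Entries` to `sqs.send_message_batch(QueueUrl=..., Entries=entries)`, so 23 messages go out in 3 requests instead of 23.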

SNS

— a Pub/Sub message system that is used extensively in Fanout mode.

With SNS, you can hook multiple subscribers of different kinds to a topic, such as Amazon Kinesis Data Firehose, Amazon SQS, AWS Lambda, HTTP, email, mobile push notifications, and SMS.

The features I found most useful in SNS are:

  • Message attributes: You can add up to 10 structured metadata items to each message. As a result, the consumer can use message attributes to handle a message in a particular way without having to process the message body first.
  • Message filtering: By default, each subscriber receives every message published to the topic. To receive only a subset of the messages, a subscriber must assign a filter policy to its topic subscription; the policy filters out messages that do not match.
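
To show how attributes and filter policies fit together, here is a deliberately simplified local matcher. It only handles exact string matches, whereas real SNS filter policies support more operators (prefix, numeric ranges, and so on). The function name and the attribute names `event_type` and `store` are made up for the example.

```python
from typing import Dict, List


def matches_filter_policy(
    policy: Dict[str, List[str]],
    message_attributes: Dict[str, Dict[str, str]],
) -> bool:
    """Return True if every policy key has a matching attribute value."""
    for name, allowed_values in policy.items():
        attribute = message_attributes.get(name)
        # A missing attribute, or a value outside the allowed list,
        # means the subscriber does not get this message.
        if attribute is None or attribute.get("StringValue") not in allowed_values:
            return False
    return True


# Example policy: only order events from one store.
policy = {"event_type": ["order_placed", "order_cancelled"], "store": ["example_corp"]}

# Attributes in the shape used by sns.publish(MessageAttributes=...).
attributes = {
    "event_type": {"DataType": "String", "StringValue": "order_placed"},
    "store": {"DataType": "String", "StringValue": "example_corp"},
}
```

On a real subscription the policy would be attached as JSON, for instance through the `FilterPolicy` subscription attribute, and SNS would do the matching before delivery.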

EventBridge

— Event Bus where complex logic can be applied on

Amazon EventBridge (previously CloudWatch Events) is an event bus service that you can use to connect your applications with data from a variety of sources. EventBridge receives an event and applies a rule to route the event to a target. Rules are based on either the structure of the event (event pattern) or a schedule (cron job).

The power of EventBridge comes into play with extra capabilities:

  • Input transformer: Customise the body of an event before EventBridge passes it to a target.
  • Archive and replay: You can archive events for later use and replay them from the archive. For example, if you update an application with additional functionality, you can replay historical events so that they are reprocessed and the application stays consistent.
  • Schema registry: You can inspect and codify event structures, and download code bindings for your preferred programming language to speed up development.
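
A rule with an event pattern and an input transformer might be set up along these lines. This is a sketch that only builds the request parameters; the function name, the source `my.hobby.app`, the detail type `OrderPlaced`, the target ID, and the `orderId` field are all invented for illustration.

```python
import json


def build_rule_and_target(rule_name: str, target_arn: str):
    """Build parameters for events.put_rule and one entry for put_targets."""
    rule = {
        "Name": rule_name,
        # Event pattern: match on the structure of the event.
        "EventPattern": json.dumps(
            {"source": ["my.hobby.app"], "detail-type": ["OrderPlaced"]}
        ),
        "State": "ENABLED",
    }
    target = {
        "Id": "order-handler",
        "Arn": target_arn,
        "InputTransformer": {
            # Pull one field out of the incoming event...
            "InputPathsMap": {"orderId": "$.detail.orderId"},
            # ...and reshape the payload the target receives.
            "InputTemplate": '{"orderId": <orderId>}',
        },
    }
    return rule, target
```

With a boto3 EventBridge client these would be used roughly as `events.put_rule(**rule)` followed by `events.put_targets(Rule=rule_name, Targets=[target])`, so the target sees only the trimmed payload rather than the full event.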

So… Which One to Choose?

It depends! But you can use this table to match with your business requirements and pick the right fit.

In Conclusion

  • In general, if you just need to decouple sender and receiver, a message queue like SQS is good enough.
  • If you need simple Pub/Sub, go with SNS.
  • If you need to apply more complex logic to the Pub/Sub, go with EventBridge. EventBridge is the most feature-rich of the three, but this comes at a latency cost.

Also, note that SQS is very useful for load levelling and load balancing (flattening the curve to reduce peak-time pressure on downstream workers such as Lambdas). The reason to have Lambda → SNS → SQS → Lambdas is most likely to take advantage of the queue’s durability (messages can be retained for up to 14 days), so your processor Lambdas can work through the backlog at their own pace instead of being throttled at peak time. Also, a message that fails processing becomes visible again and is retried on SQS until it reaches MaxReceiveCount.

This setup normally suits long-running jobs. The downside of the combination is that Lambda uses long polling to retrieve messages from SQS, and you are charged for each poll even if it comes back with an empty response (I learnt this the hard way).

That’s it!

Happy Reading!