RabbitMQ @ 30,000 Feet
- Revised: 2017-10-23
RabbitMQ is a lightweight Message Broker that is highly scalable and redundant, and that can become the foundation for many different queuing systems. Message Brokers are systems that handle the drudgery of accepting, storing, relaying, distributing, and replaying messages between systems. The need for a message broker may not be apparent at low volumes, but message brokers can be extremely powerful tools in rapidly scaling applications - especially when faced with significant peaks and valleys in demand. RabbitMQ implements the AMQP messaging protocol and supports five broad categories of messaging: Work Queues, Publish/Subscribe ("Pub/Sub"), Routing, Topics, and Remote Procedure Calls (RPC). These categories can be combined in interesting ways to offload numerous tasks (such as image resizing or image analysis), and can serve to decouple existing systems.
Teams implementing RabbitMQ (often abbreviated RMQ) should spend a bit of time understanding the concepts of Exchanges and Publishers, as well as Queues and Consumers. In the simplest of terms, Publishers (also called Producers) send messages to Exchanges. Queues are bound to those Exchanges; the Exchange routes each message to its bound Queues, which in turn forward the messages on to Consumers. Configuration options around Acknowledgement (ACKing) and Rejection of messages allow you to further tailor the behavior of your system and how it should deal with undeliverable messages.
To illustrate a basic "Work Queue" system, let us tackle an image resizing need: Assume that widely distributed devices are capturing high volumes of digital images, all of which need to be resized to three different sizes and then placed in a central storage area. Let's assume, also, that demand varies widely throughout the day: Sometimes only 200 images per hour need to be resized, but at peak times, 50,000 images per hour need to be resized. We could architect this as follows: The distributed devices are the Producers, and they use their RabbitMQ client to send the images to an "ImagesToResize" Exchange. Another team creates the Image Resizing software, and we install this as the "Consumer", which subscribes to the "Resizing" queue. Finally, the Resizing queue is bound to the ImagesToResize exchange. RMQ will now distribute messages evenly (round-robin, by default) across all subscribed consumers. All we must do, then, is ensure there are enough consumers alive at any given time to handle the inbound images. If there aren't enough consumers, RabbitMQ dutifully queues the messages. When consumers are available, RabbitMQ doles out the messages as fast as the consumers can take them. Basic time-based or load-based scaling of the consumers allows the entire system to handle the load regardless of the peaks and valleys.
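The queue-and-dispatch behavior described above can be sketched with a toy in-memory model. This is only an illustration of the idea, not RabbitMQ client code: the WorkQueue class, the consumer names, and the file names are hypothetical, and a real broker interleaves delivery with publishing rather than handing out a finished backlog in one pass.

```python
from collections import deque
from itertools import cycle


class WorkQueue:
    """Toy in-memory stand-in for a single queue (not a real client)."""

    def __init__(self, name):
        self.name = name
        self.messages = deque()

    def publish(self, body):
        # With no consumers attached, messages simply accumulate here.
        self.messages.append(body)

    def dispatch(self, consumer_names):
        """Hand out all waiting messages to consumers in round-robin order."""
        assigned = {name: [] for name in consumer_names}
        if not consumer_names:
            return assigned  # nobody to deliver to; the backlog stays queued
        turn = cycle(consumer_names)
        while self.messages:
            assigned[next(turn)].append(self.messages.popleft())
        return assigned


resizing = WorkQueue("Resizing")
for i in range(7):
    resizing.publish(f"image-{i}.jpg")

backlog_before = len(resizing.messages)
result = resizing.dispatch(["resizer-a", "resizer-b", "resizer-c"])
print(backlog_before, {name: len(msgs) for name, msgs in result.items()})
# 7 {'resizer-a': 3, 'resizer-b': 2, 'resizer-c': 2}
```

Adding a fourth name to the consumer list is all it takes to spread the same backlog more thinly, which is the property the scaling argument above relies on.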
Now expand the complexity of our example a bit: Assume that, on some images, the Producer indicates that, in addition to needing resizing, the image might contain a human face and facial recognition should be run on it. Other images just need resizing. We can bring in the concept of "Topic Exchanges". We implement a Topic Exchange named "Images". Then we build another consumer that handles facial recognition. We modify the Producer so that, when it publishes an image, it "tags" the image with a routing key: "resize.recognition" if the image needs both resizing and facial recognition, or just "resize" if it only needs resizing. Our two kinds of consumers now bind their queues to the topic exchange slightly differently: The Resizers bind with "resize.#" (they want all images that need resizing regardless of whether they need facial recognition). Note that "#" matches zero or more dot-separated words, whereas "*" matches exactly one, so a pattern of "resize.*" would miss images tagged plain "resize". The facial recognition consumers bind with "*.recognition" (they want all images that need recognition regardless of what other processing they need). RabbitMQ now automatically hands the images to the proper consumers - all with very little change to the original producer or to the original (resizing) consumer.
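The matching rule behind topic bindings is easy to get subtly wrong, so a small sketch may help. In AMQP topic exchanges, routing keys are dot-separated words; "*" in a binding pattern stands for exactly one word and "#" for zero or more words. The function below is a hypothetical stand-alone re-implementation of that rule for illustration (the broker does this matching itself), and it shows that a bare "resize" key is matched by "resize.#" but not by "resize.*".

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key.

    Words are separated by dots; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero words, or one word and remain active.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


for pattern in ("resize.#", "resize.*", "*.recognition"):
    for key in ("resize", "resize.recognition"):
        print(f"{pattern!r:18} {key!r:22} {topic_matches(pattern, key)}")
```

Running the loop shows "resize.#" matching both keys, "resize.*" matching only "resize.recognition", and "*.recognition" matching only "resize.recognition".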
The Remote Procedure Call (RPC) concept is useful when a message is essentially a command (possibly with parameters) that needs to reach the consumer and also, possibly, needs a response. RMQ supports this by using temporary, exclusive "callback queues", which are set up by the sender and named in the original message. The recipient of the RPC command then temporarily becomes the Producer and sends the answer back on the callback queue. The callback queue is torn down automatically when the sender disconnects.
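The request/reply round trip can be sketched with plain Python data structures. The "reply_to" and "correlation_id" fields mirror the AMQP message properties of the same names; everything else (the queues dictionary, the function names, the "double" command) is a hypothetical in-memory stand-in rather than a real client API.

```python
import uuid
from collections import deque

# Toy in-memory broker: queue name -> waiting messages (not a real client).
queues = {"rpc_requests": deque()}


def client_call(command, argument):
    """Create a private reply queue and publish a request that points at it."""
    reply_queue = f"reply-{uuid.uuid4()}"
    correlation_id = str(uuid.uuid4())
    queues[reply_queue] = deque()
    queues["rpc_requests"].append({
        "reply_to": reply_queue,
        "correlation_id": correlation_id,
        "body": (command, argument),
    })
    return reply_queue, correlation_id


def server_handle_one():
    """Consume one request, then act as a producer and publish the answer."""
    request = queues["rpc_requests"].popleft()
    command, argument = request["body"]
    result = argument * 2 if command == "double" else None
    queues[request["reply_to"]].append({
        "correlation_id": request["correlation_id"],
        "body": result,
    })


def client_collect(reply_queue, correlation_id):
    """Read the reply, check it belongs to our request, drop the queue."""
    reply = queues[reply_queue].popleft()
    if reply["correlation_id"] != correlation_id:
        raise ValueError("reply does not match the pending request")
    del queues[reply_queue]  # mimics the callback queue being torn down
    return reply["body"]


reply_queue, correlation_id = client_call("double", 21)
server_handle_one()
answer = client_collect(reply_queue, correlation_id)
print(answer)  # 42
```

The correlation identifier matters once a client has several requests in flight on the same callback queue: it is what lets each reply be paired with the request that caused it.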
Finally, the Publish/Subscribe (Pub/Sub) concept is extremely useful in messaging situations where the producer has no idea which (or how many) consumers need to receive a given message. RMQ will take care of broadcasting the appropriate messages to the currently subscribed consumers. Combining Pub/Sub with Topic Exchanges creates a message routing system that is simple to configure yet powerful.
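As a rough illustration of the broadcast behavior, the toy model below copies each published message to every queue that is bound at that moment. The FanoutExchange class and the exchange and queue names are hypothetical; the point is only that the producer never names a recipient, and that a subscriber bound later does not see earlier messages.

```python
class FanoutExchange:
    """Toy in-memory fanout exchange (an illustration, not a real client)."""

    def __init__(self, name):
        self.name = name
        self.bound_queues = {}  # queue name -> list of delivered messages

    def bind(self, queue_name):
        # A newly bound queue starts empty; it does not see older messages.
        self.bound_queues.setdefault(queue_name, [])

    def unbind(self, queue_name):
        self.bound_queues.pop(queue_name, None)

    def publish(self, body):
        # Every currently bound queue receives its own copy of the message.
        for messages in self.bound_queues.values():
            messages.append(body)
        return len(self.bound_queues)


events = FanoutExchange("ImageEvents")
events.bind("audit-log")
first_copies = events.publish("image-1 stored")
events.bind("thumbnail-cache")
second_copies = events.publish("image-2 stored")

print(first_copies, second_copies)  # 1 2
print(events.bound_queues["thumbnail-cache"])  # ['image-2 stored']
```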
Gotchas
RabbitMQ automatically enforces whatever level of message-delivery guarantee you set up when you create your configuration, Exchanges, and Queues. If you set up a subscription with "Manual Acknowledgement", RMQ will keep track of each message until the consumer acknowledges it, regardless of how far along the delivery process it got. Should your consumer begin processing the message and then die, RMQ will eventually (after the connection is closed) re-queue that message for delivery automatically.
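The bookkeeping implied by manual acknowledgement can be sketched as follows. This is a simplified in-memory model under stated assumptions, not the broker's actual implementation: the AckingQueue class, its method names, and the worker and message names are all hypothetical.

```python
from collections import deque


class AckingQueue:
    """Toy model of a queue whose consumers acknowledge manually."""

    def __init__(self):
        self.ready = deque()   # messages waiting for delivery
        self.unacked = {}      # delivery tag -> (consumer, body)
        self._next_tag = 1

    def publish(self, body):
        self.ready.append(body)

    def deliver(self, consumer):
        """Hand the next message to a consumer and remember it as unacked."""
        body = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = (consumer, body)
        return tag, body

    def ack(self, tag):
        # Only now may the message be forgotten.
        del self.unacked[tag]

    def connection_closed(self, consumer):
        """Re-queue everything the departed consumer never acknowledged."""
        lost = [t for t, (owner, _) in self.unacked.items() if owner == consumer]
        for tag in sorted(lost, reverse=True):
            _, body = self.unacked.pop(tag)
            self.ready.appendleft(body)  # back to the front, order preserved


q = AckingQueue()
q.publish("img-1")
q.publish("img-2")

tag, _ = q.deliver("worker-1")
q.ack(tag)                       # img-1 finished normally
q.deliver("worker-1")            # img-2 handed out, never acknowledged
q.connection_closed("worker-1")  # the worker dies mid-task

print(list(q.ready), q.unacked)  # ['img-2'] {}
```

Because a re-queued message may already have been partly processed, consumers in such a setup generally need to tolerate seeing the same message twice.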
If you do configure Guaranteed Delivery, you will need to carefully size your system so that it can store 100% of the messages in the worst-case scenario of many active Producers and no active Consumers.
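A back-of-the-envelope calculation makes the sizing concern concrete. The peak rate of 50,000 images per hour comes from the resizing example earlier; the 2 MiB average message size and the 4-hour consumer outage are assumptions chosen purely for illustration, and the estimate ignores broker overhead.

```python
def worst_case_backlog_bytes(messages_per_hour, avg_message_bytes, outage_hours):
    """Bytes that pile up if producers run at full rate with no consumers."""
    return messages_per_hour * avg_message_bytes * outage_hours


PEAK_RATE = 50_000            # images per hour, from the example above
AVG_IMAGE_BYTES = 2 * 1024**2  # assumed: 2 MiB per image
OUTAGE_HOURS = 4               # assumed: consumers are down for 4 hours

backlog = worst_case_backlog_bytes(PEAK_RATE, AVG_IMAGE_BYTES, OUTAGE_HOURS)
backlog_gib = backlog / 1024**3
print(f"{backlog_gib:.3f} GiB")  # 390.625 GiB
```

Under these assumed numbers the broker would need roughly 390 GiB of storage just for the backlog, which suggests why large payloads are often stored elsewhere with only a reference sent through the queue.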
Cloud Presence
RabbitMQ can easily be installed on Linux cloud instances. There are also pre-baked RabbitMQ offerings on Google Cloud Platform, from CloudAMQP, and from Pivotal Software.
There is a web-based monitoring and configuration utility for RabbitMQ that, while somewhat minimalist, will allow you to easily monitor the state of your RMQ cluster, exchanges, and queues. The monitoring utility also includes features for debugging and troubleshooting.
Ease of Installation and Configuration
RabbitMQ is known for its stability, and for being relatively easy to install and set up in clustered (or even federated) environments to create Highly Available solutions. In a clustered setup, two or more RabbitMQ nodes share the messaging work. Provided the queues are mirrored across nodes, the failure of one node of the cluster (and even the bringing up of a replacement node) should have little negative impact on the running system beyond reduced throughput.
Your teams will need to write Producer and Consumer code to talk to your RMQ servers. The getting started guide has detailed, step-by-step examples of each of the main concepts outlined above in no fewer than nine common development languages. There are also fully-developed RabbitMQ clients for an extremely large number of languages, so working RMQ into your architecture should not be a major technical hurdle. Since all messages in RMQ are handled using a consistent protocol (AMQP), you can easily bridge different languages and platforms by having all of them exchange messages with RabbitMQ.
Similar Products
Other messaging brokers are Kafka, Kestrel, and ActiveMQ. Amazon's "Simple Notification Service" (SNS) offers Pub/Sub and other messaging as a paid cloud service. Google's "Cloud Pub/Sub" is the Google Cloud offering similar to RabbitMQ.
History and Culture
RabbitMQ is an open-source project managed by Pivotal Labs and written in the functional language Erlang. There is a strong Google Groups user community, a mailing list, and a Slack channel. Although the project can be used in a completely open source model, there are training and consulting services available from Pivotal, as well as a commercial version called Pivotal RabbitMQ. GitHub development has been active since at least 2009, and RabbitMQ has a 3.6 branch that is currently stable.