Evaluation of Cloud Native Message Queues

In recent years, the rapid growth of Internet of Things (IoT) devices has brought many new challenges for the IT industry, such as processing data between devices in real time. As a result, the demand for scalable, high-throughput, and low-latency systems has become very high.

It remains unclear how stateful cloud native applications, such as message queue systems, perform in Kubernetes clusters. In this thesis, the Kafka, NATS Streaming (STAN), and RabbitMQ systems are evaluated in Kubernetes based on metrics such as performance, scalability, and overhead.

The primary goals of this thesis have been to evaluate the message queues in Kubernetes based on the following criteria:

  • Functional properties and capabilities: What is the message queue capable of?
  • Performance: What throughput and latency does the message queue achieve?
  • Scalability: How well does the message queue scale in Kubernetes?
  • Overhead: Is the message queue lightweight or heavyweight?
  • Ease of use: Is the message queue easy or complicated to use?
  • Popularity and vitality: Is the software well supported, with good documentation?
  • Reliability: How robust and fault-tolerant is the message system?

Business Values

The significant rise in internet-connected devices will have a substantial influence on systems' network traffic, and current point-to-point technologies using synchronous communication between endpoints in IoT systems are no longer a sustainable solution. Message queue architectures using the publish-subscribe paradigm are widely implemented in event-based systems. This paradigm uses asynchronous communication between entities and lends itself to scalable, high-throughput, low-latency systems that are well adapted to the IoT domain.

This thesis evaluates the adaptability of three popular message queue systems in Kubernetes. The systems are designed differently: the Kafka system uses a peer-to-peer architecture, while STAN and RabbitMQ use a master-slave architecture based on the Raft consensus algorithm. A thorough analysis of the systems' capabilities in terms of scalability, performance, and overhead is presented. The conducted tests give further insight into how the performance of the Kafka system is affected in multi-broker clusters using multiple partitions, which enable higher levels of parallelism. Choosing the message broker that best fits a system's requirements has proven to be a difficult task; this thesis outlines the main characteristics of the systems and eases that process.

Experimental Design and Setup

Testbeds and an evaluation tool have been created in order to evaluate the systems and achieve the goals of the thesis. Smaller preliminary tests were conducted to identify relevant message queue settings for the evaluation plan. The general message queue evaluation plan is shown below. To carry out the tests, testbeds consisting of 1, 3, and 5 message brokers are created. Each test is run three times on Kubernetes clusters using e2-highcpu-8 machine types (8 vCPUs and 8 GB memory) and e2-highcpu-16 machine types (16 vCPUs and 16 GB memory). The evaluations are conducted with Masih, a concurrent evaluation tool that provides automated orchestration for benchmarking message queues.

Input:

Parameters:

broker setup = {1,3,5}, 
machine type = {e2-highcpu-8, e2-highcpu-16}, 
message queue = {Kafka, STAN, RabbitMQ}, 
producer/consumer mode = {1 prod/cons, 3 prod/cons, 5 prod/cons},
message size = {500 B, 1000 B, 4000 B (initially 100 B)}

Output: Metrics for evaluating the systems

foreach broker setup b do
  foreach machine type m do
    foreach message queue q do
      foreach producer/consumer mode s do
        foreach message size m_s do
          for i <- 1 to 3 do
            Evaluate broker q with Masih tool publishing messages of m_s size
          end
        end
      end
    end
  end
end
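The nested loop above can be sketched in Python. This is only an illustration of the evaluation plan's parameter space; the `evaluate` function is a hypothetical placeholder, not the actual Masih interface:

```python
from itertools import product

# Parameter space taken from the evaluation plan above.
broker_setups = [1, 3, 5]
machine_types = ["e2-highcpu-8", "e2-highcpu-16"]
message_queues = ["Kafka", "STAN", "RabbitMQ"]
prod_cons_modes = [1, 3, 5]        # number of producer/consumer pairs
message_sizes = [500, 1000, 4000]  # bytes
REPETITIONS = 3                    # each test is run three times

def evaluate(brokers, machine, queue, mode, size):
    """Hypothetical stand-in for invoking the Masih benchmarking tool."""
    return {"brokers": brokers, "machine": machine, "queue": queue,
            "mode": mode, "size": size}

# Full cross product: 3 * 2 * 3 * 3 * 3 = 162 configurations.
# The thesis plan selected 54 tests from this space, 36 of which were run.
configs = list(product(broker_setups, machine_types, message_queues,
                       prod_cons_modes, message_sizes))

results = [evaluate(*cfg) for cfg in configs for _ in range(REPETITIONS)]
```

Note that the full cross product is larger than the 54 planned tests; the thesis pruned the grid rather than running every combination.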


To evaluate the performance of the message queue systems deployed in Kubernetes, different testbeds were created. The testbeds are deployed in Kubernetes clusters on Google Kubernetes Engine (GKE), consisting of varying numbers of nodes of different machine types. The Terraform provisioning tool is used to simplify the process of building the message queue clusters. Each message broker system provides an automated script for configuring the Kubernetes cluster in the Message Queue Provisioning repository. The general design of the installation procedure for the systems is shown below.

The message queue systems are deployed together with the Prometheus monitoring tool for collecting broker and node specific metrics in the cluster. The architectures of the message queue clusters using three brokers are shown below.

Results

The evaluations were conducted from a client using the e2-highcpu-16 machine type. In total, 36 of the 54 possible tests in the evaluation plan were conducted. The producers and consumers exchanged a total of 7 GB of data across the conducted evaluations. The analysis was mainly performed for single-partition Kafka, STAN, and RabbitMQ systems, but the impact of utilizing multi-partitioned Kafka systems is described as well.

Performance & Scalability

Evaluation of optimal message size for brokers

In Scenario 1, the systems were evaluated using a single producer and consumer with a workload of 14 000 000 messages, each 500 B, in a single-broker cluster using the e2-highcpu-8 machine type. Comparing the obtained results with the related test in Scenario 2, where the systems were evaluated using 1 750 000 messages of 4000 B each, shows a considerable performance impact. All systems benefit from using 4000 B messages, with RabbitMQ making significant improvements by reducing its mean latency by nearly a factor of five. In general, the Kafka system outperforms the others in these tests.

Table 1: Performance results for message queues in Scenario 1.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              128157.9             64.0789          128137.5             64.0688          20.19
STAN               32125.4              16.0627          32122.6              16.0613          127.37
RabbitMQ           13853.7              6.9268           13849.5              6.9247           299.79
Kafka (16 part.)   127065.1             63.5325          127035.9             63.5179          22.03
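As a sanity check, the MB/s columns in Table 1 follow directly from the message-rate columns and the 500 B message size used in Scenario 1. A short Python verification, using only values taken from Table 1:

```python
MESSAGE_SIZE = 500  # bytes, per the Scenario 1 workload

# (producer msgs/sec, producer MB/s) pairs from Table 1.
rows = {
    "Kafka":    (128157.9, 64.0789),
    "STAN":     (32125.4, 16.0627),
    "RabbitMQ": (13853.7, 6.9268),
}

for name, (msgs_per_sec, mb_per_sec) in rows.items():
    derived = msgs_per_sec * MESSAGE_SIZE / 1e6  # bytes/sec -> MB/s
    assert abs(derived - mb_per_sec) < 0.001, name
```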

Table 2: Performance results for message queues in Scenario 2.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              17107.4              68.4295          17097.0              68.3880          19.93
STAN               10108.8              40.4353          10107.2              40.4288          51.40
RabbitMQ           7821.2               31.2847          7819.4               31.2777          67.01
Kafka (16 part.)   16700.9              66.8035          16697.5              66.7900          23.27
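The latency improvement from larger messages can be read directly off the latency columns of Tables 1 and 2 (values below are taken from those tables):

```python
# Mean latencies (ms) from Tables 1 (500 B messages) and 2 (4000 B messages).
latency_500B = {"Kafka": 20.19, "STAN": 127.37, "RabbitMQ": 299.79}
latency_4000B = {"Kafka": 19.93, "STAN": 51.40, "RabbitMQ": 67.01}

improvement = {s: latency_500B[s] / latency_4000B[s] for s in latency_500B}
# RabbitMQ's mean latency drops by a factor of roughly 4.5,
# STAN's by roughly 2.5, while Kafka's is essentially flat.
```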

Figure 1: Mean latencies of the message brokers in Scenario 1.

Figure 2: Mean latencies of the message brokers in Scenario 2.

Evaluation of optimal levels of parallelism for brokers

In Scenario 3, the number of producers and consumers is increased to five compared to Scenario 1 shown previously. The goal of this test was to evaluate how increased levels of parallelism affect the message brokers. Comparing the two tests, it is clear that the performance of the single-partition Kafka and STAN systems improves with increased parallelism. However, a big performance drop can be seen for RabbitMQ in Scenario 3, indicating that the system is not scalable in terms of the number of producers and consumers, which is mainly related to the system's architectural style (each consumer has its own queue). For the multi-partitioned Kafka system, performance decreased significantly due to partition overhead at the broker.

Table 3: Performance results for message queues in Scenario 3.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              132631.9             66.3159          710031.6             355.0158         15.52
STAN               58570.7              29.2853          162601.3             81.3007          122.43
RabbitMQ           4446.0               2.2230           32146.2              16.0731          653.59
Kafka (16 part.)   134151.8             67.0759          770033.7             385.0168         35.16

Figure 3: Mean latencies of the message brokers in Scenario 3.

Evaluation of brokers’ horizontal scalability

In Scenario 4, the horizontal scalability of the systems was evaluated, analyzing replication costs and consensus overhead. This test is compared to its related evaluations in Scenario 1; the only difference is that Scenario 4 evaluated the systems in a three-broker cluster. Replication has a great impact on the single-partition Kafka system at lower levels of parallelism; however, when the number of partitions or the number of producers and consumers is increased, the performance drops are eliminated. The replication costs and consensus overheads for STAN and RabbitMQ do not have any significant impact on performance in this test.

Table 4: Performance results for message queues in Scenario 4.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              61363.6              30.6818          61350.1              30.6751          35.73
STAN               34358.4              17.1792          34349.7              17.1748          126.56
RabbitMQ           12237.3              6.1187           12233.8              6.1169           342.10
Kafka (3 part.)    100900.3             50.4501          100887.4             50.4437          23.67
Kafka (16 part.)   123338.4             61.6692          123319.3             61.6596          20.39

Figure 4: Mean latencies of the message brokers in Scenario 4.

Evaluation of brokers’ vertical scalability

In Scenario 5, the systems were evaluated in a three-broker cluster using e2-highcpu-8 instances, with three producers and consumers utilizing 1000 B messages. This test was compared to Scenario 6, where e2-highcpu-16 brokers were used instead. Comparing these two tests revealed that the Kafka system is vertically scalable, while using more powerful machine types for the STAN and RabbitMQ systems does not have any apparent effect.

Table 5: Performance results for message queues in Scenario 5.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              62752.3              62.7523          161672.9             161.6729         25.63
STAN               20385.1              20.3851          78005.6              78.0056          91.47
RabbitMQ           5955.3               5.9553           20960.7              20.9607          317.25
Kafka (3 part.)    51195.0              51.1950          262338.5             262.3385         15.08
Kafka (16 part.)   67137.5              67.1375          231371.7             231.3717         17.10

Table 6: Performance results for message queues in Scenario 6.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              75065.8              75.0658          200711.7             200.7117         20.40
STAN               22459.1              22.4591          95949.0              95.9490          72.91
RabbitMQ           6021.8               6.0218           21267.1              21.2671          280.24
Kafka (3 part.)    63140.5              63.1405          331587.2             331.5872         11.79
Kafka (16 part.)   104642.1             104.6421         299147.5             299.1475         14.09
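The vertical-scalability conclusion can be quantified from the producer-throughput columns of Tables 5 and 6 (values below are taken from those tables):

```python
# Producer throughput (msgs/sec) from Table 5 (e2-highcpu-8 brokers)
# and Table 6 (e2-highcpu-16 brokers).
scenario5 = {"Kafka": 62752.3, "STAN": 20385.1, "RabbitMQ": 5955.3}
scenario6 = {"Kafka": 75065.8, "STAN": 22459.1, "RabbitMQ": 6021.8}

speedup = {s: scenario6[s] / scenario5[s] for s in scenario5}
# Kafka: ~1.20x, STAN: ~1.10x, RabbitMQ: ~1.01x -- only Kafka clearly
# benefits from doubling the brokers' vCPUs and memory.
```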

Figure 5: Mean latencies of the message brokers in Scenario 5.

Figure 6: Mean latencies of the message brokers in Scenario 6.

Evaluation of best and worst performance in three broker clusters

Scenario 7 evaluated the performance of the systems using five producers and consumers with 4000 B messages in a three-broker cluster of e2-highcpu-16 machine types. This test produced both the best and the worst performance of the three-broker cluster evaluations. In terms of both throughput and latency, the RabbitMQ system performed worst, with a mean latency of nearly 7 seconds, while the Kafka system performed best, with a mean latency of 18.81 ms.

Figure 7: Producer throughput of the Kafka system in Scenario 7.

Table 7: Performance results for message queues in Scenario 7.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              21309.3              85.2370          98684.8              394.7391         18.81
STAN               7347.7               29.3907          31364.3              125.4574         133.51
RabbitMQ           1224.9               4.8994           7154.3               28.6172          6595.65
Kafka (16 part.)   26886.1              107.5442         118766.1             475.0645         28.23

Figure 8: Mean latencies of the message brokers in Scenario 7.

Evaluation analyzing replication and consensus costs

Scenario 8 evaluated the performance of the systems in a five-broker cluster and can be compared to its corresponding test shown above in Scenario 7, where three brokers were used instead. The replication and consensus overhead does not have a significant performance impact on the STAN and RabbitMQ systems in Scenario 8. However, the mean latency increases by approximately 40% for the single-partition Kafka system, while the strength of using multi-partitioned Kafka systems in multi-broker clusters is evident in Scenario 8, where the mean latency actually decreases.

Table 8: Performance results for message queues in Scenario 8.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              19214.6              76.8585          72059.3              288.2373         26.48
STAN               8868.5               35.4739          28255.8              113.0233         145.15
RabbitMQ           1230.0               4.9200           7042.1               28.1682          6670.19
Kafka (16 part.)   30965.7              123.8627         101092.3             404.3692         27.78
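The roughly 40% latency increase for single-partition Kafka follows from the latency columns of Tables 7 and 8 (values below are taken from those tables):

```python
# Mean Kafka latencies (ms) from Table 7 (3 brokers) and Table 8 (5 brokers).
kafka_single = {"3 brokers": 18.81, "5 brokers": 26.48}
kafka_16part = {"3 brokers": 28.23, "5 brokers": 27.78}

increase = kafka_single["5 brokers"] / kafka_single["3 brokers"] - 1
# Single-partition Kafka latency grows by roughly 40% with five brokers,
# while the 16-partition setup actually improves slightly.
assert kafka_16part["5 brokers"] < kafka_16part["3 brokers"]
```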

Figure 9: Mean latencies of the message brokers in Scenario 8.

Overhead

The STAN system has the lowest resource utilization in idle state across all created clusters and is recognized as the most lightweight system, followed by RabbitMQ. The Kafka brokers have quite high RAM utilization in idle state in all created clusters, making Kafka the most heavyweight system.

Figure 10: CPU and RAM utilization of the systems in idle state for three broker clusters.

The CPU and RAM utilization were also monitored during the conducted evaluations. In terms of both CPU and RAM utilization, the STAN system is in general the most lightweight system across all 36 conducted tests. Certain evaluations did result in quite high RAM usage for the STAN system, but with regard to all conducted tests, STAN had the lowest resource usage, followed by RabbitMQ.

Figure 11: CPU and RAM utilization of the systems’ evaluations for three broker clusters.

Conclusions

Overall, all the thesis goals have been achieved. The Kafka system outperforms the STAN and RabbitMQ systems in terms of both performance and scalability, while it has the highest overhead. The STAN system was evaluated to be the most lightweight system in the tests and is well optimised to operate within cloud native environments. Increasing the level of parallelism for the RabbitMQ system has great performance drawbacks, while the Kafka and STAN systems benefit from increasing the number of producers and consumers. Replication implies a huge performance drop in Kafka systems using a single producer and consumer. However, this drop is eliminated by using a multi-partitioned Kafka system, which enables clients to process messages simultaneously, or simply by increasing the number of producers and consumers.

All message queues benefited from using larger message sizes, with 4000 B messages generating the best performance in general. Kafka was evaluated as a scalable, high-throughput, and low-latency system that is well suited to operate within the IoT domain or in similar systems with such requirements. However, if the message queue is intended to operate in a low-load system where reliability is important, then RabbitMQ should be chosen.

Key takeaways

  • Kafka is the most heavyweight system, yet it outperforms STAN and RabbitMQ
  • STAN is the most lightweight system, having the lowest resource utilization both in idle state and during the runs
  • Higher levels of parallelism entail huge performance drawbacks for RabbitMQ, while the STAN and Kafka systems benefit from increasing the number of producers and consumers
  • Replication causes significant performance deterioration for multi-broker Kafka systems using a single producer and consumer
    • Replication costs are, however, eliminated by using a multi-partitioned Kafka system or by increasing the number of producers and consumers
  • The evaluations revealed that all systems benefit from using larger message sizes, with RabbitMQ making significant improvements when using 4000 B instead of 500 B messages
  • The Kafka system is well optimized to operate within the IoT domain, while the RabbitMQ system is more suitable for low-load systems that value reliability

Kian Nassiri

Kian Nassiri is a newly graduated Software Engineer with distributed systems and AI as his main fields of study. He has a Master's degree in computer science, and his interests are playing football, travelling, and hanging out with friends.