Evaluation of Cloud Native Message Queues

In recent years, the rapid growth of Internet of Things (IoT) devices has brought many new challenges for the IT industry, such as processing data between devices in real time. As a result, the demand for scalable, high-throughput, and low-latency systems has become very high.

It remains unclear how stateful cloud native applications, such as message queue systems, perform in Kubernetes clusters. In this thesis, the Kafka, NATS Streaming (STAN), and RabbitMQ systems are evaluated in Kubernetes based on metrics such as performance, scalability, and overhead.

The primary goals of this thesis have been to evaluate the message queues in Kubernetes based on the following criteria:

  • Functional properties and capabilities: What is the message queue capable of?
  • Performance: What throughput and latency does the message queue achieve?
  • Scalability: How well does the message queue scale in Kubernetes?
  • Overhead: Is the message queue lightweight or heavyweight?
  • Ease of use: Is the message queue easy or complicated to use?
  • Popularity and vitality: Is the software well supported, with good documentation?
  • Reliability: How robust and fault-tolerant is the message system?

Business Values

The significant rise in internet-connected devices will have a substantial influence on systems' network traffic, and current point-to-point technologies using synchronous communication between endpoints in IoT systems are no longer a sustainable solution. Message queue architectures using the publish-subscribe paradigm are widely implemented in event-based systems. This paradigm uses asynchronous communication between entities and lends itself to scalable, high-throughput, low-latency systems that are well adapted to the IoT domain.

This thesis evaluates the adaptability of three popular message queue systems in Kubernetes. The systems are designed differently: the Kafka system uses a peer-to-peer architecture, while STAN and RabbitMQ use a master-slave architecture based on the Raft consensus algorithm. A thorough analysis of the systems' capabilities in terms of scalability, performance, and overhead is presented. The conducted tests give further insight into how the performance of the Kafka system is affected in multi-broker clusters using multiple partitions, which enable higher levels of parallelism. Choosing the message broker that best fits a system's requirements has proven to be a difficult task; this thesis outlines the main characteristics of the systems and eases that process.

Experimental Design and Setup

Testbeds and an evaluation tool have been created in order to evaluate the systems and achieve the goals of the thesis. Smaller preliminary tests were conducted to identify relevant message queue settings for the evaluation plan. The general message queue evaluation plan is shown below. To carry out the tests, testbeds consisting of 1, 3, and 5 message brokers are created. Each test is run three times on Kubernetes clusters using e2-highcpu-8 machine types (8 vCPUs and 8 GB memory) and e2-highcpu-16 machine types (16 vCPUs and 16 GB memory). The evaluations are conducted with Masih, a concurrent evaluation tool that provides automated orchestration for benchmarking message queues.

Input:

Parameters:

broker setup = {1,3,5}, 
machine type = {e2-highcpu-8, e2-highcpu-16}, 
message queue = {Kafka, STAN, RabbitMQ}, 
producer/consumer mode = {1 prod/cons, 3 prod/cons, 5 prod/cons},
message size = {500 B, 1000 B, 4000 B (initially 100 B)}

Output: Metrics for evaluating the systems

foreach broker setup b do
  foreach machine type m do
    foreach message queue q do
      foreach producer/consumer mode s do
        foreach message size m_s do
          for i <- 1 to 3 do
            Evaluate broker q with Masih tool publishing messages of m_s size
          end
        end
      end
    end
  end
end
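The nested loop above can be sketched in Python. This is only an illustration of the evaluation plan's parameter space; the `evaluate` function is a hypothetical placeholder, not the actual Masih interface:

```python
from itertools import product

# Parameter space taken from the evaluation plan above.
broker_setups = [1, 3, 5]
machine_types = ["e2-highcpu-8", "e2-highcpu-16"]
message_queues = ["Kafka", "STAN", "RabbitMQ"]
prod_cons_modes = [1, 3, 5]        # number of producer/consumer pairs
message_sizes = [500, 1000, 4000]  # bytes
REPETITIONS = 3                    # each test is run three times

def evaluate(brokers, machine, queue, mode, size):
    """Hypothetical stand-in for invoking the Masih benchmarking tool."""
    return {"brokers": brokers, "machine": machine, "queue": queue,
            "mode": mode, "size": size}

# Full cross product: 3 * 2 * 3 * 3 * 3 = 162 configurations.
# The thesis plan selected 54 tests from this space, 36 of which were run.
configs = list(product(broker_setups, machine_types, message_queues,
                       prod_cons_modes, message_sizes))

results = [evaluate(*cfg) for cfg in configs for _ in range(REPETITIONS)]
```

Note that the full cross product is larger than the 54 planned tests; the thesis pruned the grid rather than running every combination.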


To evaluate the performance of the message queue systems deployed in Kubernetes, different testbeds were created. The testbeds are deployed in Kubernetes clusters on Google Kubernetes Engine (GKE), consisting of varying numbers of nodes of different machine types. The Terraform provisioning tool is used to simplify the process of building the message queue clusters. Each message broker system provides an automated script for configuring the Kubernetes cluster in the Message Queue Provisioning repository. The general design of the installation procedure for the systems is shown below.

The message queue systems are deployed together with the Prometheus monitoring tool for collecting broker and node specific metrics in the cluster. The architectures of the message queue clusters using three brokers are shown below.

Results

The evaluations were conducted from a client using the e2-highcpu-16 machine type. In total, 36 of the 54 possible tests in the evaluation plan were conducted. The producers and consumers exchanged a total of 7 GB of data across the conducted evaluations. The analysis was mainly performed for single-partition Kafka, STAN, and RabbitMQ systems, but the impact of utilizing multi-partitioned Kafka systems is described as well.

Performance & Scalability

Evaluation of optimal message size for brokers

In Scenario 1, the systems were evaluated using a single producer and consumer with a workload of 14 000 000 messages, each 500 B, in a single-broker cluster using the e2-highcpu-8 machine type. Comparing the obtained results with the related test in Scenario 2, where the systems were evaluated using 1 750 000 messages of 4000 B each, shows a considerable performance impact. All systems benefit from using 4000 B messages, with RabbitMQ making significant improvements by reducing its mean latency by nearly a factor of five. In general, the Kafka system outperforms the others in these tests.

Table 1: Performance results for message queues in Scenario 1.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              128157.9             64.0789          128137.5             64.0688          20.19
STAN               32125.4              16.0627          32122.6              16.0613          127.37
RabbitMQ           13853.7              6.9268           13849.5              6.9247           299.79
Kafka (16 part.)   127065.1             63.5325          127035.9             63.5179          22.03
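As a sanity check, the MB/s columns in Table 1 follow directly from the message-rate columns and the 500 B message size used in Scenario 1. A short Python verification, using only values taken from Table 1:

```python
MESSAGE_SIZE = 500  # bytes, per the Scenario 1 workload

# (producer msgs/sec, producer MB/s) pairs from Table 1.
rows = {
    "Kafka":    (128157.9, 64.0789),
    "STAN":     (32125.4, 16.0627),
    "RabbitMQ": (13853.7, 6.9268),
}

for name, (msgs_per_sec, mb_per_sec) in rows.items():
    derived = msgs_per_sec * MESSAGE_SIZE / 1e6  # bytes/sec -> MB/s
    assert abs(derived - mb_per_sec) < 0.001, name
```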

Table 2: Performance results for message queues in Scenario 2.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              17107.4              68.4295          17097.0              68.3880          19.93
STAN               10108.8              40.4353          10107.2              40.4288          51.40
RabbitMQ           7821.2               31.2847          7819.4               31.2777          67.01
Kafka (16 part.)   16700.9              66.8035          16697.5              66.7900          23.27
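The latency improvement from larger messages can be read directly off the latency columns of Tables 1 and 2 (values below are taken from those tables):

```python
# Mean latencies (ms) from Tables 1 (500 B messages) and 2 (4000 B messages).
latency_500B = {"Kafka": 20.19, "STAN": 127.37, "RabbitMQ": 299.79}
latency_4000B = {"Kafka": 19.93, "STAN": 51.40, "RabbitMQ": 67.01}

improvement = {s: latency_500B[s] / latency_4000B[s] for s in latency_500B}
# RabbitMQ's mean latency drops by a factor of roughly 4.5,
# STAN's by roughly 2.5, while Kafka's is essentially flat.
```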

Figure 1: Mean latencies of the message brokers in Scenario 1.

Figure 2: Mean latencies of the message brokers in Scenario 2.

Evaluation of optimal levels of parallelism for brokers

In Scenario 3, the number of producers and consumers is increased to five compared to Scenario 1 shown previously. The goal of this test was to evaluate how increased levels of parallelism affect the message brokers. Comparing the two tests, it is clear that the performance of the single-partition Kafka and STAN systems improves with increased parallelism. However, a big performance drop can be seen for RabbitMQ in Scenario 3, indicating that the system is not scalable in terms of the number of producers and consumers, which is mainly related to the system's architectural style (each consumer has its own queue). For the multi-partitioned Kafka system, performance decreased significantly due to partition overhead at the broker.

Table 3: Performance results for message queues in Scenario 3.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              132631.9             66.3159          710031.6             355.0158         15.52
STAN               58570.7              29.2853          162601.3             81.3007          122.43
RabbitMQ           4446.0               2.2230           32146.2              16.0731          653.59
Kafka (16 part.)   134151.8             67.0759          770033.7             385.0168         35.16

Figure 3: Mean latencies of the message brokers in Scenario 3.

Evaluation of brokers’ horizontal scalability

In Scenario 4, the horizontal scalability of the systems was evaluated, analyzing replication costs and consensus overhead. This test is compared to its related evaluations in Scenario 1; the only difference is that Scenario 4 evaluated the systems in a three-broker cluster. Replication has a great impact on the single-partition Kafka system at lower levels of parallelism; however, when the number of partitions or the number of producers and consumers is increased, the performance drops are eliminated. The replication costs and consensus overheads for STAN and RabbitMQ do not have any significant impact on performance in this test.

Table 4: Performance results for message queues in Scenario 4.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              61363.6              30.6818          61350.1              30.6751          35.73
STAN               34358.4              17.1792          34349.7              17.1748          126.56
RabbitMQ           12237.3              6.1187           12233.8              6.1169           342.10
Kafka (3 part.)    100900.3             50.4501          100887.4             50.4437          23.67
Kafka (16 part.)   123338.4             61.6692          123319.3             61.6596          20.39

Figure 4: Mean latencies of the message brokers in Scenario 4.

Evaluation of brokers’ vertical scalability

In Scenario 5, the systems were evaluated in a three-broker cluster using e2-highcpu-8 instances, with three producers and consumers utilizing 1000 B messages. This test was compared to Scenario 6, where e2-highcpu-16 brokers were used instead. Comparing these two tests revealed that the Kafka system is vertically scalable, while using more powerful machine types for the STAN and RabbitMQ systems does not have any apparent effect.

Table 5: Performance results for message queues in Scenario 5.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              62752.3              62.7523          161672.9             161.6729         25.63
STAN               20385.1              20.3851          78005.6              78.0056          91.47
RabbitMQ           5955.3               5.9553           20960.7              20.9607          317.25
Kafka (3 part.)    51195.0              51.1950          262338.5             262.3385         15.08
Kafka (16 part.)   67137.5              67.1375          231371.7             231.3717         17.10

Table 6: Performance results for message queues in Scenario 6.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              75065.8              75.0658          200711.7             200.7117         20.40
STAN               22459.1              22.4591          95949.0              95.9490          72.91
RabbitMQ           6021.8               6.0218           21267.1              21.2671          280.24
Kafka (3 part.)    63140.5              63.1405          331587.2             331.5872         11.79
Kafka (16 part.)   104642.1             104.6421         299147.5             299.1475         14.09
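The vertical-scalability conclusion can be quantified from the producer-throughput columns of Tables 5 and 6 (values below are taken from those tables):

```python
# Producer throughput (msgs/sec) from Table 5 (e2-highcpu-8 brokers)
# and Table 6 (e2-highcpu-16 brokers).
scenario5 = {"Kafka": 62752.3, "STAN": 20385.1, "RabbitMQ": 5955.3}
scenario6 = {"Kafka": 75065.8, "STAN": 22459.1, "RabbitMQ": 6021.8}

speedup = {s: scenario6[s] / scenario5[s] for s in scenario5}
# Kafka: ~1.20x, STAN: ~1.10x, RabbitMQ: ~1.01x -- only Kafka clearly
# benefits from doubling the brokers' vCPUs and memory.
```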

Figure 5: Mean latencies of the message brokers in Scenario 5.

Figure 6: Mean latencies of the message brokers in Scenario 6.

Evaluation of best and worst performance in three broker clusters

Scenario 7 evaluated the performance of the systems using five producers and consumers with 4000 B messages in a three-broker cluster of e2-highcpu-16 machine types. This test produced both the best and the worst performance of the three-broker cluster evaluations. In terms of both throughput and latency, the RabbitMQ system performed worst, with a mean latency of nearly 7 seconds, while the Kafka system performed best, with a mean latency of 18.81 ms.

Figure 7: Producer throughput of the Kafka system in Scenario 7.

Table 7: Performance results for message queues in Scenario 7.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              21309.3              85.2370          98684.8              394.7391         18.81
STAN               7347.7               29.3907          31364.3              125.4574         133.51
RabbitMQ           1224.9               4.8994           7154.3               28.6172          6595.65
Kafka (16 part.)   26886.1              107.5442         118766.1             475.0645         28.23

Figure 8: Mean latencies of the message brokers in Scenario 7.

Evaluation analyzing replication and consensus costs

Scenario 8 evaluated the performance of the systems in a five-broker cluster and can be compared to its corresponding test shown above in Scenario 7, where three brokers were used instead. The replication and consensus overhead does not have a significant performance impact on the STAN and RabbitMQ systems in Scenario 8. However, the mean latency increases by approximately 40% for the single-partition Kafka system, while the strength of using multi-partitioned Kafka systems in multi-broker clusters is evident in Scenario 8, where the mean latency actually decreases.

Table 8: Performance results for message queues in Scenario 8.

Message System     Producer (msgs/sec)  Producer (MB/s)  Consumer (msgs/sec)  Consumer (MB/s)  Latency (ms)
Kafka              19214.6              76.8585          72059.3              288.2373         26.48
STAN               8868.5               35.4739          28255.8              113.0233         145.15
RabbitMQ           1230.0               4.9200           7042.1               28.1682          6670.19
Kafka (16 part.)   30965.7              123.8627         101092.3             404.3692         27.78
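The roughly 40% latency increase for single-partition Kafka follows from the latency columns of Tables 7 and 8 (values below are taken from those tables):

```python
# Mean Kafka latencies (ms) from Table 7 (3 brokers) and Table 8 (5 brokers).
kafka_single = {"3 brokers": 18.81, "5 brokers": 26.48}
kafka_16part = {"3 brokers": 28.23, "5 brokers": 27.78}

increase = kafka_single["5 brokers"] / kafka_single["3 brokers"] - 1
# Single-partition Kafka latency grows by roughly 40% with five brokers,
# while the 16-partition setup actually improves slightly.
assert kafka_16part["5 brokers"] < kafka_16part["3 brokers"]
```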

Figure 9: Mean latencies of the message brokers in Scenario 8.

Overhead

The STAN system has the lowest resource utilization in idle state across all created clusters and is recognized as the most lightweight system, followed by RabbitMQ. The Kafka brokers have quite high RAM utilization in idle state in all created clusters, making Kafka the most heavyweight system.

Figure 10: CPU and RAM utilization of the systems in idle state for three broker clusters.

The CPU and RAM utilization were also monitored during the conducted evaluations. In terms of both CPU and RAM utilization, the STAN system is in general the most lightweight system across all 36 conducted tests. Certain evaluations did result in quite high RAM usage for the STAN system, but with regard to all conducted tests, STAN had the lowest resource usage, followed by RabbitMQ.

Figure 11: CPU and RAM utilization of the systems’ evaluations for three broker clusters.

Conclusions

Overall, all the thesis goals have been achieved. The Kafka system outperforms the STAN and RabbitMQ systems in terms of both performance and scalability, while it has the highest overhead. The STAN system was evaluated to be the most lightweight system in the tests and is well optimised to operate within cloud native environments. Increasing the level of parallelism for the RabbitMQ system has great performance drawbacks, while the Kafka and STAN systems benefit from increasing the number of producers and consumers. Replication implies a huge performance drop in Kafka systems using a single producer and consumer. However, this drop is eliminated by using a multi-partitioned Kafka system, which enables clients to process messages simultaneously, or simply by increasing the number of producers and consumers.

All message queues benefited from using larger message sizes, with 4000 B messages generating the best performance in general. Kafka was evaluated as a scalable, high-throughput, and low-latency system that is well suited to operate within the IoT domain or in similar systems with such requirements. However, if the message queue is intended to operate in a low-load system where reliability is important, then RabbitMQ should be chosen.

Key takeaways

  • Kafka is the most heavyweight system, yet it outperforms STAN and RabbitMQ
  • STAN is the most lightweight system, having the lowest resource utilization both in idle state and during the runs
  • Higher levels of parallelism entail huge performance drawbacks for RabbitMQ, while the STAN and Kafka systems benefit from increasing the number of producers and consumers
  • Replication causes significant performance deterioration for multi-broker Kafka systems using a single producer and consumer
    • Replication costs are, however, eliminated by using a multi-partitioned Kafka system or by increasing the number of producers and consumers
  • The evaluations revealed that all systems benefit from using larger message sizes, with RabbitMQ making significant improvements when using 4000 B instead of 500 B messages
  • The Kafka system is well optimized to operate within the IoT domain, while the RabbitMQ system is more suitable for low-load systems that value reliability

Kian Nassiri

Kian Nassiri is a newly graduated Software Engineer with distributed systems and AI as his main fields of study. He has a Master's degree in computer science, and his interests are playing football, travelling, and hanging out with friends.