How to install Apache Kafka on Rocky Linux or AlmaLinux 8

Apache Kafka is an open-source distributed event store and stream-processing platform developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

It is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

CORE CAPABILITIES

HIGH THROUGHPUT – Deliver messages at network-limited throughput using a cluster of machines, with latencies as low as 2ms.
SCALABLE – Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, hundreds of thousands of partitions. Elastically expand and contract storage and processing.
PERMANENT STORAGE – Store streams of data safely in a distributed, durable, fault-tolerant cluster.
HIGH AVAILABILITY – Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.

Apache Kafka offers these five main interfaces (APIs – Application Programming Interfaces). Each one is described in detail in the official documentation:

Admin API
Producer API
Consumer API
Streams API
Connect API

In this guide we will install Apache Kafka on a Rocky Linux server and test the installation. The same steps apply to any RHEL 8 based distribution.

Step 1: Ensure that the system is up to date

Update the system so that all packages are current:

sudo dnf -y update

Step 2: Install Java

Apache Kafka needs Java 8 or later to run, so we install Java first. No third-party repository is needed because OpenJDK is available in the system base repositories.

Install OpenJDK 11 with:

sudo dnf install java-11-openjdk

Type y and press enter when prompted to accept the installation.
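
Before moving on, it can help to confirm which Java version is actually on the PATH. The snippet below is a minimal sketch: java_major is a hypothetical helper name introduced here, and it assumes the version string looks like 11.0.14.1 or the legacy 1.8.0_322 form.

```shell
# java_major is a hypothetical helper: it prints the major version number
# from a Java version string such as 11.0.14.1 or 1.8.0_322.
java_major() {
  jm_version="$1"
  case "$jm_version" in
    1.*) jm_version="${jm_version#1.}" ;;  # legacy scheme: 1.8.0_322 -> 8.0_322
  esac
  # Drop everything from the first dot, underscore, or dash onwards
  echo "${jm_version%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version prints its banner on stderr; the version is the quoted field
  java_version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  echo "Detected Java major version: $(java_major "$java_version")"
else
  echo "java was not found on the PATH"
fi
```

Any major version of 8 or higher satisfies the requirement stated above.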

Step 3: Get the latest Kafka

Apache Kafka is distributed as a tarball on the official website. Check the Kafka Downloads page for the latest version. At the time of writing, the latest stable release is 3.1.0. If the URL below no longer resolves, note that older releases are moved to the Apache archive site.

We are going to download Kafka to the /opt directory. Use these commands to switch to a root shell, change directory, and download the release:

sudo -i
cd /opt
curl -LO https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz
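
It is good practice to verify the tarball before extracting it. The sketch below assumes you have also downloaded the matching .sha512 file from the Kafka Downloads page into the same directory, and that the file holds either a bare hash or a "name: HASH" layout with the hash split into groups. normalize_sum and verify_tarball are hypothetical helper names, not part of Kafka.

```shell
# Normalize a checksum file: drop an optional "name:" prefix, remove all
# whitespace, and lowercase the hex digits.
normalize_sum() {
  sed 's/^[^:]*://' "$1" | tr -d ' \t\r\n' | tr 'A-F' 'a-f'
}

# Succeeds only if the SHA-512 of the tarball ($1) matches the checksum file ($2)
verify_tarball() {
  vt_expected="$(normalize_sum "$2")"
  vt_actual="$(sha512sum "$1" | cut -d' ' -f1)"
  [ "$vt_expected" = "$vt_actual" ]
}

# Only runs when both files are present in the current directory
if [ -f kafka_2.13-3.1.0.tgz ] && [ -f kafka_2.13-3.1.0.tgz.sha512 ]; then
  if verify_tarball kafka_2.13-3.1.0.tgz kafka_2.13-3.1.0.tgz.sha512; then
    echo "Checksum OK"
  else
    echo "Checksum mismatch, do not extract this file"
  fi
fi
```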

Next, extract the archive, remove the tarball, and rename the extracted directory so that Kafka lives in /opt/kafka.

tar -xzvf kafka_2.13-3.1.0.tgz
rm -f kafka_2.13-3.1.0.tgz
mv kafka_2.13-3.1.0 kafka

Step 3 (continued): Starting Kafka and ZooKeeper manually

For a quick test, we can run the ZooKeeper and Kafka start scripts directly in the foreground.

The services must be started in the following order.

First, start the ZooKeeper service from the /opt/kafka directory:

bin/zookeeper-server-start.sh config/zookeeper.properties

Open another terminal session, change to the /opt/kafka directory, and run this command to start the Kafka broker:

bin/kafka-server-start.sh config/server.properties

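Once both services have started successfully, you have a basic Kafka environment running and ready to use. Stop both with Ctrl+C before continuing, because the systemd services created in the next step listen on the same ports.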
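
To check from a third terminal that both processes are listening, we can look at the open TCP ports. This is a sketch that assumes the default ports from the shipped config files (2181 for ZooKeeper, 9092 for the broker). port_listed is a hypothetical helper that reads `ss -ltn` output on standard input.

```shell
# Succeeds if the given port shows up as a local listening port in
# `ss -ltn` output read from standard input (column 4 is address:port).
port_listed() {
  awk -v p=":$1" '$1 == "LISTEN" && $4 ~ (p "$") { found = 1 } END { exit !found }'
}

if command -v ss >/dev/null 2>&1; then
  for port in 2181 9092; do   # assumed defaults: ZooKeeper, Kafka broker
    if ss -ltn | port_listed "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: not listening"
    fi
  done
fi
```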

Step 4: Create systemd services for Zookeeper and Kafka

On a production server Kafka has to run in the background, so we create systemd units for both scripts.

Create the systemd unit file for ZooKeeper

ZooKeeper has to start first. Use this command to open its systemd unit file in your favourite text editor (vim is used here):

sudo vim /etc/systemd/system/zookeeper.service

Add this content to the file

[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Create the systemd unit file for Kafka

Next, let us create a systemd unit file for the Kafka broker. Open the service file with this command:

sudo vim /etc/systemd/system/kafka.service

Add the following content to the file. Note: change JAVA_HOME if you are using a different Java version. To find the installed JDK directories, run: sudo find /usr/ -name '*jdk'

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-11-openjdk"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target

Save the file and exit.

Finally, reload systemd so that it picks up the newly added units:

sudo systemctl daemon-reload
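
A typo in an ExecStart path is a common reason for a unit failing to start. As a quick sanity check, the sketch below reads ExecStart from each unit file and confirms that the script it points to exists and is executable. unit_value and exec_start_ok are hypothetical helper names; they are not systemd tools.

```shell
# Print the value of KEY ($2) from a unit file ($1), first match only
unit_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1
}

# Succeeds if the first word of ExecStart is an existing executable file
exec_start_ok() {
  es_cmd="$(unit_value "$1" ExecStart)"
  es_cmd="${es_cmd%% *}"   # keep only the executable path, drop the arguments
  [ -n "$es_cmd" ] && [ -x "$es_cmd" ]
}

for unit in /etc/systemd/system/zookeeper.service /etc/systemd/system/kafka.service; do
  if [ -f "$unit" ]; then
    if exec_start_ok "$unit"; then
      echo "$unit: ExecStart target found"
    else
      echo "$unit: ExecStart target missing or not executable"
    fi
  fi
done
```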

Step 5: Start and enable the Zookeeper and Kafka Services

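Now, let’s start both services and enable them so that they come back up after a system reboot:

sudo systemctl start zookeeper
sudo systemctl start kafka
sudo systemctl enable zookeeper kafka

Confirm that both services are running as expected. The Loaded line in the sample output below reads disabled, which suggests it was captured before the units were enabled; on your system it should read enabled.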

$ sudo systemctl status zookeeper
● zookeeper.service - Apache Zookeeper server
   Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-14 20:36:50 UTC; 39s ago
     Docs: http://zookeeper.apache.org
 Main PID: 62903 (java)
    Tasks: 28 (limit: 23167)
   Memory: 52.6M
   CGroup: /system.slice/zookeeper.service
           └─62903 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headle>

Apr 14 20:36:52 rockysrv.citizix.com zookeeper-server-start.sh[62903]: [2022-04-14 20:36:52,271] INFO zookeeper.commitLogCount=500 (org.apache.zookeeper.server.ZKDatabase)

And for Kafka:

$ sudo systemctl status kafka
● kafka.service - Apache Kafka Server
   Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-14 20:46:21 UTC; 10s ago
     Docs: http://kafka.apache.org/documentation.html
 Main PID: 65804 (java)
    Tasks: 69 (limit: 23167)
   Memory: 330.5M
   CGroup: /system.slice/kafka.service
           └─65804 /usr/lib/jvm/jre-11-openjdk/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent>

Apr 14 20:46:24 rockysrv.citizix.com kafka-server-start.sh[65804]: [2022-04-14 20:46:24,710] INFO [/config/changes-event-process-thread]: Starting (k>
Apr 14 20:46:24 rockysrv.citizix.com kafka-server-start.sh[65804]: [2022-04-14 20:46:24,715] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Sta>
Apr 14 20:46:24 rockysrv.citizix.com kafka-server-start.sh[65804]: [2022-04-14 20:46:24,723] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Sta>
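
An active (running) status only means the Java process is up; the broker may need a few more seconds before it accepts client connections. The sketch below retries a lightweight client call until it succeeds. retry is a hypothetical helper, and the attempt count and delay are arbitrary values chosen for illustration. It uses the kafka-broker-api-versions.sh script that ships in the Kafka bin directory.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  while [ "$retry_n" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt is still to come
    if [ "$retry_n" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_n=$((retry_n + 1))
  done
  return 1
}

KAFKA_BIN=/opt/kafka/bin
if [ -x "$KAFKA_BIN/kafka-broker-api-versions.sh" ]; then
  if retry 10 3 "$KAFKA_BIN/kafka-broker-api-versions.sh" \
       --bootstrap-server localhost:9092 >/dev/null 2>&1; then
    echo "Broker is answering on localhost:9092"
  else
    echo "Broker did not answer, check: sudo journalctl -u kafka"
  fi
fi
```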

Step 6: Create Test Topics on Kafka

Kafka lets us read, write, store, and process events across many machines. Events are organized and stored in topics, which work much like folders in a filesystem. Create at least one topic from your server terminal with the command below; you can reuse the same command later to create as many topics as you want.

Let’s say our first topic is named loginevents.

Go to your Kafka directory:

cd /opt/kafka/

And use the topics script:

./bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic loginevents

If it is successful, you should see this output: Created topic loginevents.

You can list all existing topics with this command:

./bin/kafka-topics.sh --list --bootstrap-server localhost:9092
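
Running the create command a second time for the same topic produces an error, which is inconvenient in setup scripts. One way around this, sketched below, is to look at the list output first and create the topic only when it is missing. topic_in_list is a hypothetical helper that reads one topic name per line from standard input.

```shell
# Succeeds if the exact topic name ($1) appears as a whole line on stdin
topic_in_list() {
  grep -Fxq -- "$1"
}

KAFKA_BIN=/opt/kafka/bin
TOPIC=loginevents
if [ -x "$KAFKA_BIN/kafka-topics.sh" ]; then
  if "$KAFKA_BIN/kafka-topics.sh" --list --bootstrap-server localhost:9092 \
       | topic_in_list "$TOPIC"; then
    echo "Topic $TOPIC already exists"
  else
    "$KAFKA_BIN/kafka-topics.sh" --create --bootstrap-server localhost:9092 \
      --replication-factor 1 --partitions 1 --topic "$TOPIC"
  fi
fi
```

The match is on the whole line, so a topic called loginevents2 does not count as loginevents.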

Step 7: Test publishing to and consuming from a topic

Finally, we can test publishing events to the topic and consuming them. Kafka offers a Producer API and a Consumer API, and it ships a command-line client for each. The producer writes events to a topic and the consumer reads them back.

Open two terminal tabs or sessions so that you can watch events being produced and consumed in real time.

In the first terminal, run the consumer so that we can see what is being published to the topic:

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic loginevents --from-beginning

In the second terminal, start the producer and then watch the first terminal. Use this command:

./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic loginevents

The command presents a prompt where you can type any text. Each line you enter is sent as an event and displayed in the first terminal.
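
The same round trip can also be run without typing at a prompt, which is handy for a scripted smoke test. This is a sketch only: login_event is a hypothetical helper, and the JSON layout it prints is invented for illustration (Kafka itself treats the message as opaque text).

```shell
# Print a sample event as one line of JSON (illustrative payload only)
login_event() {
  printf '{"user":"%s","action":"login"}\n' "$1"
}

KAFKA_BIN=/opt/kafka/bin
if [ -x "$KAFKA_BIN/kafka-console-producer.sh" ]; then
  # The console producer sends each line it reads on stdin as one event
  login_event alice | "$KAFKA_BIN/kafka-console-producer.sh" \
    --bootstrap-server localhost:9092 --topic loginevents

  # Read one event from the start of the topic, giving up after 10 seconds.
  # If the topic already held events, this prints the oldest one.
  "$KAFKA_BIN/kafka-console-consumer.sh" --bootstrap-server localhost:9092 \
    --topic loginevents --from-beginning --max-messages 1 --timeout-ms 10000
fi
```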

Conclusion

In this guide we learnt how to install and test Apache Kafka on a Rocky Linux server. You can now connect to it from an application written in any programming language that has a Kafka client library.