Apache Kafka is an open-source distributed event store and stream-processing platform, developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
It is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
CORE CAPABILITIES
- HIGH THROUGHPUT – Deliver messages at network-limited throughput using a cluster of machines, with latencies as low as 2ms.
- SCALABLE – Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, hundreds of thousands of partitions. Elastically expand and contract storage and processing.
- PERMANENT STORAGE – Store streams of data safely in a distributed, durable, fault-tolerant cluster.
- HIGH AVAILABILITY – Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.
Apache Kafka offers these five main interfaces (APIs – Application Programming Interfaces). Each one is described in the official documentation:
- Admin API
- Producer API
- Consumer API
- Streams API
- Connect API
In this guide we will learn how to install and test Apache Kafka on a Rocky Linux server. The same steps should also work on any RHEL 8 based distribution.
Step 1: Ensure that the system is up to date
Update the system so that all packages are up to date:
sudo dnf -y update
Step 2: Install Java
Apache Kafka needs Java to run, so we first need to install it. The version must be Java 8 or later. We don't need to add any third-party repository because the OpenJDK packages are already available in the system base repo.
Let us install OpenJDK 11 with:
sudo dnf install java-11-openjdk
Type y and press enter when prompted to accept the installation.
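Before moving on, it can help to confirm that the installed Java meets the Java 8 minimum. The following is a minimal sketch of one way to check this. The name java_major is a hypothetical helper (not part of Java or Kafka), and the parsing assumes the version string uses either the legacy 1.x scheme or the newer major-first scheme.

```shell
#!/bin/sh
# java_major is a hypothetical helper (not part of Java or Kafka).
# It reduces a version string such as "1.8.0_312" or "11.0.14.1"
# to its major version number.
java_major() (
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;          # legacy scheme: "1.8.x" means Java 8
  esac
  printf '%s\n' "${v%%[._-]*}"   # keep everything before the first ., _ or -
)

# Only query the JVM if a java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
  # "java -version" writes to stderr; the version is the quoted field.
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $ver is new enough for Kafka"
  else
    echo "Java $ver is older than Java 8" >&2
  fi
fi
```

With the java-11-openjdk package installed, the check should report that Java is new enough.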
Step 3: Get the latest Kafka
Apache Kafka is available as a tarball on the official website. Head over to the Kafka Downloads page to find the latest version. As of this writing, version 3.1.0 is the latest stable release.
We are going to download Kafka to the /opt directory. Use these commands to switch to that directory and download the release:
sudo -i
cd /opt
curl -LO https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz
Next, let us extract the archive and move its content to the /opt/kafka directory.
tar -xzvf kafka_2.13-3.1.0.tgz
rm -f kafka_2.13-3.1.0.tgz
mv kafka_2.13-3.1.0 kafka
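The tarball name encodes two versions: 2.13 is the Scala version Kafka was built with, and 3.1.0 is the Kafka release. The sketch below builds the download URL from those two values, so trying a different release only means changing the arguments. kafka_url is a hypothetical helper, and it assumes the host and path layout match the URL used above. Releases that are no longer current may not be served from that host.

```shell
# kafka_url is a hypothetical helper: it prints the download URL for a
# given Scala version and Kafka version, following the URL pattern above.
kafka_url() {
  scala_version="$1"
  kafka_version="$2"
  printf 'https://dlcdn.apache.org/kafka/%s/kafka_%s-%s.tgz\n' \
    "$kafka_version" "$scala_version" "$kafka_version"
}

# Print the URL for the release used in this guide.
kafka_url 2.13 3.1.0
```

To download with it, pass the result to curl, for example curl -LO "$(kafka_url 2.13 3.1.0)".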
Step 3 (continued): Starting Kafka and ZooKeeper
When testing, we can run the ZooKeeper and Kafka service scripts directly by hand.
Run the following commands in order, so that the services start in the correct sequence.
First, start the ZooKeeper service. Do this from the directory that holds the extracted Kafka files (/opt/kafka):
bin/zookeeper-server-start.sh config/zookeeper.properties
Open another terminal session and run this command to start the Kafka broker:
bin/kafka-server-start.sh config/server.properties
Once all services have successfully launched, you will have a basic Kafka environment running and ready to use.
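To confirm that both processes are accepting connections, you can look for their listening ports. The sketch below assumes the default ports from the shipped configuration files (2181 for ZooKeeper, 9092 for Kafka) and that the ss tool is installed. listening_on is a hypothetical helper name.

```shell
# listening_on is a hypothetical helper: it reads "ss -ltn" output on
# stdin and succeeds if some socket is listening on the given port.
listening_on() {
  awk -v port="$1" '
    # Column 4 is "Local Address:Port"; compare the part after the last colon.
    $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Check the default ZooKeeper (2181) and Kafka (9092) ports, if ss exists.
if command -v ss >/dev/null 2>&1; then
  for port in 2181 9092; do
    if ss -ltn | listening_on "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: not listening"
    fi
  done
fi
```

Run it from a third terminal while both scripts are still running in the foreground.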
Step 4: Create systemd services for Zookeeper and Kafka
On a production server, Kafka has to run in the background as a service. To do that, create systemd units for both scripts.
Create the systemd unit file for ZooKeeper
ZooKeeper has to start first, so we create its unit first. Use this command to open the ZooKeeper unit file in your favourite text editor. I am using vim.
sudo vim /etc/systemd/system/zookeeper.service
Add this content to the file:
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Finally, save and close the file.
Create the systemd unit file for Kafka
Next, let us create a systemd unit file for the Kafka broker. Open the service file with this command:
sudo vim /etc/systemd/system/kafka.service
Add the following content to the file. Note: change the JAVA_HOME value if you are using a different Java version. To find the right path, you can run sudo find /usr/ -name '*jdk'
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-11-openjdk"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Save the file and exit.
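As an alternative to find for working out the JAVA_HOME value used in the unit above, you can resolve the java binary that is on the PATH and step two directories up. This is only a sketch: java_home_from_bin is a hypothetical helper, and it assumes the usual layout where the binary lives in a bin directory directly under the Java home.

```shell
# java_home_from_bin is a hypothetical helper: given the resolved path of
# a java binary, it prints the directory two levels up (the Java home).
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

if command -v java >/dev/null 2>&1; then
  # readlink -f follows symlinks (such as /etc/alternatives) to the real binary.
  java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

The printed directory can be used as the JAVA_HOME value in the Environment line. It may differ from the /usr/lib/jvm/jre-11-openjdk path shown above, which is typically a symlink to the same installation.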
Finally, reload systemd so that it picks up the newly added units:
sudo systemctl daemon-reload
Step 5: Start and enable the Zookeeper and Kafka Services
Now, let's start both services, then enable them so that they also come up after a system reboot.
sudo systemctl start zookeeper
sudo systemctl start kafka
sudo systemctl enable zookeeper kafka
Confirm that both services are running as expected. The sample output below was captured before the services were enabled, so its Loaded line reports disabled:
$ sudo systemctl status zookeeper
● zookeeper.service - Apache Zookeeper server
Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-04-14 20:36:50 UTC; 39s ago
Docs: http://zookeeper.apache.org
Main PID: 62903 (java)
Tasks: 28 (limit: 23167)
Memory: 52.6M
CGroup: /system.slice/zookeeper.service
└─62903 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headle>
Apr 14 20:36:52 rockysrv.citizix.com zookeeper-server-start.sh[62903]: [2022-04-14 20:36:52,271] INFO zookeeper.commitLogCount=500 (org.apache.zookeeper.server.ZKDatabase)
And for Kafka:
$ sudo systemctl status kafka
● kafka.service - Apache Kafka Server
Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-04-14 20:46:21 UTC; 10s ago
Docs: http://kafka.apache.org/documentation.html
Main PID: 65804 (java)
Tasks: 69 (limit: 23167)
Memory: 330.5M
CGroup: /system.slice/kafka.service
└─65804 /usr/lib/jvm/jre-11-openjdk/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent>
Apr 14 20:46:24 rockysrv.citizix.com kafka-server-start.sh[65804]: [2022-04-14 20:46:24,710] INFO [/config/changes-event-process-thread]: Starting (k>
Apr 14 20:46:24 rockysrv.citizix.com kafka-server-start.sh[65804]: [2022-04-14 20:46:24,715] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Sta>
Apr 14 20:46:24 rockysrv.citizix.com kafka-server-start.sh[65804]: [2022-04-14 20:46:24,723] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Sta>
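If you only need a quick yes or no rather than the full status output, systemctl is-active can be scripted. The sketch below wraps it in a small function. report_services is a hypothetical helper name, and the unit names are the ones created in this guide.

```shell
# report_services is a hypothetical helper: it prints one line per unit
# and returns non-zero if any unit is not active.
report_services() {
  rc=0
  for unit in "$@"; do
    # is-active --quiet prints nothing and reports the state via exit code.
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: NOT active"
      rc=1
    fi
  done
  return "$rc"
}
```

Call it as report_services zookeeper kafka. The return code makes it usable in scripts or health checks.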
Step 6: Create Test Topics on Kafka
Kafka lets us read, write, store, and process events across many machines. Events are organized and stored in topics, which work much like folders for events. Create at least one topic from your server terminal with the command below. You can reuse the same command later to create as many topics as you need.
Let's call our first topic loginevents. To create it, first go to your Kafka directory:
cd /opt/kafka/
And use the Topics script:
./bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic loginevents
If it is successful, you should see this output: Created topic loginevents.
After creating as many topics as you want, you can list with this command:
./bin/kafka-topics.sh --list --bootstrap-server localhost:9092
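Since the same command creates any number of topics, a loop keeps things short. The sketch below only prints the commands, so it is safe to run anywhere. topic_create_cmd is a hypothetical helper, and the topic names other than loginevents are made up for illustration.

```shell
# topic_create_cmd is a hypothetical helper: it prints (does not run) the
# kafka-topics.sh command that creates one single-partition topic.
topic_create_cmd() {
  printf './bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic %s\n' "$1"
}

# Print the commands for a few example topics.
for topic in loginevents logoutevents signupevents; do
  topic_create_cmd "$topic"
done
```

To actually create the topics, run the printed commands from /opt/kafka, or pipe the output of the loop to sh from that directory.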
Step 7: Test publishing to and consuming from a topic
Finally, we can test publishing events to the topic and consuming them. Kafka ships a command-line client for both the Producer and Consumer APIs. The producer writes events to a topic, and the consumer reads and displays them.
Open two terminal tabs or sessions to watch the producer and consumer work together in real time.
In the first terminal, run the consumer so we can see what is published to the topic. This command displays events as they arrive:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic loginevents --from-beginning
On the second terminal, we will test producing an event then watch the first terminal. Use this command:
./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic loginevents
The command presents a prompt where you can type any text. Each line you enter is consumed and displayed in the first terminal.
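The same round trip can be done without interactive prompts, which is handy for a quick scripted check. The sketch below pipes one line into the console producer and then reads one message back with the console consumer. smoke_test is a hypothetical helper, the message text is arbitrary, and the broker address is assumed to be localhost:9092 as in the rest of this guide.

```shell
# smoke_test is a hypothetical helper: it sends one message to a topic and
# reads one message back, using the console clients shipped with Kafka.
# Arguments: Kafka directory, topic name.
smoke_test() {
  kafka_dir="$1"
  topic="$2"
  if [ ! -x "$kafka_dir/bin/kafka-console-producer.sh" ]; then
    echo "Kafka scripts not found in $kafka_dir" >&2
    return 2
  fi
  # The producer reads messages from stdin, one per line.
  echo "smoke-test $(date +%s)" |
    "$kafka_dir/bin/kafka-console-producer.sh" \
      --bootstrap-server localhost:9092 --topic "$topic" || return 1
  # Read from the start of the topic; stop after one message or 10 seconds.
  # This prints the oldest message in the topic, not necessarily the new one.
  "$kafka_dir/bin/kafka-console-consumer.sh" \
    --bootstrap-server localhost:9092 --topic "$topic" \
    --from-beginning --max-messages 1 --timeout-ms 10000
}
```

Call it as smoke_test /opt/kafka loginevents once both services are running.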
Conclusion
In this guide we learnt how to install and test Apache Kafka on a Rocky Linux server. You can now connect to it with any Kafka client, in the programming language of your choice.