Apache Kafka is an open-source distributed event store and stream-processing platform developed by the Apache Software Foundation and written in Java and Scala. It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds, and it is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
In this guide, we will learn how to quickly get started with Apache Kafka using Docker and docker-compose. You’ll connect to a broker, create a topic, produce some messages, and consume them.
Terminology
Let us take a look at high-level definitions of the concepts used in this guide:
Kafka — Basically an event streaming platform. It enables users to collect, store, and process data to build real-time event-driven applications. It’s written in Java and Scala, but you don’t have to know these to work with Kafka. There are client libraries for most common programming languages.
Kafka broker — A single Kafka cluster is made up of brokers. They handle producers and consumers and keep data replicated in the cluster.
Kafka topic — A category to which records are published. Imagine you had a large news site — each news category could be a single Kafka topic.
Kafka producer — An application (a piece of code) you write to get data to Kafka.
Kafka consumer — A program you write to get data out of Kafka. Sometimes a consumer is also a producer, for example when it reads from one Kafka topic and writes the results to another.
Zookeeper — Used to manage a Kafka cluster, track node status, and maintain a list of topics and messages. Kafka 2.8.0 introduced early access to running Kafka without Zookeeper (KRaft mode), but it is not yet ready for production environments.
Docker — An open-source platform for building, deploying, and managing containers. It allows you to package your applications into containers, which simplifies application distribution. That way, you know if the application works on your machine, it will work on any machine you deploy it to.
Check out these related guides:
- How to install Apache Kafka on Rocky/Alma Linux 9
- How to install Apache Kafka on Ubuntu 22.04
- How to install and set up Kafdrop – Kafka Web UI
- How to run Kafdrop the Kafka Web UI in Docker and Docker compose
Ensure Docker is installed and up and running
Before proceeding, please ensure that you have Docker up and running on your system. Check out these guides on how to install Docker:
- How to install and configure docker in Rocky Linux/Alma Linux 9
- How to Install and Use Docker in Ubuntu 22.04
Also install the latest copy of docker-compose for your OS from the Docker Compose releases page.
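You can confirm that both tools are ready by printing their versions:

$ docker --version
$ docker-compose --version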
Creating a docker compose file
We need a Kafka broker and Zookeeper. These can be defined as services in a docker-compose file, which lets us manage the containers as a unit. The following docker-compose file defines a single-node Kafka broker setup that meets most local development needs.
Create a docker-compose.yaml and add the following:
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    ports:
      - 22181:2181
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: broker
    ports:
      - 9092:9092
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://kafka:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
In the setup above, Zookeeper listens on port 2181 for the kafka service defined in the same docker-compose file, and is exposed to the host on port 22181. The broker advertises two listeners: PLAINTEXT://localhost:9092 for clients connecting from the host machine, and PLAINTEXT_INTERNAL://kafka:29092 for clients running in other containers on the same Docker network. The container_name: broker setting lets us refer to the container simply as broker in the docker exec commands later in this guide.
Start the Kafka broker
From the directory containing the docker-compose.yaml file created in the previous step, run this command to start all services in the correct order:
docker-compose up -d
The above command first checks whether the images defined are present locally and downloads them if not. Then the containers are started in detached mode.
If all goes well the services will be started:
$ docker-compose up -d
[+] Running 2/2
⠿ Container kafka-zookeeper-1 Started 1.7s
⠿ Container broker Started 0.6s
Confirm that they are running:
$ docker-compose ps
NAME                COMMAND                  SERVICE     STATUS    PORTS
broker              "/etc/confluent/dock…"   kafka       running   0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
kafka-zookeeper-1   "/etc/confluent/dock…"   zookeeper   running   2888/tcp, 3888/tcp, 0.0.0.0:22181->2181/tcp, :::22181->2181/tcp
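If a container does not come up, the broker logs usually point at the cause. You can follow them with:

$ docker-compose logs -f kafka

Once startup completes, you should see a log line indicating that the Kafka server started; press Ctrl-C to stop following the logs.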
Create a Kafka topic
Kafka stores messages in topics. It’s good practice to explicitly create them before using them, even if Kafka is configured to automagically create them when referenced.
Run this command inside the broker container to create a new topic into which we’ll write and read some test messages.
$ docker exec broker \
    kafka-topics --bootstrap-server broker:9092 \
    --create \
    --topic citizix.one
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic citizix.one.
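You can verify that the topic exists and inspect its partition and replica assignment with the --describe flag:

$ docker exec broker \
    kafka-topics --bootstrap-server broker:9092 \
    --describe \
    --topic citizix.one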
Write messages to the topic
You can use the kafka-console-producer command line tool to write messages to a topic. This is useful for experimentation (and troubleshooting), but in practice you’ll use the Producer API in your application code, or Kafka Connect for pulling data in from other systems to Kafka.
Run this command. You’ll notice that nothing seems to happen—fear not! It is waiting for your input.
docker exec --interactive --tty broker \
    kafka-console-producer --bootstrap-server broker:9092 \
    --topic citizix.one
Type in some lines of text. Each line is a new message.
>this is the initial message
>and another one
>and the third
When you’ve finished, press Ctrl-D to return to your command prompt.
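If you want to do the same from application code, here is a minimal sketch of the producing side in Python. It assumes the third-party kafka-python package (pip install kafka-python) and the single-node broker above listening on localhost:9092:

from kafka import KafkaProducer

# Connect via the PLAINTEXT listener advertised to the host machine.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# send() is asynchronous and expects the message value as bytes.
for line in ["this is the initial message", "and another one", "and the third"]:
    producer.send("citizix.one", line.encode("utf-8"))

# Block until every buffered message has been delivered to the broker.
producer.flush()
producer.close()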
Read messages from the topic
Now that we’ve written messages to the topic, we’ll read them back. Run this command to launch the kafka-console-consumer. The --from-beginning argument means that messages will be read from the start of the topic.
docker exec --interactive --tty broker \
    kafka-console-consumer --bootstrap-server broker:9092 \
    --topic citizix.one \
    --from-beginning
As before, this is useful for trialling things on the command line, but in practice you’ll use the Consumer API in your application code, or Kafka Connect for reading data from Kafka to push to other systems.
You’ll see the messages that you entered in the previous step.
this is the initial message
and another one
and the third
To write more messages to the topic, you can leave the consumer command running, then run the kafka-console-producer again in a new terminal window.
Enter some more messages and note how they are displayed almost instantaneously in the consumer terminal.
Press Ctrl-D to exit the producer, and Ctrl-C to stop the consumer.
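The consuming side in application code follows the same pattern. Here is a minimal sketch, again assuming the kafka-python package; the group id citizix-demo is just an example name:

from kafka import KafkaConsumer

# auto_offset_reset="earliest" mirrors the console consumer's
# --from-beginning flag; group_id lets Kafka track our position.
consumer = KafkaConsumer(
    "citizix.one",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    group_id="citizix-demo",
)

# Iterating over the consumer blocks, yielding messages as they arrive.
for message in consumer:
    print(message.value.decode("utf-8"))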
Multi-node Kafka setup
For more stable environments, we’ll need a resilient setup. We can define a multi-node setup by declaring multiple broker and Zookeeper services, as shown in the following docker-compose.yaml file:
version: '3'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    ports:
      - 22181:2181
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  zookeeper-2:
    image: confluentinc/cp-zookeeper:latest
    ports:
      - 32181:2181
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka-1:
    image: confluentinc/cp-kafka:latest
    ports:
      - 29092:29092
    depends_on:
      - zookeeper-1
      - zookeeper-2
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper-1:2181,zookeeper-2:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:29092,PLAINTEXT_INTERNAL://kafka-1:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
  kafka-2:
    image: confluentinc/cp-kafka:latest
    ports:
      - 39092:39092
    depends_on:
      - zookeeper-1
      - zookeeper-2
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper-1:2181,zookeeper-2:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:39092,PLAINTEXT_INTERNAL://kafka-2:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
We must ensure that the service names and KAFKA_BROKER_ID are unique across the services.
Moreover, each service must expose a unique port to the host machine. Although zookeeper-1 and zookeeper-2 both listen on port 2181, they expose it to the host via ports 22181 and 32181, respectively. The same logic applies to the kafka-1 and kafka-2 services, which are exposed to the host on ports 29092 and 39092, respectively, so clients on the host machine connect to localhost:29092 or localhost:39092. We also set KAFKA_INTER_BROKER_LISTENER_NAME so that the brokers talk to each other over the internal container listeners rather than the host-facing localhost addresses.
We can start the containers the same way as described above.
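To confirm that both brokers have joined the cluster, one quick check is to create a topic with a replication factor of 2, which only succeeds when two brokers are alive. The topic name replicated.test here is just an example:

$ docker-compose exec kafka-1 \
    kafka-topics --bootstrap-server kafka-1:9092 \
    --create \
    --topic replicated.test \
    --partitions 3 \
    --replication-factor 2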
Stopping and removing the Kafka broker
Once you’ve finished, you can shut down the Kafka broker. Note that doing this will destroy all messages in the topics that you’ve written.
From the directory containing the docker-compose.yaml file created earlier, run this command to stop and remove all services in the correct order:
docker-compose down
If you stopped the services with docker-compose stop instead, you can remove the stopped containers with:
docker-compose rm
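The down command can also clean up the images it pulled, if you want to reclaim the disk space:

docker-compose down --rmi all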
Conclusion
In this guide, we used Docker and docker-compose to spin up single-node and multi-node Kafka setups with Zookeeper.