How to run Apache Kafka in Docker and Docker Compose

Apache Kafka is an open-source distributed event store and stream-processing platform, developed by the Apache Software Foundation and written in Java and Scala. It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds, and it is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

In this guide, we will learn how to quickly get started with Apache Kafka using Docker and Docker Compose. You’ll connect to a broker, create a topic, produce some messages, and consume them.

Terminology

Let’s take a look at high-level definitions of some concepts used in this guide:

Kafka — Basically an event streaming platform. It enables users to collect, store, and process data to build real-time, event-driven applications. It’s written in Java and Scala, but you don’t have to know these languages to work with Kafka: there are client libraries for most common programming languages.

Kafka broker — A Kafka cluster is made up of brokers. Brokers handle requests from producers and consumers and keep data replicated within the cluster.

Kafka topic — A category to which records are published. Imagine you had a large news site — each news category could be a single Kafka topic.

Kafka producer — An application (a piece of code) you write to get data into Kafka.

Kafka consumer — A program you write to read data out of Kafka. Sometimes a consumer is also a producer, as it writes the data it reads to another topic in Kafka.

Zookeeper — Used to manage a Kafka cluster, track node status, and maintain a list of topics and messages. Kafka version 2.8.0 introduced early access to a mode that runs without Zookeeper (KRaft), but at the time of writing it is not yet recommended for production environments.

Docker — An open-source platform for building, deploying, and managing containers. It allows you to package your applications into containers, which simplifies distribution. That way, you know that if the application works on your machine, it will work on any machine you deploy it to.


Ensure Docker is installed and running

Before proceeding, please ensure that you have Docker installed and running on your system.

Also install the latest release of Docker Compose for your operating system from the project’s releases page.
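
You can quickly confirm that both tools are available and that the Docker daemon is running:

docker --version
docker-compose --version
docker info

If docker info reports an error, the Docker daemon is not running and the following steps will fail.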

Creating a Docker Compose file

We need a Kafka broker and Zookeeper. Both can be defined as services in a docker-compose file, which lets us manage the containers as a single unit. The file below defines a single-node Kafka broker, which meets most local development needs.

Create a docker-compose.yml file and add the following:

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    ports:
      - 22181:2181
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: broker
    ports:
      - 9092:9092
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://kafka:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

In the setup above, Zookeeper listens on port 2181 for the Kafka service defined in the same Compose file. The broker advertises two listeners: PLAINTEXT://localhost:9092 for clients connecting from the host machine and PLAINTEXT_INTERNAL://kafka:29092 for other containers on the Compose network. We also pin the container name to broker so that the docker exec commands later in this guide can refer to it directly.

Start the Kafka broker

From a directory containing the docker-compose.yml file created in the previous step, run this command to start all services in the correct order.

docker-compose up -d

The above command first checks whether the images defined are present locally, downloading them if not. The containers are then started in detached mode (the -d flag), so they keep running in the background.

If all goes well the services will be started:

$ docker-compose up -d
[+] Running 2/2
 ⠿ Container kafka-zookeeper-1  Started                                                                      1.7s
 ⠿ Container broker             Started                                                                      0.6s

Confirm that they are running:

$ docker-compose ps
NAME                COMMAND                  SERVICE             STATUS              PORTS
broker              "/etc/confluent/dock…"   kafka               running             0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
kafka-zookeeper-1   "/etc/confluent/dock…"   zookeeper           running             2888/tcp, 3888/tcp, 0.0.0.0:22181->2181/tcp, :::22181->2181/tcp
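
You can also check the broker logs to confirm that it finished starting. The exact log format varies between Kafka versions, but you should see a line similar to this:

$ docker-compose logs kafka | grep "started (kafka.server"
broker  | ... INFO [KafkaServer id=1] started (kafka.server.KafkaServer)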

Create a Kafka topic

Kafka stores messages in topics. It’s good practice to explicitly create them before using them, even if Kafka is configured to automagically create them when referenced.

Run this command, which uses docker exec to run the kafka-topics tool inside the broker container and create a new topic into which we’ll write and read some test messages.

$ docker exec broker \
    kafka-topics --bootstrap-server broker:9092 \
        --create \
        --topic citizix.one

WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic citizix.one.
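
You can verify the topic and inspect its partition assignment with the --describe flag. The output will look roughly like this (the exact columns vary between Kafka versions):

$ docker exec broker \
    kafka-topics --bootstrap-server broker:9092 \
        --describe \
        --topic citizix.one
Topic: citizix.one  PartitionCount: 1  ReplicationFactor: 1  Configs:
    Topic: citizix.one  Partition: 0  Leader: 1  Replicas: 1  Isr: 1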

Write messages to the topic

You can use the kafka-console-producer command line tool to write messages to a topic. This is useful for experimentation (and troubleshooting), but in practice you’ll use the Producer API in your application code, or Kafka Connect for pulling data in from other systems to Kafka.

Run this command. You’ll notice that nothing seems to happen—fear not! It is waiting for your input.

docker exec --interactive --tty broker \
    kafka-console-producer --bootstrap-server broker:9092 \
        --topic citizix.one

Type in some lines of text. Each line is a new message.

>this is the initial message
>and another one
>and the third

When you’ve finished, press Ctrl-D to return to your command prompt.
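
As noted above, application code would use the Producer API instead of the console tool. The snippet below is a minimal sketch using the confluent-kafka Python client (pip install confluent-kafka); it assumes the single-node broker above, reachable from the host at localhost:9092.

# producer.py - a minimal sketch, assuming the confluent-kafka package
# and the single-node broker above, reachable at localhost:9092.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    # Called once per message to confirm delivery or report an error.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

for line in ["this is the initial message", "and another one", "and the third"]:
    producer.produce("citizix.one", value=line.encode("utf-8"),
                     callback=delivery_report)

# Block until every queued message has been delivered (or failed).
producer.flush()

Running python producer.py writes the same three messages to the topic and prints a delivery report for each.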

Read messages from the topic

Now that we’ve written messages to the topic, we’ll read them back. Run this command to launch the kafka-console-consumer. The --from-beginning argument means that messages will be read from the start of the topic.

docker exec --interactive --tty broker \
    kafka-console-consumer --bootstrap-server broker:9092 \
        --topic citizix.one \
        --from-beginning

As before, this is useful for trialling things on the command line, but in practice you’ll use the Consumer API in your application code, or Kafka Connect for reading data from Kafka to push to other systems.

You’ll see the messages that you entered in the previous step.

this is the initial message
and another one
and the third

To write more messages to the topic, you can leave the consumer running and, in a new terminal window, run the kafka-console-producer again.

Enter some more messages and note how they are displayed almost instantaneously in the consumer terminal.

Press Ctrl-D to exit the producer, and Ctrl-C to stop the consumer.
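
Similarly, here is a minimal Consumer API sketch using the same confluent-kafka Python client. The group id citizix-demo is an arbitrary name, and auto.offset.reset: earliest plays the role of --from-beginning for a group with no committed offsets.

# consumer.py - a minimal sketch, assuming the confluent-kafka package.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "citizix-demo",       # arbitrary example group id
    "auto.offset.reset": "earliest",  # read from the start, like --from-beginning
})
consumer.subscribe(["citizix.one"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # returns None if nothing arrived in time
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(msg.value().decode("utf-8"))
except KeyboardInterrupt:
    pass
finally:
    consumer.close()  # commit final offsets and leave the group cleanly

As with the console consumer, press Ctrl-C to stop it; the finally block closes the consumer cleanly.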

Multi-node Kafka setup

For more stable environments, we’ll need a more resilient setup. We can define a multi-node setup by declaring multiple broker and Zookeeper services, as shown in the following docker-compose.yml file:

version: '3'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    ports:
      - 22181:2181
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:2888:3888;zookeeper-2:2888:3888

  zookeeper-2:
    image: confluentinc/cp-zookeeper:latest
    ports:
      - 32181:2181
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:2888:3888;zookeeper-2:2888:3888

  kafka-1:
    image: confluentinc/cp-kafka:latest
    ports:
      - 29092:29092
    depends_on:
      - zookeeper-1
      - zookeeper-2
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper-1:2181,zookeeper-2:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:29092,PLAINTEXT_INTERNAL://kafka-1:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 2
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 2
    
  kafka-2:
    image: confluentinc/cp-kafka:latest
    ports:
      - 39092:39092
    depends_on:
      - zookeeper-1
      - zookeeper-2
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper-1:2181,zookeeper-2:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:39092,PLAINTEXT_INTERNAL://kafka-2:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 2
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 2

We must ensure that the service names, KAFKA_BROKER_ID, and ZOOKEEPER_SERVER_ID values are unique across the services. The ZOOKEEPER_SERVERS list joins the two Zookeeper nodes into a single ensemble, and KAFKA_INTER_BROKER_LISTENER_NAME tells the brokers to talk to each other over the internal listener rather than the host-facing one. Note that a production Zookeeper ensemble would normally run an odd number of nodes (three or more) so that it can still form a quorum when one node fails.

Moreover, each service must expose a unique port to the host machine. Although zookeeper-1 and zookeeper-2 both listen on port 2181, they expose it to the host via ports 22181 and 32181, respectively. The same logic applies to the kafka-1 and kafka-2 services, which are reachable from the host on ports 29092 and 39092, respectively.

We can start the containers the same way as described above.
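
Once the cluster is up, we can check that replication works across both brokers by creating a topic with a replication factor of 2. The topic name replicated-demo here is just an example:

$ docker-compose exec kafka-1 \
    kafka-topics --bootstrap-server kafka-1:9092 \
        --create \
        --topic replicated-demo \
        --partitions 2 \
        --replication-factor 2

Describing the topic should then show each partition with replicas on both broker 1 and broker 2:

$ docker-compose exec kafka-1 \
    kafka-topics --bootstrap-server kafka-1:9092 \
        --describe \
        --topic replicated-demo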

Stopping and removing the Kafka broker

Once you’ve finished, you can shut down the Kafka broker. Note that doing this will destroy all messages in the topics you’ve written, since we didn’t mount any volumes for persistent storage.

From the directory containing the docker-compose.yml file created earlier, run this command to stop and remove all services in the correct order:

docker-compose down

Note that docker-compose down also removes the containers. If you would rather just stop them, run docker-compose stop instead; the stopped containers can later be removed with:

docker-compose rm

Conclusion

In this guide, we used Docker and Docker Compose to spin up single-node and multi-node Kafka setups with Zookeeper.
