Install and Configure Apache Cassandra 4.0 on CentOS 8

In this guide we will go through the process of installing and setting up Apache Cassandra 4.0 on the CentOS 8 and RHEL 8 Linux distributions.

Apache Cassandra is a free and open-source NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra was initially developed at Facebook, which released it as open source, and it later became a project of the Apache Software Foundation.

Apache Cassandra is suited for massive, exponentially-growing amounts of constantly transforming data.

Cassandra Compared to RDBMS

Cassandra has close analogies to concepts in relational databases:

  • Keyspace – Similar to a database/schema in an RDBMS
  • Table – Similar to a table in an RDBMS
  • Row – Similar to a row in an RDBMS
  • Column – Similar to a column in an RDBMS
  • Primary key – Similar to a primary key in an RDBMS
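
To make the mapping concrete, here is a minimal sketch that writes a small CQL script touching each of the concepts above. The keyspace name demo, the table users, and its columns are made up for illustration, and SimpleStrategy with a replication factor of 1 is assumed because this guide sets up a single node.

```shell
# Write a small CQL script that touches each concept from the list above.
# The keyspace, table, and column names are illustrative only.
cat > demo.cql <<'EOF'
-- Keyspace: comparable to a database/schema
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Table, columns, and primary key
CREATE TABLE IF NOT EXISTS demo.users (
  user_id int PRIMARY KEY,
  name    text,
  email   text
);

-- Row: one entry identified by its primary key
INSERT INTO demo.users (user_id, name, email)
  VALUES (1, 'Ada', 'ada@example.com');

SELECT name, email FROM demo.users WHERE user_id = 1;
EOF
```

Once Cassandra is running and the cqlsh client from Step 4 is installed, the script can be executed with cqlsh -f demo.cql.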

Prerequisites

To follow along with this guide, you need:

  • A CentOS 8 server
  • Root access to the server, or a user with sudo access
  • Internet access to download the packages

The following are the steps we will follow to install Cassandra:

  1. Ensure our system is up to date
  2. Install Java in the system
  3. Install Apache Cassandra in the system
  4. Install and configure Apache Cassandra Client (cqlsh)
  5. Configure Apache Cassandra

Step 1. Ensure our system is up to date

Let’s make sure that the CentOS 8 packages installed on the server are up to date. You can do this by running the following command:

sudo dnf -y update

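Step 2. Install Java in the system

Apache Cassandra needs Java to run. Cassandra 4.0 runs on Java 8 (it can also run on Java 11, but that support is experimental in 4.0), so this guide uses OpenJDK 8. Check whether Java is already installed by typing this command: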

java -version

If you see this output:

# java -version
-bash: java: command not found

then Java is not installed. Let’s install it with this command:

sudo dnf install -y java-1.8.0-openjdk

Once the installation is complete, confirm it:

# java -version
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)
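
If you want to script this check, the version banner can be parsed. The helper below is a rough sketch, not part of Cassandra or Java: java_major_version is a hypothetical function name, and it assumes the banner has the same shape as the output above (a quoted version string after the word version).

```shell
# Print the major Java version found in a `java -version` banner line.
# Handles the legacy "1.8.0_302" scheme (major = 8) and the newer "11.0.12" scheme.
# Prints nothing if the line contains no quoted version string.
java_major_version() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
    esac
}

# Possible usage on the server (java -version writes its banner to stderr):
#   major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
#   [ "$major" = "8" ] || echo "Expected Java 8, found: ${major:-none}"
```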

Now that Java 8 is installed on our system, let’s install Cassandra.

Step 3. Install Apache Cassandra in the system

Apache Cassandra is not available in the default CentOS 8 repositories, so let’s add a repository definition that points to the Cassandra packages.

Create the file /etc/yum.repos.d/cassandra.repo with the required content using this command (sudo tee is used so that the file is written with root privileges):

sudo tee /etc/yum.repos.d/cassandra.repo > /dev/null <<EOF
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF

Now that we have added the repo, let’s install Cassandra:

sudo dnf install -y cassandra

Confirm that Cassandra has been installed using this command:

# rpm -qi cassandra
Name        : cassandra
Version     : 4.0.0
Release     : 1
Architecture: noarch
Install Date: Tue 31 Aug 2021 08:59:00 AM UTC
Group       : Development/Libraries
Size        : 54941890
License     : Apache Software License 2.0
Signature   : RSA/SHA512, Thu 22 Jul 2021 10:22:35 PM UTC, Key ID 5e85b9ae0b84c041
Source RPM  : cassandra-4.0.0-1.src.rpm
Build Date  : Thu 22 Jul 2021 10:22:10 PM UTC
Build Host  : 0b542adba94d
Relocations : (not relocatable)
URL         : http://cassandra.apache.org/
Summary     : Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store.
Description :
Cassandra is a distributed (peer-to-peer) system for the management and storage of structured data.

Once the package is installed, the cassandra binary is available at /usr/sbin/cassandra. Since CentOS 8 manages services with systemd, let’s create a unit file at /etc/systemd/system/cassandra.service to manage the Cassandra service. Note that sudo cat > file fails for a non-root user, because the redirection is performed by the calling shell rather than by sudo, so the content is piped through sudo tee instead:

sudo tee /etc/systemd/system/cassandra.service > /dev/null <<EOF
[Unit]
Description=Apache Cassandra 4.0
After=network.target

[Service]
Type=simple
PIDFile=/var/run/cassandra/cassandra.pid
User=cassandra
Group=cassandra

ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/cassandra.pid
Restart=always

[Install]
WantedBy=multi-user.target
EOF

Once the file is created, you can use systemd to manage the service.

Run this command so that systemd picks up the new unit file:

sudo systemctl daemon-reload

To start the service:

sudo systemctl start cassandra

Confirm that the service is running. The status output should show Active: active (running):

# sudo systemctl status cassandra
● cassandra.service - Apache Cassandra
   Loaded: loaded (/etc/systemd/system/cassandra.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-08-31 15:50:07 UTC; 8s ago
 Main PID: 100752 (java)
    Tasks: 54 (limit: 23800)
   Memory: 1.1G
   CGroup: /system.slice/cassandra.service
           └─100752 java -ea -da:net.openhft... -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfD>

Aug 31 15:50:13 ipa-server.citizix.light cassandra[100752]: INFO  [main] 2021-08-31 15:50:13,284 NativeTransportService.java:68 - Netty using native Epoll event loop
Aug 31 15:50:13 ipa-server.citizix.light cassandra[100752]: INFO  [CompactionExecutor:1] 2021-08-31 15:50:13,326 CompactionTask.java:150 - Compacting (20ffe200-0a73-11ec-a273-f980f7c7aa0a) [/var/lib/cassandra>
Aug 31 15:50:13 ipa-server.citizix.light cassandra[100752]: INFO  [main] 2021-08-31 15:50:13,329 PipelineConfigurator.java:124 - Using Netty Version: [netty-buffer=netty-buffer-4.1.58.Final.10b03e6, netty-cod>
Aug 31 15:50:13 ipa-server.citizix.light cassandra[100752]: INFO  [main] 2021-08-31 15:50:13,329 PipelineConfigurator.java:125 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
Aug 31 15:50:13 ipa-server.citizix.light cassandra[100752]: INFO  [main] 2021-08-31 15:50:13,334 CassandraDaemon.java:780 - Startup complete

Enable the cassandra service to always run on boot:

sudo systemctl enable cassandra

Use the nodetool status command to confirm the state of the current node. UN in the first column means the node is Up and in the Normal state:

# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  166.07 KiB  16      100.0%            2b8341f0-2638-46bb-a0e0-e20b86f96d0a  rack1
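
For scripted health checks, the same output can be tested automatically. The function below is only a sketch under the assumption that node lines look like the one above, starting with a two-letter status/state code followed by a space; all_nodes_up_normal is a hypothetical helper name, not a nodetool feature.

```shell
# Succeeds only if `nodetool status` output (read from stdin) lists at least
# one node and every listed node is in the UN (Up/Normal) state.
all_nodes_up_normal() {
    awk '
        # Node lines start with a status letter (U/D) and a state letter (N/L/J/M).
        /^[UD][NLJM] / { total++; if ($1 == "UN") up++ }
        END { if (total > 0 && up == total) exit 0; exit 1 }
    '
}

# Possible usage on the server:
#   nodetool status | all_nodes_up_normal && echo "all nodes are UN"
```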

Now that the service is up and running, let’s install the client.

Step 4. Install and configure Apache Cassandra Client (cqlsh)

Now that the Apache Cassandra service has been installed and started, we need a way to connect to it.

The client tool used to access Cassandra (cqlsh) is written in Python, so before installing it we need to set up a Python environment.

Install Python 3 and pip:

sudo dnf install -y python39 python39-pip

Confirm the installation of Python and pip:

# python3 -V
Python 3.9.2

# pip3 -V
pip 20.2.4 from /usr/lib/python3.9/site-packages/pip (python 3.9)

Using pip, install cqlsh:

sudo pip3 install cqlsh

Now we can use cqlsh to connect to Cassandra. Since Cassandra is running locally, we do not have to specify a host; as the output shows, cqlsh connects to 127.0.0.1:9042:

# cqlsh
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.0 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>

Now that the client is set up, let’s configure Cassandra.

Step 5. Configure Apache Cassandra

The main configuration file is /etc/cassandra/default.conf/cassandra.yaml.

Notable settings to change:

  • cluster_name – the name of your cluster
  • seeds – a comma-separated list of the IP addresses of your cluster’s seed nodes
  • listen_address – the IP address of this node, which other nodes use to communicate with it

On top of that, you can also set the JVM_OPTS environment variable to add any additional JVM command-line arguments. These are passed to the Cassandra service when it starts.
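
Before editing anything, it can help to see what the three settings listed above are currently set to. The function below is a small sketch; show_cluster_settings is a hypothetical helper name, and it assumes seeds appears as a nested "- seeds:" entry under seed_provider, which is how the stock 4.0 file is laid out as far as I know.

```shell
# Print the cluster_name, listen_address, and seeds lines of a cassandra.yaml file.
# $1: path to the file (in this guide, /etc/cassandra/default.conf/cassandra.yaml)
# Commented-out entries are skipped because they start with '#'.
show_cluster_settings() {
    grep -E '^(cluster_name|listen_address):|^[[:space:]]*- seeds:' "$1"
}

# Possible usage on the server:
#   show_cluster_settings /etc/cassandra/default.conf/cassandra.yaml
```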

By default, Cassandra’s cluster name is ‘Test Cluster’. You can change this to your preferred cluster name by logging in with cqlsh and running the statement below:

UPDATE system.local 
SET cluster_name = 'Citizix Cluster' 
WHERE KEY = 'local';

After that, update the config file /etc/cassandra/default.conf/cassandra.yaml with the new name:

sudo vim /etc/cassandra/default.conf/cassandra.yaml

Then change the cluster_name line to:

cluster_name: 'Citizix Cluster'

Save and exit.
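
If you would rather script this edit than open an editor, something along these lines should work. It is a sketch only: set_cluster_name is a hypothetical helper, it assumes cluster_name is a top-level key at the start of a line, and it needs to run as a user that can write to the file.

```shell
# Replace the top-level cluster_name entry in a cassandra.yaml file.
# $1: path to the file, $2: new cluster name.
# Assumes the name contains no single quotes, slashes, backslashes, or ampersands,
# since it is substituted directly into a sed replacement.
set_cluster_name() {
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back over the original
    # so the original file keeps its owner and permissions.
    sed "s/^cluster_name:.*/cluster_name: '$2'/" "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Possible usage in a root shell:
#   set_cluster_name /etc/cassandra/default.conf/cassandra.yaml "Citizix Cluster"
```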

Flush the system keyspace so that the change held in memory is written to disk:

nodetool flush system

Then restart the Cassandra service:

sudo systemctl restart cassandra

Log in again with cqlsh and confirm that the "Connected to ..." line now shows the new cluster name.

Conclusion

In this guide we installed and configured Cassandra. Please note the following:

  • Cassandra stores its logs in the /var/log/cassandra/ directory; the main log file is /var/log/cassandra/system.log
  • Since we created a systemd service, you can also check stdout and stderr logs using this command:
    sudo journalctl -fu cassandra
  • /var/lib/cassandra is the default data directory. You can change it in the config file
  • Cassandra configuration files are stored in /etc/cassandra/, including the main config file /etc/cassandra/default.conf/cassandra.yaml
  • To manage the cassandra service that we created:
    # Start the service
    sudo systemctl start cassandra
    # Check the service status
    sudo systemctl status cassandra
    # Stop the service
    sudo systemctl stop cassandra
    # Enable the service
    sudo systemctl enable cassandra
    # Restart the service
    sudo systemctl restart cassandra

Please also check the official Apache Cassandra documentation for more information.
