@derlin
Last active August 19, 2024 10:16
Dockerfile and entrypoint example in order to easily initialize a Cassandra container using *.sh/*.cql scripts in `/docker-entrypoint-initdb.d`

Initializing a Cassandra Docker container with keyspace and data

This gist shows how to build a Cassandra image whose keyspace and initial data are populated on first start.

It is generic: entrypoint.sh executes any CQL file located in /docker-entrypoint-initdb.d/, much like the mechanism used to initialize a MySQL container.

You can add any *.sh or *.cql scripts inside /docker-entrypoint-initdb.d, but note that:

  • *.sh files are sourced BEFORE Cassandra is launched (only files with the executable bit set are picked up)
  • *.cql files are executed (with cqlsh -f) AFTER Cassandra has started

Files of each kind are executed in name order (find | sort)
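
As a rough illustration of the ordering rule, the sketch below lists files the way entrypoint.sh does (find piped into sort). The helper name list_init_files and the sample file names are made up for this example, and LC_ALL=C is an addition of this sketch (the gist's script uses plain sort) to keep the order independent of the locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): print the files with a given
# extension in the order in which they would be executed.
list_init_files() {
    local dir=$1 ext=$2
    # LC_ALL=C is an assumption of this sketch: it forces a byte-wise sort
    (cd "$dir" && find . -type f -name "*.$ext" -print | LC_ALL=C sort)
}

# Throwaway directory standing in for /docker-entrypoint-initdb.d
demo_dir=$(mktemp -d)
touch "$demo_dir/02-tables.cql" "$demo_dir/01-keyspace.cql" "$demo_dir/10-data.cql"

list_init_files "$demo_dir" cql
```

Because the order is lexical rather than numeric, a file named 10-data.cql would run before 2-tables.cql; zero-padded prefixes such as 01-, 02- avoid that surprise.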

How to use

  1. download the Dockerfile and entrypoint.sh
  2. edit the Dockerfile in order to copy your init scripts inside /docker-entrypoint-initdb.d/
  3. build the image: docker build -t my-cassandra-image .
  4. run the image: docker run --rm -p 9042:9042 --name cassandra-container -d my-cassandra-image
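
Since the *.cql scripts run in the background once Cassandra accepts connections, the container can be up for a while before the keyspace exists. A small retry loop is one way to wait for that. The sketch below is only an example under assumptions, not part of the gist: wait_until is a hypothetical helper, and the commented-out check relies on the /var/lib/cassandra/_init.done marker that entrypoint.sh creates after the last CQL file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): retry a command until it
# succeeds or the number of attempts is exhausted.
# Usage: wait_until <attempts> <delay-in-seconds> <command> [args...]
wait_until() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Trivial demonstration that runs anywhere:
wait_until 3 0 true && echo "ready"

# Against the container started in step 4 (requires Docker, hence commented out):
# wait_until 30 6 docker exec cassandra-container test -f /var/lib/cassandra/_init.done \
#     && echo "init scripts have run"
```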

Note that the scripts in /docker-entrypoint-initdb.d are only run on the first startup. If you decide to persist the data using a volume, this still works: entrypoint.sh leaves a marker file (_init.done) in /var/lib/cassandra, so the scripts won't be executed when you boot your container a second time. By using a volume, I mean, e.g.:

docker run --rm -d \
    -p 9042:9042 \
    -v $PWD/data:/var/lib/cassandra \
    --name cassandra-container \
    my-cassandra-image
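
The "run only once" behaviour comes down to a marker file stored next to the data. Here is a minimal sketch of that idea, assuming a hypothetical run_once helper and a temporary directory in place of /var/lib/cassandra; only the marker name _init.done is taken from entrypoint.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): run a command at most once
# per data directory, using a marker file to remember that it ran.
run_once() {
    local data_dir=$1
    shift
    local lock="$data_dir/_init.done"
    if [ -f "$lock" ]; then
        echo "already initialized"
        return 0
    fi
    # Create the marker only if the command succeeded
    "$@" && touch "$lock"
}

data_dir=$(mktemp -d)                              # stands in for /var/lib/cassandra
run_once "$data_dir" echo "running init scripts"   # first boot: the command runs
run_once "$data_dir" echo "running init scripts"   # second boot: skipped
```

One design difference is worth noting: the sketch writes the marker only when the command succeeds, whereas entrypoint.sh touches it after the CQL loop regardless of the cqlsh exit status. Without a volume (or with a fresh one) the marker is absent, so the scripts run again.
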

Dockerfile

# NOTE: will also work with other cassandra version tags
FROM cassandra:3.11
# Fix UTF-8 accents in init scripts
ENV LANG=C.UTF-8
# Here, you can add any *.sh or *.cql scripts inside /docker-entrypoint-initdb.d
# *.sh files will be executed BEFORE launching cassandra
# *.cql files will be executed with cqlsh -f AFTER cassandra started
# Files are executed in name order (ls * | sort)
COPY *.cql /docker-entrypoint-initdb.d/
# this is the script that will patch the already existing entrypoint from cassandra image
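COPY entrypoint.sh /
# make sure the script is executable even if the host file lost its executable bit
RUN chmod +x /entrypoint.sh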
# Override ENTRYPOINT, keep CMD
ENTRYPOINT ["/entrypoint.sh"]
CMD ["cassandra", "-f"]

entrypoint.sh

#!/usr/bin/env bash
##
## This script will generate a patched docker-entrypoint.sh that:
## - executes any *.sh script found in /docker-entrypoint-initdb.d
## - boots cassandra up
## - executes any *.cql script found in docker-entrypoint-initdb.d
##
## It was written against the cassandra:3.11 image; newer tags may need adjustments
##
## Create script that executes files found in docker-entrypoint-initdb.d/
## (use > rather than >> so that restarting the container does not append a second copy)
cat <<'EOF' > /run-init-scripts.sh
#!/usr/bin/env bash
LOCK=/var/lib/cassandra/_init.done
INIT_DIR=/docker-entrypoint-initdb.d
if [ -f "$LOCK" ]; then
echo "@@ Initialization already performed."
exit 0
fi
cd "$INIT_DIR" || exit 1
echo "@@ Executing bash scripts found in $INIT_DIR"
# execute scripts found in INIT_DIR
for f in $(find . -type f -name "*.sh" -executable -print | sort); do
echo "$0: sourcing $f"
. "$f"
echo "$0: $f executed."
done
# wait for cassandra to be ready and execute cql in background
(
while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
for f in $(find . -type f -name "*.cql" -print | sort); do
echo "$0: running $f"
cqlsh -f "$f"
echo "$0: $f executed"
done
# mark things as initialized (in case /var/lib/cassandra was mapped to a local folder)
touch "$LOCK"
) &
EOF
## Patch existing entrypoint to call our script in the background
# This has been inspired by https://www.thetopsites.net/article/51594713.shtml
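EP=/patched-entrypoint.sh
# Locate the stock entrypoint: recent images keep it in /usr/local/bin (on the PATH),
# older ones also expose it as /docker-entrypoint.sh
ORIG_EP=$(command -v docker-entrypoint.sh || echo /docker-entrypoint.sh)
# Copy it without its last line (exec "$@"), which is re-added below after our hook
sed '$ d' "$ORIG_EP" > $EP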
cat <<'EOF' >> $EP
/run-init-scripts.sh &
exec "$@"
EOF
# Make both scripts executable
chmod +x /run-init-scripts.sh
chmod +x $EP
# Call the new entrypoint
# exec replaces this shell, so signals sent to the container reach cassandra
exec $EP "$@"

Example *.cql init script

-- Here, you can execute any CQL commands, e.g.
CREATE KEYSPACE some_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE some_keyspace.some_table (
id int,
month text,
timestamp timestamp,
value text,
PRIMARY KEY ((id, month), timestamp)
) WITH CLUSTERING ORDER BY (timestamp ASC);
@derlin
Author

derlin commented Aug 25, 2020

@danpaldev

danpaldev commented Sep 22, 2020

Easy and concise! Thank you very much for this, hard to believe that a simple migration is so complicated in Cassandra.

When trying this for the first time I got an error related to "entrypoint.sh". It was a permission issue, and I fixed it by adding a chmod command after the copy command. My final Dockerfile now looks like this and works fine:

FROM cassandra:latest

ENV LANG C.UTF-8

COPY *.cql /docker-entrypoint-initdb.d/

COPY entrypoint.sh /

RUN ["chmod", "+x", "/entrypoint.sh"]

ENTRYPOINT ["/entrypoint.sh"]
CMD ["cassandra", "-f"]

Hopefully this will be useful for people encountering the same problem.

@EthanHaid

EthanHaid commented Aug 21, 2021

This works, but a few notes:

  • I had to remove the sed command on line 54 of entrypoint.sh; the file doesn't seem to exist in my Cassandra 4.0 image.
  • Also, it's best to put set -e at the beginning of entrypoint.sh so the container doesn't deploy with a failed init.

In the end, I just used the Bitnami docker image instead, since it does the same thing by default, and it provides a non-root user for prod deployment

@Cosmicoppai

Cosmicoppai commented Sep 8, 2021

I'm getting this error

 "standard_init_linux.go:228: exec user process caused: exec format error" while building image using cassandra:4.0.

My Dockerfile and entrypoint.sh are the same as above.

@Cosmicoppai

Cosmicoppai commented Sep 8, 2021

I made some changes mentioned by @EthanHaid.

Now I'm getting this error:

"Running Cassandra as root user or group is not recommended - please start Cassandra using a different system user.
node-1    | If you really want to force running Cassandra as root, use -R command line option."

@derlin
Author

derlin commented Sep 8, 2021

Well, it seems Cassandra 4 changed a few things. I will have a look. In the meantime, @Cosmicoppai, did you try adding the -R argument to the CMD?

@Cosmicoppai

@derlin after using -R, the script executes, but I'm now getting this error:

Fatal configuration error node-1 | org.apache.cassandra.exceptions.ConfigurationException: Unable to bind to address /172.23.0.7:7000. Set listen_address in cassandra.yaml to an interface you can bind to, e.g., your private IP address on EC2

I've mounted cassandra.yaml in the docker-compose file to set authenticator properties on the nodes, but Docker is not resolving the IP.

@VadimRight

VadimRight commented Apr 24, 2024

Dude, I just want to say thank you for this github page, you are the best!
I wish there were more people like you who just share such helpful code with others

@derlin
Author

derlin commented Apr 27, 2024

@VadimRight thank you so much for taking the time to leave this comment. It is really nice to know it is worth continuing, I really appreciate it 😊

@kamauz

kamauz commented Aug 19, 2024

This works, but a few notes:

I had to remove the sed command on line 54 of entrypoint.sh, the file doesn't seem to exist in my Cassandra 4.0 image. Also, it's best to put set -e at the beginning of entrypoint.sh so the container doesn't deploy with a failed init.

In the end, I just used the Bitnami docker image instead, since it does the same thing by default, and it provides a non-root user for prod deployment

I checked in Cassandra 5.0: the "docker-entrypoint.sh" file has been moved, and its path is now "/usr/local/bin/docker-entrypoint.sh".
If you comment out line 54 in the "entrypoint.sh" file, you cannot access Cassandra from other Docker containers.
