How to create a Dockerfile for Cassandra (or any database) that includes a schema?

I would like to create a Dockerfile that builds a Cassandra image with the keyspace and schema already in place when the image starts.

In general, how do you create a Dockerfile that builds an image when some of the steps cannot be completed until the container is launched, at least for the first time?

I currently have two steps: run a Cassandra image built from an existing Cassandra Dockerfile, with a volume that maps the CQL schema files into a temporary directory, and then run docker exec with cqlsh to import the schema once the image has been launched as a container.

But this does not create an image with the schema, just a container. That container can be saved as an image, but it is cumbersome.

    docker run --name $CASSANDRA_NAME -d \
        -h $CASSANDRA_NAME \
        -v $CASSANDRA_DATA_DIR:/data \
        -v $CASSANDRA_DIR/target:/tmp/schema \
        tobert/cassandra:2.1.7

then

    docker exec $CASSANDRA_NAME cqlsh -f /tmp/schema/create_keyspace.cql
    docker exec $CASSANDRA_NAME cqlsh -f /tmp/schema/schema01.cql
    # etc

This works, but it makes it impossible to use with tools like Docker Compose, because the linked containers/services will also start and expect the schema to be in place.

I saw one attempt where the Cassandra process was started in the background in the Dockerfile during the build and cqlsh was then executed, but I don't think it worked too well.

+6
3 answers

OK, I had this problem, and someone advised me on the following strategy:

  • Start with an existing Cassandra Dockerfile, the official one for example
  • Remove the ENTRYPOINT stuff
  • Copy the schema (.cql) and data (.csv) files into the image and put them somewhere, /opt/data for example
  • Create a shell script that will be used as the last command to launch Cassandra

    a. Start Cassandra with $CASSANDRA_HOME/bin/cassandra

    b. If the folder $CASSANDRA_HOME/data/data/your_keyspace-xxxx exists and is not empty, do nothing

    c. Else

     1. sleep for some time to allow the server to listen on port 9042
     2. when port 9042 is listening, execute the .cql script to load the csv files
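
Steps b and c above could be sketched roughly as follows. This is only an illustration, not the script the answer refers to: the function names (wait_for, keyspace_loaded, init_schema) and the WAIT_ATTEMPTS / WAIT_DELAY variables are made up for the example, and readiness is probed by running a trivial cqlsh command in a retry loop rather than by testing port 9042 directly.

```shell
#!/bin/sh
# Hypothetical helpers for a wrapper script; all names are illustrative.

# wait_for CMD [ARGS...]: retry a probe command until it succeeds
# or the attempts run out. Returns 0 on success, 1 on timeout.
wait_for() {
    attempts="${WAIT_ATTEMPTS:-30}"
    delay="${WAIT_DELAY:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# keyspace_loaded DATA_DIR FOLDER: true if DATA_DIR/FOLDER exists
# and contains at least one entry (step b).
keyspace_loaded() {
    [ -d "$1/$2" ] && [ -n "$(ls -A "$1/$2" 2>/dev/null)" ]
}

# init_schema DATA_DIR FOLDER SCHEMA_DIR: load every .cql file from
# SCHEMA_DIR unless the keyspace folder is already populated (step c).
init_schema() {
    data_dir="$1"
    folder="$2"
    schema_dir="$3"
    if keyspace_loaded "$data_dir" "$folder"; then
        return 0
    fi
    # Assumption: a successful cqlsh command means the server is ready.
    wait_for cqlsh -e 'DESCRIBE KEYSPACES' || return 1
    for f in "$schema_dir"/*.cql; do
        [ -e "$f" ] || continue
        cqlsh -f "$f" || return 1
    done
}
```

In a wrapper script, Cassandra would be started first (step a), and init_schema would then be called with the data directory, the keyspace folder name, and the directory holding the .cql files, for example /opt/data. Because the folder check comes first, restarting the container does not reload the schema.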

I found this procedure rather cumbersome, but there seems to be no other way. For a Cassandra hands-on lab, I found it easier to create a VM image using Vagrant and Ansible.

+5

Create a Dockerfile named Dockerfile_CAS:


    FROM cassandra:latest

    COPY ddl.cql docker-entrypoint-initdb.d/

    COPY docker-entrypoint.sh /docker-entrypoint.sh

    RUN ls -la *.sh; chmod +x *.sh; ls -la *.sh

    ENTRYPOINT ["/docker-entrypoint.sh"]

    CMD ["cassandra", "-f"]


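Edit docker-entrypoint.sh and add the following loop just above the final exec "$@" line:

    for f in docker-entrypoint-initdb.d/*; do
        case "$f" in
            # Shell scripts are sourced into the entrypoint.
            *.sh)  echo "$0: running $f"; . "$f" ;;
            # CQL files are retried in the background until Cassandra,
            # which is only started by the later exec "$@", accepts them.
            *.cql) echo "$0: running $f" && until cqlsh -f "$f"; do >&2 echo "Cassandra is unavailable - sleeping"; sleep 2; done & ;;
            *)     echo "$0: ignoring $f" ;;
        esac
        echo
    done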


    docker build -t suraj1287/cassandra -f Dockerfile_CAS .

and restore the image ...

0

Another approach our team uses is to create the schema when the server initializes. Our Java code tests whether the schema exists and, if it does not (new environment, new deployment), creates it.

The same goes for each new table: an automatic CREATE TABLE creates the tables needed for new data objects the first time they run on any new cluster (another developer's local machine, pre-production, production).

All this code is isolated inside our DataDriver classes for portability, in case we replace Cassandra with another database for some client or project.

This saves both administrators and developers a lot of trouble. The approach even works for the initial data loading that we use in tests.
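
The answer does this from Java code, which is not shown. As a rough illustration of the same idea using cqlsh instead, the sketch below applies CQL statements with IF NOT EXISTS, so running them on every startup is harmless. The function names, the events table, and the replication settings are all assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch: create the schema only if it is missing.

# schema_cql KEYSPACE: print idempotent CQL for an example keyspace
# and table (both are placeholders, not taken from the answer).
schema_cql() {
    cat <<EOF
CREATE KEYSPACE IF NOT EXISTS $1
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS $1.events (
  id uuid PRIMARY KEY,
  payload text
);
EOF
}

# ensure_schema KEYSPACE: write the statements to a temporary file and
# apply them with cqlsh -f. Safe to call repeatedly.
ensure_schema() {
    tmp_cql="$(mktemp)" || return 1
    schema_cql "$1" > "$tmp_cql"
    cqlsh -f "$tmp_cql"
    status=$?
    rm -f "$tmp_cql"
    return "$status"
}
```

Calling ensure_schema with a keyspace name at application or container startup would then cover both a fresh cluster and one where the schema already exists.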

0

Source: https://habr.com/ru/post/1241073/

