Docker Images

incubator-sdap-nexus contains a number of different Docker images for download. All images are available from the SDAP organization on DockerHub.

Solr Images

All docker builds for the Solr images should happen from this directory. For copy/paste ability, first export the environment variable BUILD_VERSION to the version number you would like to tag images as.
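
For example, the variable can be set once per shell session before running any of the build or run commands in this document. The version number 1.0.0 below is only a placeholder, not a real release number.

```shell
# 1.0.0 is a placeholder; substitute the version you want to tag images as.
export BUILD_VERSION=1.0.0
echo "Images will be tagged as ${BUILD_VERSION}"
```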

Common Environment Variables

Any environment variable that can be passed to solr.in.sh can also be passed as an environment variable to the Docker container, and it will be used. A few options are called out here:

SOLR_HEAP

default: 512m

Increase the Java heap as needed to support your indexing and query needs.

SOLR_HOME

default: /opt/solr/server/solr

Path to a directory for Solr to store cores and their data. This directory is exposed as a VOLUME that can be mounted.

If you want to mount the SOLR_HOME directory to a directory on the host machine, you need to provide the container path to the docker run -v option. Doing this allows you to retain the index between start/stop of this container.
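
As a rough sketch of how these options combine, the helper below passes SOLR_HEAP with -e and mounts a host directory with -v. The function name, the 1g heap value, and the choice of the singlenode image are illustrative assumptions. Port mapping is left out for brevity; see the run instructions for each image.

```shell
# Hypothetical helper: run the singlenode image with a custom heap size.
# The function name and the default heap value are illustrative only.
run_solr_with_heap() {
  heap="${1:-1g}"   # value handed to Solr through SOLR_HEAP
  docker run --name solr -d \
    -e SOLR_HEAP="${heap}" \
    -v "${PWD}/solrhome/nexustiles:/opt/solr/server/solr/nexustiles" \
    "sdap/solr-singlenode:${BUILD_VERSION}"
}
```

Calling run_solr_with_heap 2g would start the container with a 2 GB heap, assuming BUILD_VERSION has been exported.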

sdap/solr

This is the base image used by both singlenode and cloud versions of the Solr image.

How To Build

This image can be built by:

docker build -t sdap/solr:${BUILD_VERSION} .

How to Run

This image is not intended to be run directly.

sdap/solr-singlenode

This is the singlenode version of Solr.

How To Build

This image can be built from the incubator/sdap/solr directory:

docker build -t sdap/solr-singlenode:${BUILD_VERSION} -f singlenode/Dockerfile --build-arg tag_version=${BUILD_VERSION} .

How to Run

This Docker container runs Apache Solr v7.4 as a single node with the nexustiles collection. The main decision when running this image is whether data should persist when the container is stopped or be discarded.

Persist Data

To persist the data in the nexustiles collection, we need to provide a volume mount from the host machine to the container path where the collection data is stored. By default, collection data is stored in the location indicated by the $SOLR_HOME environment variable. If you do not provide a custom SOLR_HOME location, the default is /opt/solr/server/solr. Therefore, the easiest way to run this image and persist data to a location on the host machine is:

docker run --name solr -v ${PWD}/solrhome/nexustiles:/opt/solr/server/solr/nexustiles -p 8983:8983 -d sdap/solr-singlenode:${BUILD_VERSION}

${PWD}/solrhome/nexustiles is the directory on the host machine where the nexustiles collection will be created if it does not already exist. If you have run this container before and ${PWD}/solrhome/nexustiles already contains files, those files will not be overwritten. In this way, it is possible to retain data on the host machine between runs of this Docker image.

Don’t Persist Data

If you do not need to persist data between runs of this image, simply run the image without a volume mount.

docker run --name solr -p 8983:8983 -d sdap/solr-singlenode:${BUILD_VERSION}

When the container is removed, the data will be lost.

sdap/solr-cloud

This image runs SolrCloud.

How To Build

This image can be built from the incubator/sdap/solr directory:

docker build -t sdap/solr-cloud:${BUILD_VERSION} -f cloud/Dockerfile --build-arg tag_version=${BUILD_VERSION} .

How to Run

This Docker container runs Apache Solr v7.4 in cloud mode with the nexustiles collection. It requires a running Zookeeper service in order to work. It will automatically bootstrap Zookeeper by uploading configuration and core properties to Zookeeper when it starts.

You must decide whether data should persist when the container is stopped or be discarded.

Note

The example docker run commands below use host.docker.internal several times. This is a special DNS name that is known to work on Docker for Mac for connecting from a container to a service on the host. If you are not launching the container with Docker for Mac, there is no guarantee that this DNS name will be resolvable inside the container.

Cloud Specific Environment Variables

SDAP_ZK_SERVICE_HOST

default: localhost

This is the hostname of the Zookeeper service that Solr should use to connect.

SDAP_ZK_SERVICE_PORT

default: 2181

The port Solr should use to connect to Zookeeper.

SDAP_ZK_SOLR_CHROOT

default: solr

The Zookeeper chroot under which Solr configuration will be accessed.

SOLR_HOST

default: localhost

The hostname of the Solr instance that will be recorded in Zookeeper.
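
To illustrate how the three Zookeeper variables relate, the sketch below joins them into a host:port/chroot connection string using the defaults listed above. It assumes the image combines them in this conventional form, which the source does not state explicitly. The function name is hypothetical.

```shell
# Hypothetical helper: build a Zookeeper connection string from the
# cloud-specific variables, falling back to the documented defaults.
# That the image joins them exactly this way is an assumption.
zk_connect_string() {
  zk_host="${SDAP_ZK_SERVICE_HOST:-localhost}"
  zk_port="${SDAP_ZK_SERVICE_PORT:-2181}"
  zk_chroot="${SDAP_ZK_SOLR_CHROOT:-solr}"
  printf '%s:%s/%s\n' "$zk_host" "$zk_port" "$zk_chroot"
}
```

With no variables set, zk_connect_string prints localhost:2181/solr.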

Zookeeper

Zookeeper can be running on the host machine or anywhere that docker can access (e.g. a bridge network). Take note of the host where Zookeeper is running and use that value for the SDAP_ZK_SERVICE_HOST environment variable.
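
As a minimal sketch, Zookeeper could itself be started with Docker and published on the host's port 2181. The use of the official zookeeper image, the 3.4 tag, the container name, and the function name are all assumptions made for illustration; this project does not prescribe them.

```shell
# Hypothetical helper: start a single Zookeeper container on host port 2181.
# The image name, tag, and container name are assumptions.
start_zookeeper() {
  docker run --name zookeeper -p 2181:2181 -d zookeeper:3.4
}
```

With Zookeeper published on the host this way, host.docker.internal can serve as the SDAP_ZK_SERVICE_HOST value on Docker for Mac.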

Persist Data

To persist the data, we need to provide a volume mount from the host machine to the container path where the collection data is stored. By default, collection data is stored in the location indicated by the $SOLR_HOME environment variable. If you do not provide a custom SOLR_HOME location, the default is /opt/solr/server/solr.

Assuming Zookeeper is running on the host machine port 2181, the easiest way to run this image and persist data to a location on the host machine is:

docker run --name solr -v ${PWD}/solrhome:/opt/solr/server/solr -p 8983:8983 -d -e SDAP_ZK_SERVICE_HOST="host.docker.internal" -e SOLR_HOST="host.docker.internal" sdap/solr-cloud:${BUILD_VERSION}

${PWD}/solrhome is the directory on the host machine where SOLR_HOME will be created if it does not already exist.

Don’t Persist Data

If you do not need to persist data between runs of this image, simply run the image without a volume mount.

Assuming Zookeeper is running on the host machine port 2181, the easiest way to run this image without persisting data is:

docker run --name solr -p 8983:8983 -d -e SDAP_ZK_SERVICE_HOST="host.docker.internal" -e SOLR_HOST="host.docker.internal" sdap/solr-cloud:${BUILD_VERSION}

When the container is removed, the data will be lost.

Collection Initialization

Solr Collections must be created after at least one SolrCloud node is live. When a collection is created, by default Solr will attempt to spread the shards across all of the live nodes at the time of creation. This poses two problems:

  1. The nexustiles collection cannot be created during a “bootstrapping” process in this image.
  2. The nexustiles collection should not be created until an appropriate number of nodes are live.

A helper container has been created to deal with these issues. See sdap/solr-cloud-init for more details.

The other option is to create the collection manually after starting as many SolrCloud nodes as desired. This can be done through the Solr Admin UI or by utilizing the admin collections API.
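
For the manual route, a sketch of a Collections API call is shown below. It builds a CREATE request URL from a Solr base URL and a parameter string. The defaults assume a node reachable at http://localhost:8983/solr/ and a single-shard nexustiles collection; the function name is hypothetical.

```shell
# Hypothetical helper: build the Collections API URL that creates a collection.
# The default base URL and parameters are assumptions about your setup.
create_collection_url() {
  solr_url="${1:-http://localhost:8983/solr/}"
  params="${2:-name=nexustiles&collection.configName=nexustiles&numShards=1}"
  # Drop any trailing slash so the path is joined with exactly one.
  printf '%s/admin/collections?action=CREATE&%s\n' "${solr_url%/}" "$params"
}
```

The request itself could then be sent with curl "$(create_collection_url)" once the desired number of SolrCloud nodes is live.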

sdap/solr-cloud-init

This image can be used to automatically create the nexustiles collection in SolrCloud.

How To Build

This image can be built from the incubator/sdap/solr directory:

docker build -t sdap/solr-cloud-init:${BUILD_VERSION} -f cloud-init/Dockerfile .

How to Run

This image is designed to run in a container alongside the sdap/solr-cloud container. It first checks whether there are at least MINIMUM_NODES live nodes in the cluster. If there are, it then checks whether the nexustiles collection exists. If the collection does not exist, the script creates it using the parameters defined by the CREATE_COLLECTION_PARAMS environment variable. See the reference documents for the create function for the Solr collections API for valid parameters.

Note

The action=CREATE parameter is already passed for you and should not be part of CREATE_COLLECTION_PARAMS.

Note

This image was designed to be long running. It will only exit if there was an error detecting or creating the nexustiles collection.

Environment Variables

MINIMUM_NODES

default: 1

The minimum number of nodes that must be ‘live’ before the collection is created.

SDAP_ZK_SOLR

default: localhost:2181/solr

The host:port/chroot of the Zookeeper instance being used by SolrCloud.

SDAP_SOLR_URL

default: http://localhost:8983/solr/

The URL that should be polled to check if a SolrCloud node is running. This should be the URL of the sdap/solr-cloud container that is being started alongside this container.

ZK_LOCK_GUID

default: c4d193b1-7e47-4b32-a169-a596463da0f5

A GUID that is used to create a lock in Zookeeper so that if more than one of these init containers is started at the same time, only one will attempt to create the collection. This GUID should be the same across all containers that are trying to create the same collection.

MAX_RETRIES

default: 30

The number of times we will try to connect to SolrCloud at SDAP_SOLR_URL. This is roughly equivalent to how many seconds we will wait for the node at SDAP_SOLR_URL to become available. If MAX_RETRIES is exceeded, the container will exit with an error.
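
The retry behavior described here can be pictured with the sketch below: one connection attempt per second, up to MAX_RETRIES attempts. This is not the script shipped in the image. The function name and the use of curl against the base URL are assumptions made for illustration.

```shell
# Illustrative sketch only, not the image's actual script.
# Poll SDAP_SOLR_URL once per second until it responds or MAX_RETRIES is reached.
wait_for_solr() {
  url="${SDAP_SOLR_URL:-http://localhost:8983/solr/}"
  max="${MAX_RETRIES:-30}"
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "Solr at ${url} not available after ${max} attempts" >&2
  return 1
}
```

Because each failed attempt is followed by a one-second pause, the default of 30 retries corresponds to roughly 30 seconds of waiting.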

CREATE_COLLECTION_PARAMS

default: name=nexustiles&collection.configName=nexustiles&numShards=1

The parameters sent to the collection create function. See the reference documents for the create function for the Solr collections API for valid parameters.

Example Run

Assuming Zookeeper is running on the host machine on port 2181, and an sdap/solr-cloud container is also running with port 8983 mapped to the host machine, the easiest way to run this image is:

docker run -it --rm --name init -e SDAP_ZK_SOLR="host.docker.internal:2181/solr" -e SDAP_SOLR_URL="http://host.docker.internal:8983/solr/" sdap/solr-cloud-init:${BUILD_VERSION}

After running this image, the nexustiles collection should be available on the SolrCloud installation. Check the logs for the container to see details.
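
The same command can also carry MINIMUM_NODES and CREATE_COLLECTION_PARAMS when a larger cluster is expected. The sketch below wraps it in a hypothetical helper; the function name and the values of three nodes and three shards are illustrative assumptions, not recommendations.

```shell
# Hypothetical helper: run the init container with a custom node minimum
# and shard count. The default values of 3 are examples only.
run_cloud_init() {
  min_nodes="${1:-3}"
  shards="${2:-3}"
  docker run -it --rm --name init \
    -e MINIMUM_NODES="${min_nodes}" \
    -e CREATE_COLLECTION_PARAMS="name=nexustiles&collection.configName=nexustiles&numShards=${shards}" \
    -e SDAP_ZK_SOLR="host.docker.internal:2181/solr" \
    -e SDAP_SOLR_URL="http://host.docker.internal:8983/solr/" \
    "sdap/solr-cloud-init:${BUILD_VERSION}"
}
```

Note that action=CREATE is deliberately absent from the parameter string, since the image passes it for you.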