Docker Images¶
incubator-sdap-nexus contains a number of different Docker images for download. All images are available from the SDAP organization on DockerHub.
Solr Images¶
All docker builds for the Solr images should happen from this directory. For copy/paste ability, first export the environment variable BUILD_VERSION
to the version number you would like to tag images as.
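As a minimal sketch, the variable can be exported once and then reused by every build command in this section. The version string `1.0.0` and the helper name `sdap_image_tags` are placeholders chosen for illustration, not values taken from the project.

```shell
# Placeholder version; replace it with the tag you actually want to use.
export BUILD_VERSION="1.0.0"

# Hypothetical helper: print the image names that the build commands
# in this section would produce for the current BUILD_VERSION.
sdap_image_tags() {
  for image in solr-singlenode solr-cloud solr-cloud-init; do
    echo "sdap/${image}:${BUILD_VERSION}"
  done
}

sdap_image_tags
```

Printing the tags first is a cheap way to confirm the variable is set before starting a long build.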
Common Environment Variables¶
Any environment variable that solr.in.sh accepts can be passed as an environment variable to the Docker container, and it will be used. A few options are called out here:
SOLR_HEAP
default: 512m
Increase Java Heap as needed to support your indexing / query needs
SOLR_HOME
default: /opt/solr/server/solr
Path to a directory for Solr to store cores and their data. This directory is exposed as a VOLUME that can be mounted.
If you want to mount the SOLR_HOME directory to a directory on the host machine, you need to provide the container path to the docker run -v option. Doing this allows you to retain the index between starts and stops of this container.
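The following sketch shows one way these variables might be combined on the command line. The function name `run_solr_with_heap` and the `2g` heap size are illustrative assumptions; the mount path mirrors the singlenode example in this document, and the port mapping is left out for brevity.

```shell
# Sketch: start a Solr container with a larger heap and a host-mounted
# data directory. The default heap value of 2g is an assumption.
run_solr_with_heap() {
  heap="${1:-2g}"
  docker run --name solr -d \
    -e SOLR_HEAP="${heap}" \
    -v "${PWD}/solrhome/nexustiles:/opt/solr/server/solr/nexustiles" \
    "sdap/solr-singlenode:${BUILD_VERSION}"
}
```

Calling `run_solr_with_heap 4g` would request a 4 GB heap. Add the `-p` option from the run examples if Solr needs to be reachable from the host.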
sdap/solr¶
This is the base image used by both singlenode and cloud versions of the Solr image.
How to Run¶
This image is not intended to be run directly.
sdap/solr-singlenode¶
This is the singlenode version of Solr.
How To Build¶
This image can be built from the incubator/sdap/solr directory:
docker build -t sdap/solr-singlenode:${BUILD_VERSION} -f singlenode/Dockerfile --build-arg tag_version=${BUILD_VERSION} .
How to Run¶
This Docker container runs Apache Solr v7.4 as a single node with the nexustiles collection. The main decision when running this image is whether data should persist after the container is stopped or be discarded.
Persist Data¶
To persist the data in the nexustiles collection, we need to provide a volume mount from the host machine to the container path where the collection data is stored. By default, collection data is stored in the location indicated by the $SOLR_HOME environment variable. If you do not provide a custom SOLR_HOME location, the default is /opt/solr/server/solr. Therefore, the easiest way to run this image and persist data to a location on the host machine is:
docker run --name solr -v ${PWD}/solrhome/nexustiles:/opt/solr/server/solr/nexustiles -p 8983:8983 -d sdap/solr-singlenode:${BUILD_VERSION}
${PWD}/solrhome/nexustiles is the directory on the host machine where the nexustiles collection will be created if it does not already exist. If you have run this container before and ${PWD}/solrhome/nexustiles already contains files, those files will not be overwritten. In this way, it is possible to retain data on the host machine between runs of this Docker image.
Don’t Persist Data¶
If you do not need to persist data between runs of this image, simply run the image without a volume mount.
docker run --name solr -p 8983:8983 -d sdap/solr-singlenode:${BUILD_VERSION}
When the container is removed, the data will be lost.
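Once the container is running, a quick check that the nexustiles core has loaded can be made with Solr's CoreAdmin STATUS action. This is a sketch only: the function names are hypothetical, and the host port must match the host side of the `-p` mapping used when starting the container.

```shell
# Build the CoreAdmin STATUS URL for the nexustiles core.
# Argument: the host port that was published with `docker run -p`.
solr_core_status_url() {
  host_port="$1"
  echo "http://localhost:${host_port}/solr/admin/cores?action=STATUS&core=nexustiles"
}

# Query Solr and print the raw response; requires curl on the host.
check_nexustiles_core() {
  curl -sf "$(solr_core_status_url "$1")"
}
```

Pass the published host port to `check_nexustiles_core`. A non-zero exit status suggests that Solr is not reachable yet.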
sdap/solr-cloud¶
This image runs SolrCloud.
How To Build¶
This image can be built from the incubator/sdap/solr directory:
docker build -t sdap/solr-cloud:${BUILD_VERSION} -f cloud/Dockerfile --build-arg tag_version=${BUILD_VERSION} .
How to Run¶
This Docker container runs Apache Solr v7.4 in cloud mode with the nexustiles collection. It requires a running Zookeeper service in order to work. It will automatically bootstrap Zookeeper by uploading configuration and core properties to Zookeeper when it starts.
It is necessary to decide whether data should persist when the container is stopped or be discarded.
Note
The example docker run commands below use host.docker.internal several times. This is a special DNS name that is known to work on Docker for Mac for connecting from a container to a service on the host. If you are not launching the container with Docker for Mac, there is no guarantee that this DNS name will be resolvable inside the container.
Cloud Specific Environment Variables¶
SDAP_ZK_SERVICE_HOST
default: localhost
This is the hostname of the Zookeeper service that Solr should use to connect.
SDAP_ZK_SERVICE_PORT
default: 2181
The port Solr should use to connect to Zookeeper.
SDAP_ZK_SOLR_CHROOT
default: solr
The Zookeeper chroot under which Solr configuration will be accessed.
SOLR_HOST
default: localhost
The hostname of the Solr instance that will be recorded in Zookeeper.
Zookeeper¶
Zookeeper can be running on the host machine or anywhere that docker can access (e.g. a bridge network). Take note of the host where Zookeeper is running and use that value for the SDAP_ZK_SERVICE_HOST
environment variable.
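If no Zookeeper service is available yet, one option is to start a single node from the official `zookeeper` image on Docker Hub. The sketch below assumes that image and a `3.4` tag; neither is prescribed by this project, so pick a version that is compatible with your Solr release. The function name `start_zookeeper` is hypothetical.

```shell
# Sketch: start a single Zookeeper node with its client port (2181)
# published on the host. The default image tag is an assumption.
start_zookeeper() {
  zk_tag="${1:-3.4}"
  docker run --name zookeeper -d -p 2181:2181 "zookeeper:${zk_tag}"
}
```

With Zookeeper published on host port 2181 in this way, the host machine itself is the value to use for SDAP_ZK_SERVICE_HOST.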
Persist Data¶
To persist the data, we need to provide a volume mount from the host machine to the container path where the collection data is stored. By default, collection data is stored in the location indicated by the $SOLR_HOME environment variable. If you do not provide a custom SOLR_HOME location, the default is /opt/solr/server/solr.
Assuming Zookeeper is running on the host machine port 2181, the easiest way to run this image and persist data to a location on the host machine is:
docker run --name solr -v ${PWD}/solrhome:/opt/solr/server/solr -p 8983:8983 -d -e SDAP_ZK_SERVICE_HOST="host.docker.internal" -e SOLR_HOST="host.docker.internal" sdap/solr-cloud:${BUILD_VERSION}
${PWD}/solrhome is the directory on the host machine where SOLR_HOME will be created if it does not already exist.
Don’t Persist Data¶
If you do not need to persist data between runs of this image, simply run the image without a volume mount.
Assuming Zookeeper is running on the host machine port 2181, the easiest way to run this image without persisting data is:
docker run --name solr -p 8983:8983 -d -e SDAP_ZK_SERVICE_HOST="host.docker.internal" -e SOLR_HOST="host.docker.internal" sdap/solr-cloud:${BUILD_VERSION}
When the container is removed, the data will be lost.
Collection Initialization¶
Solr Collections must be created after at least one SolrCloud node is live. When a collection is created, by default Solr will attempt to spread the shards across all of the nodes that are live at the time of creation. This poses two problems:
- The nexustiles collection cannot be created during a “bootstrapping” process in this image.
- The nexustiles collection should not be created until an appropriate number of nodes are live.
A helper container has been created to deal with these issues. See sdap/solr-cloud-init for more details.
The other option is to create the collection manually after starting as many SolrCloud nodes as desired. This can be done through the Solr Admin UI or by utilizing the admin collections API.
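For the manual route, a request to the Collections API CREATE action might look like the sketch below. The parameter values mirror the default CREATE_COLLECTION_PARAMS documented for sdap/solr-cloud-init; the function names and the choice of `curl` are assumptions made for illustration.

```shell
# Build a Collections API CREATE URL for the nexustiles collection.
# Arguments: Solr base URL without a trailing slash, and the shard count.
create_collection_url() {
  solr_url="$1"
  num_shards="${2:-1}"
  echo "${solr_url}/admin/collections?action=CREATE&name=nexustiles&collection.configName=nexustiles&numShards=${num_shards}"
}

# Send the request; requires curl and at least one live SolrCloud node.
create_nexustiles_collection() {
  curl -sf "$(create_collection_url "$1" "$2")"
}
```

For example, `create_nexustiles_collection http://localhost:8983/solr 2` would ask Solr for two shards, which it spreads across the nodes that are live at that moment.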
sdap/solr-cloud-init¶
This image can be used to automatically create the nexustiles
collection in SolrCloud.
How To Build¶
This image can be built from the incubator/sdap/solr directory:
docker build -t sdap/solr-cloud-init:${BUILD_VERSION} -f cloud-init/Dockerfile .
How to Run¶
This image is designed to run in a container alongside the sdap/solr-cloud container. Its purpose is to detect whether there are at least MINIMUM_NODES live nodes in the cluster. If there are, it then checks whether the nexustiles collection exists. If it does not, this script will create it using the parameters defined by the CREATE_COLLECTION_PARAMS environment variable. See the Solr Collections API reference for the CREATE action for valid parameters.
Note
The action=CREATE parameter is already passed for you and should not be part of CREATE_COLLECTION_PARAMS.
Note
This image was designed to be long running. It will only exit if there was an error detecting or creating the nexustiles
collection.
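The waiting behavior controlled by MAX_RETRIES (see Environment Variables) can be pictured as a bounded polling loop. The sketch below is not the script shipped in the image. It is a generic illustration, and the helper name `wait_for` and the one-second delay in the usage note are assumptions inferred from the description of MAX_RETRIES.

```shell
# Sketch of a bounded polling loop: run a check command up to
# max_retries times, sleeping between attempts.
# Returns 0 on the first success, 1 if all attempts fail.
wait_for() {
  max_retries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_retries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_for 30 1 curl -sf "http://localhost:8983/solr/"` would poll the default SDAP_SOLR_URL about once per second and give up after 30 attempts.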
Environment Variables¶
MINIMUM_NODES
default: 1
The minimum number of nodes that must be ‘live’ before the collection is created.
SDAP_ZK_SOLR
default: localhost:2181/solr
The host:port/chroot of the zookeeper being used by SolrCloud.
SDAP_SOLR_URL
default: http://localhost:8983/solr/
The URL that should be polled to check if a SolrCloud node is running. This should be the URL of the sdap/solr-cloud container that is being started alongside this container.
ZK_LOCK_GUID
default: c4d193b1-7e47-4b32-a169-a596463da0f5
A GUID that is used to create a lock in zookeeper so that if more than one of these init containers are started at the same time, only one will attempt to create the collection. This GUID should be the same across all containers that are trying to create the same collection.
MAX_RETRIES
default: 30
The number of times we will try to connect to SolrCloud at SDAP_SOLR_URL. This is roughly equivalent to how many seconds we will wait for the node at SDAP_SOLR_URL to become available. If MAX_RETRIES is exceeded, the container will exit with an error.
CREATE_COLLECTION_PARAMS
default: name=nexustiles&collection.configName=nexustiles&numShards=1
The parameters sent to the collection create function. See the reference documents for the create function for the Solr collections API for valid parameters.
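As an illustration of how these variables fit together, the sketch below runs the init container with non-default collection settings. The shard, replica, and node counts are arbitrary example values, `replicationFactor` is a standard parameter of the Solr CREATE action, and the function name `run_solr_cloud_init` is hypothetical.

```shell
# Sketch: run the init container with custom collection settings.
# Arguments (all optional): shard count, replication factor, minimum nodes.
run_solr_cloud_init() {
  params="name=nexustiles&collection.configName=nexustiles&numShards=${1:-2}&replicationFactor=${2:-1}"
  docker run -it --rm --name init \
    -e MINIMUM_NODES="${3:-2}" \
    -e CREATE_COLLECTION_PARAMS="${params}" \
    -e SDAP_ZK_SOLR="host.docker.internal:2181/solr" \
    -e SDAP_SOLR_URL="http://host.docker.internal:8983/solr/" \
    "sdap/solr-cloud-init:${BUILD_VERSION}"
}
```

Note that `action=CREATE` is deliberately absent from the parameter string, since the image passes it already.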
Example Run¶
Assuming Zookeeper is running on port 2181 of the host machine, and an sdap/solr-cloud container is also running with port 8983 mapped to the host machine, the easiest way to run this image is:
docker run -it --rm --name init -e SDAP_ZK_SOLR="host.docker.internal:2181/solr" -e SDAP_SOLR_URL="http://host.docker.internal:8983/solr/" sdap/solr-cloud-init:${BUILD_VERSION}
After running this image, the nexustiles
collection should be available on the SolrCloud installation. Check the logs for the container to see details.
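Besides reading the container logs, the result can be checked with the Collections API LIST action. The sketch below uses hypothetical helper names and assumes `curl` and `grep` are available on the host. It only looks for the collection name in the JSON response and does not parse it.

```shell
# Build the Collections API LIST URL from a Solr base URL such as
# SDAP_SOLR_URL; a trailing slash on the base URL is tolerated.
list_collections_url() {
  echo "${1%/}/admin/collections?action=LIST&wt=json"
}

# Succeeds if the JSON text on stdin mentions the nexustiles collection.
has_nexustiles() {
  grep -q '"nexustiles"'
}
```

For example, `curl -sf "$(list_collections_url http://localhost:8983/solr/)" | has_nexustiles && echo "nexustiles exists"` prints a confirmation only when the collection is listed.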